Question 1
The purpose of the study is to predict the cost of leases, measured in dollars per square foot, from the square feet of the property, the age of the property, the number of bathrooms, the number of days the property has been on the market, the number of floors in the building, the square feet in the whole building, the location of the property (suburbs or city), and the distance to “old” (historic) downtown in miles. The variable of interest is therefore the lease cost, measured in dollars per square foot per year. Lease cost is computed using two basic methods: gross and net. Gross leases, also referred to as full-service leases, are all-inclusive, whereas tenants on net leases are charged less for the space but pay some or all of the extra costs, such as property taxes, insurance, and maintenance. There are three types of net leases, labeled Single (N), Double (NN), and Triple (NNN) Net.
A study population is the set of all objects sharing the characteristics of interest for a research study. Here, the study population is all leased commercial real estate properties in a Midwestern city in the United States. A sample of 269 commercial properties was drawn by simple random sampling, in which each property has an equal chance of being selected; random selection was used to minimize bias.
A parameter is a descriptive measure computed from the entire population. In this case, the parameter is the mean lease cost of all the commercial real estate properties in the population, denoted by μ. A statistic is a descriptive measure computed from sample data. Here, the statistic is the mean lease cost of the sampled properties, denoted by x̄ (x-bar).
The scientific question is: does the value of the statistic differ from the value of the parameter? This is a question of inferential statistics, because it lets us make an educated guess about the population parameter based on the sample statistic. Because the statistic is computed from a randomly selected sample, it can be used to generalize to the whole population.
The leases dataset contains two types of data: nominal and quantitative. The nominal variable is the location of the property, recorded as either suburbs or city. The other variables, such as lease cost, square feet of the property, age of the property, number of floors in the building, square feet in the whole building, and the distance to “old” (historic) downtown in miles, are quantitative.
The data are cross-sectional because they describe many properties observed at one point in time rather than over time. The study gathered data on several variables for each commercial property, with the aim of predicting the lease cost. Moreover, these are secondary data, since they were collected from commercial real estate agencies, records, and websites. The data can therefore be considered reliable, since they come from the sources responsible for dealing in and regulating the real estate business.
The following are the descriptive statistics for the lease cost.
Statistic | Lease.Cost.($)
Mean | 394991
Standard Error | 25355.61
Median | 236094
Mode | #N/A
Standard Deviation | 415863
Sample Variance | 1.73E+11
Kurtosis | 2.769334
Skewness | 1.749801
Range | 1948657
Minimum | 37375
Maximum | 1986032
Sum | 1.06E+08
Count | 269
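The summary measures in this table are the ones produced by Excel's Descriptive Statistics tool. As a minimal sketch, they can be computed in Python with the standard library, assuming the values are available as a plain list; the study data are not reproduced here, so the example runs on a small hypothetical list. Skewness and kurtosis use the sample-adjusted formulas of Excel's SKEW and KURT functions; that these match the tool used for the tables is an assumption. The mode is omitted, and the function name `describe` is illustrative.

```python
import math
import statistics


def describe(values):
    """Excel-style summary statistics for a list of at least 4 numbers (mode omitted)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = [(x - mean) / sd for x in values]  # standardised deviations
    # Sample-adjusted skewness and excess kurtosis, as in Excel's SKEW and KURT
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(values),
        "standard_deviation": sd,
        "sample_variance": sd ** 2,
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }


# Hypothetical toy values (NOT the study data); the single large value
# creates a long right tail, so skewness > 0 and mean > median.
summary = describe([1, 2, 3, 4, 10])
print(summary["mean"], summary["median"], round(summary["skewness"], 3))
```

The standard error row is simply SD/√n, for example 415863/√269 ≈ 25355.6 for the lease cost. A positive skewness, as reported for all three variables in this section, indicates a right-skewed distribution with the mean above the median.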
We will also compute the sample statistics for the Square Feet of the property and Age of the property.
Statistic | Square.Feet.(square.foot) | Age.(year)
Mean | 10470.81784 | 31.41635688
Standard Error | 551.0759367 | 1.491878958
Median | 7310 | 25
Mode | #N/A | 19
Standard Deviation | 9038.317381 | 24.4686342
Sample Variance | 81691181.08 | 598.7140598
Kurtosis | 3.422713785 | 2.91768326
Skewness | 1.873733414 | 1.52864527
Range | 46012 | 132
Minimum | 1609 | 0
Maximum | 47621 | 132
Sum | 2816650 | 8451
Count | 269 | 269
Graphs and charts
The study uses a histogram for each quantitative variable.
Histogram for the Square Feet of the property
The histogram above shows the distribution of the square feet of the property. It is skewed to the right: the skewness is positive (1.87) and the mean (10470.8) is larger than the median (7310).
Histogram for the Age of the property
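The histogram above shows the distribution of the age of the property. It is skewed to the right: the skewness is positive (1.53) and the mean age (31.4 years) exceeds the median (25 years). Most properties are younger than the mean age, while a few very old properties (up to 132 years) form a long right tail.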
Bar chart for Location of the property.
The bar chart indicates that more of the sampled commercial properties are located in the city than in the suburbs.
The variable of interest is the lease cost. The histogram below shows the distribution of the lease cost for the commercial real estate properties.
Scatter plot
A scatterplot is used to establish the relationship between two variables. In this context, we examine the relationship between the lease cost and the square feet of the property: the dependent variable is the lease cost and the independent variable is the square feet of the property.
The graph indicates a weak positive linear association between the square feet of the property and the lease cost.
The study will use the normal distribution model for the data.
Question set 2
When constructing the confidence interval, the conditions of the central limit theorem must hold so that the normal model can be used. The two assumptions for the confidence interval are, first, randomization: the data should be randomly selected from the population; and second, independence: the sampled observations should be independent of each other. For the lease cost we use the z-interval because the sample is large (n = 269 > 30), so the sample standard deviation is treated as the population standard deviation σ. The table below gives the values used to construct the intervals.
Quantity | Value
Z-value (91%) | 1.695398
Z-value (94%) | 1.880794
Standard deviation (σ) | 415863
Sample size (n) | 269
Mean (x̄) | 394991.015
Margin of error: $ME = z^{*} \cdot \frac{\sigma}{\sqrt{n}}$, where $\frac{\sigma}{\sqrt{n}} = \frac{415863}{\sqrt{269}} \approx 25355.61$
For 91%, the margin of error is
$ME = 1.695398 \times 25355.61 \approx 42987.849$
For 94%, the margin of error is
$ME = 1.880794 \times 25355.61 \approx 47688.675$
The confidence interval for 91% is
CI = Mean ± margin of error
= 394991.015 ± 42987.849
= (352003.166, 437978.864)
The confidence interval for 94% is
CI = Mean ± margin of error
= 394991.015 ± 47688.675
= (347302.34, 442679.69)
An average lease cost of 442883 falls in neither interval: it lies above the upper limit of the 94% interval (442679.69) and therefore also above the upper limit of the narrower 91% interval (437978.864). The 94% interval is the wider of the two because a higher confidence level uses a larger critical value, trading precision for confidence.
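As a cross-check on the margins of error, a short Python sketch can compute z* directly from the standard normal distribution with the standard library's `statistics.NormalDist`, using the sample mean, the standard deviation 415863 (treated as σ) and n = 269. For a two-sided interval at confidence level C the critical value is the (1 + C)/2 quantile, so the 91% level uses z* ≈ 1.6954 and the 94% level z* ≈ 1.8808; the higher confidence level always gives the larger z* and the wider interval. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Two-sided z confidence interval for a mean, treating sigma as known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    margin = z_star * sigma / math.sqrt(n)  # margin of error
    return z_star, margin, (mean - margin, mean + margin)


results = {}
for level in (0.91, 0.94):
    z_star, margin, (low, high) = z_interval(394991.015, 415863, 269, level)
    results[level] = (z_star, margin, low, high)
    print(f"{level:.0%}: z* = {z_star:.4f}, ME = {margin:.2f}, CI = ({low:.2f}, {high:.2f})")

# Does each interval contain a hypothesised average of 442883?
for level, (_, _, low, high) in results.items():
    print(f"{level:.0%} interval contains 442883: {low <= 442883 <= high}")
```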
Hypothesis test for a population mean
This test assumes that the lease costs are approximately normally distributed, or that the sample is large enough for the sample mean to be approximately normal. We formulate the following hypotheses.
H0: μ = 355443
H1: μ > 355443
α = 0.06
The z-statistic is
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{394991.015 - 355443}{415863 / \sqrt{269}} = \frac{39548.015}{25355.61} \approx 1.56$
From the normal table, the one-tailed critical value for α = 0.06 is $z_c = 1.55$.
Since z = 1.56 > z_c = 1.55, the null hypothesis is rejected. Therefore, there is enough evidence to claim that the population mean μ is greater than 355443 at the 0.06 significance level.
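The same right-tailed z-test can be sketched in Python with `statistics.NormalDist`. It reproduces z ≈ 1.56 and the one-tailed critical value of about 1.55 at α = 0.06, and also reports the p-value (about 0.059), which leads to the same decision. As in the confidence intervals, the sample standard deviation is treated as σ; the function name `one_sample_z_test` is illustrative.

```python
import math
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha):
    """Right-tailed one-sample z-test of H0: mu = mu0 against H1: mu > mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    p_value = 1 - NormalDist().cdf(z)         # P(Z >= z) under H0
    return z, z_crit, p_value, z > z_crit


z, z_crit, p, reject = one_sample_z_test(394991.015, 355443, 415863, 269, 0.06)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p = {p:.4f}, reject H0: {reject}")
```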
Hypothesis testing requires assumptions. The two assumptions for this test are, first, randomization: the data should be randomly selected from the population; and second, independence: the sampled observations should be independent of each other.
Hypothesis test for two groups
We test the hypothesis that the mean Lease Cost ($) differs between the two groups defined by the qualitative variable, the location of the property (suburbs or city). We formulate the hypotheses as follows:
H0: μ1 -μ2 = 0
H1: μ1 -μ2 ≠ 0
α = 0.04
t-Test: Two-Sample Assuming Unequal Variances
Statistic | Lease.Cost.($) | Square.Feet.(square.foot)
Mean | 395194.3 | 10429.04
Variance | 1.74E+11 | 81525980
Observations | 268 | 268
Hypothesized Mean Difference | 0 |
df | 267 |
t Stat | 15.11517 |
P(T<=t) one-tail | 5.11E-38 |
t Critical one-tail | 1.757375 |
P(T<=t) two-tail | 1.02E-37 |
t Critical two-tail | 2.063831 |
The test statistic is t = 15.11517, with p = 1.02E-37 and a two-tailed critical value of 2.063831. Since t = 15.115 > t_c = 2.0638 (equivalently, p < α = 0.04), we reject the null hypothesis. Hence, there is enough evidence to claim that the mean Lease Cost ($) of the two groups is significantly different.
We will compute the confidence interval for the average difference in lease cost between the two populations.
Pooled Variance
$s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2} = \frac{46540902256548.8}{536} = 86830041523.41$
Standard Error
$s_{M_1 - M_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{86830041523.41}{269} + \frac{86830041523.41}{269}} = 25408.2$
Confidence Interval
$\mu_1 - \mu_2 = (M_1 - M_2) \pm t \cdot s_{M_1 - M_2} = 384765.26 \pm (1.9644 \times 25408.2)$, where t = 1.9644 is the 95% critical value of t with 536 degrees of freedom
= 384765.26 ± 49911.8563
95% CI = [334853.4037, 434677.1163]
Hence, we are 95% confident that the difference between the two population means (μ1 − μ2) lies between 334853.4037 and 434677.1163 units.
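The pooled-variance interval can be reproduced with a short Python sketch. The standard library has no t distribution, so the critical value is passed in: 1.9644 is assumed here to be the 97.5th percentile of t with df1 + df2 = 536 degrees of freedom, which with so many degrees of freedom is close to the normal value 1.96. The pooled-variance numerator, the sample sizes and the mean difference are taken from the worked example; the function name `pooled_difference_ci` is illustrative.

```python
import math


def pooled_difference_ci(mean_diff, pooled_var, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using a pooled variance s_p^2."""
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)  # standard error of M1 - M2
    margin = t_crit * se
    return se, margin, (mean_diff - margin, mean_diff + margin)


# s_p^2 = (df1*s1^2 + df2*s2^2) / (df1 + df2), numerator taken from the worked example
pooled_var = 46540902256548.8 / 536
# t_crit = 1.9644 is an assumed t quantile (0.975, df = 536)
se, margin, (low, high) = pooled_difference_ci(384765.26, pooled_var, 269, 269, 1.9644)
print(f"SE = {se:.1f}, margin = {margin:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```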
Question 3 Simple Linear Regression Model
The quantitative variables include the lease cost, the square feet of the property, the age of the property, the number of floors in the building, the square feet in the whole building, and the distance to downtown in miles.
Scatterplots
The scatterplot between the Lease cost and the Square Feet of the property.
The correlation between the lease cost and the square feet of the property is 0.1549. This implies a weak positive association between the lease cost and the square feet of the property.
The scatterplot between the Lease cost and Age of the property.
The correlation between the lease cost and the age of the property is -0.1128, implying a weak negative association between the lease cost and the age of the property.
The scatterplot between the Lease cost and the Number of floors in the building.
The correlation between the lease cost and the number of floors in the building is 0.06293. This very weak positive association implies that buildings with more floors tend to have slightly higher lease costs.
Scatterplot between the Lease cost and Square feet in the whole building.
The correlation between the lease cost and the square feet in the whole building is -0.00417, which is essentially zero: there is practically no linear association between the size of the whole building and the lease cost.
Scatterplot between the Lease cost and Distance to Downtown (in miles)
The correlation between the lease cost and the distance to downtown is -0.0353, implying a very weak negative association between the lease cost and the distance to downtown in miles.
The pair with the strongest association is the lease cost and the square feet of the property (r = 0.1549), because this correlation has the largest absolute value of all the pairs. Larger properties therefore tend to have higher lease costs, although the association is still weak.
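The comparison can be made explicit by ranking the correlations with lease cost by absolute value, since the strength of a linear association depends on |r| and the sign only gives its direction. A minimal Python sketch, with the correlation values typed in from the text above:

```python
# Correlations with lease cost, as reported in the text.
correlations = {
    "Square Feet of the property": 0.1549,
    "Age of the property": -0.1128,
    "Number of floors in the building": 0.06293,
    "Square feet in the whole building": -0.00417,
    "Distance to Downtown (miles)": -0.0353,
}

# Strength of a linear association is judged by |r|, regardless of sign.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name:35s} r = {r:+.4f}")
```

With the raw data available, each r could instead be computed with `statistics.correlation(x, y)` (Python 3.10 and later).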
The general form of the simple linear regression model between Y and X is $Y = \beta_0 + \beta_1 X$.
The linear relationship between Lease Cost ($) (Y) and the Square Feet of the property is positive.
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 320331.9 | 38448.05 | 8.331552 | 4.23E-15 | 244632 | 396031.9
Square.Feet.(square.foot) | 7.130204 | 2.781818 | 2.563146 | 0.010921 | 1.653115 | 12.60729
Y = 320331.9 + 7.1302 Square.Feet.
The slope is 7.1302. This implies that each additional square foot of property is associated with an increase of about $7.13 in the lease cost.
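The value of R-squared is approximately 0.024 (equal to 0.1549² and to SS Regression / SS Total in the ANOVA table below); the value 0.02035 is the adjusted R-squared. This implies that only about 2.4% of the variation in the lease cost is explained by the square feet of the property.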
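With a single predictor, the regression F statistic equals the square of the slope's t statistic, so R² and adjusted R² can be recovered from the coefficient table: R² = F / (F + df_resid) and adjusted R² = 1 − (1 − R²)(n − 1)/(n − 2). The Python sketch below applies this to t = 2.563146 with n = 269; it gives F ≈ 6.5697, R² ≈ 0.024 and adjusted R² ≈ 0.0204. The function name is illustrative.

```python
def r_squared_from_slope_t(t_stat, n):
    """R^2 and adjusted R^2 for a simple regression (one predictor)."""
    f_stat = t_stat ** 2  # with one predictor, F = t^2
    df_resid = n - 2      # n observations minus two estimated coefficients
    r2 = f_stat / (f_stat + df_resid)
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    return f_stat, r2, adj_r2


f_stat, r2, adj_r2 = r_squared_from_slope_t(2.563146, 269)
print(f"F = {f_stat:.4f}, R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```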
To predict, we use the regression equation Y = 320331.9 + 7.1302 × Square Feet. For a property of 16900 square feet, the predicted lease cost is 320331.9 + 7.1302 × 16900 = $440832.28.
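The point prediction is a single evaluation of the fitted line. A minimal Python sketch using the intercept and slope from the coefficient table (the function name is illustrative):

```python
def predict_lease_cost(square_feet, intercept=320331.9, slope=7.1302):
    """Fitted simple regression line: lease cost = b0 + b1 * square feet."""
    return intercept + slope * square_feet


print(f"Predicted lease cost for 16900 sq ft: ${predict_lease_cost(16900):,.2f}")
```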
A 98% confidence interval for the average Lease Cost ($) (Y) of properties with 16900 square feet is (395122.868, 486541.692). Hence, we are 98% confident that the mean lease cost of such properties lies between $395122.868 and $486541.692.
To establish if the linear relationship is significant, we test the following hypothesis.
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a significant linear relationship)
Use α = 0.03
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 1.11E+12 | 1.11E+12 | 6.569716 | 0.010921
Residual | 267 | 4.52E+13 | 1.69E+11 | |
Total | 268 | 4.63E+13 | | |
F(1, 267) = 6.5697, p = 0.0109. Since the p-value is smaller than α = 0.03, we reject H0 and conclude that there is a significant linear relationship between the square feet of the property (X) and the lease cost (Y).
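The 99% confidence interval for the slope is (-0.0862, 14.347). Because this interval contains 0, the slope is not significant at the 1% level, which is consistent with p = 0.0109 > 0.01.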
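Both slope intervals follow b1 ± t* · SE(b1). The Python sketch below uses the slope and standard error from the coefficient table. Because the standard library has no t distribution, the critical values are supplied by hand and are assumptions: about 1.9689 for 95% and 2.5944 for 99%, the 0.975 and 0.995 quantiles of t with 267 degrees of freedom. The 95% call reproduces the table's (1.653115, 12.60729); the function name is illustrative.

```python
def slope_ci(b1, se_b1, t_crit):
    """Confidence interval for a regression slope: b1 +/- t* x SE(b1)."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin


# Assumed t quantiles for df = 267
lo95, hi95 = slope_ci(7.130204, 2.781818, 1.9689)
lo99, hi99 = slope_ci(7.130204, 2.781818, 2.5944)
print(f"95% CI: ({lo95:.4f}, {hi95:.4f})")
print(f"99% CI: ({lo99:.4f}, {hi99:.4f}); contains 0: {lo99 < 0 < hi99}")
```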
Assumptions
Linearity
The scatter plot is used to establish if there is a linear relationship between the lease cost and the square Feet of the property. The plot indicates that there is a linear relationship.
Residual plots
The pattern of the residual plot indicates homogeneity of variance (homoscedasticity) in the residuals.
The normal plot is used to establish if the residuals follow a normal distribution.
The normal probability plot above indicates that the residuals follow a normal distribution.
Therefore, the results from the regression analysis are reliable.
Question set 4: Multiple regression
Model 1
SUMMARY OUTPUT
Regression Statistics |
Multiple R | 0.23
R Square | 0.05
Adjusted R Square | 0.03
Standard Error | 409119.74
Observations | 269.00

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 6.00 | 2495172721056.80 | 415862120176.13 | 2.48 | 0.02
Residual | 262.00 | 43853288314541.10 | 167378963032.60 | |
Total | 268.00 | 46348461035597.90 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 94.0% | Upper 94.0%
Intercept | 538096.26 | 111465.37 | 4.83 | 0.00 | 318614.28 | 757578.23 | 327541.33 | 748651.18
Square.Feet.(square.foot) | 8.71 | 3.07 | 2.84 | 0.00 | 2.67 | 14.75 | 2.91 | 14.50
Age.(year) | -2802.00 | 1234.92 | -2.27 | 0.02 | -5233.63 | -370.36 | -5134.72 | -469.27
Distance.to.Downtown.(miles) | -17097.50 | 19396.97 | -0.88 | 0.38 | -55291.29 | 21096.29 | -53737.83 | 19542.83
Num.Floors.in.Bldg | 1401.42 | 2233.63 | 0.63 | 0.53 | -2996.73 | 5799.58 | -2817.84 | 5620.69
Sq.Ft.in.Bldg.(square.foot) | -3.59 | 2.99 | -1.20 | 0.23 | -9.48 | 2.30 | -9.24 | 2.06
Location | -24566.21 | 89790.97 | -0.27 | 0.78 | -201369.99 | 152237.56 | -194178.80 | 145046.38
Model 1 explains about 5% of the total variation in the Lease Cost (Y) (R Square = 0.05). The adjusted multiple coefficient of determination is 0.03.
To establish if the overall model is significant, we test the following hypothesis.
H0: Model 1 is not significant (all slope coefficients are zero).
H1: Model 1 is significant (at least one slope coefficient is nonzero).
Alpha = 0.06
F(6, 262) = 2.48, p = 0.02. The p-value is smaller than α = 0.06, so we reject the null hypothesis and conclude that the overall model is significant.
Further, at α = 0.06, the significant predictor variables are the square feet and the age of the property (p = 0.00 and p = 0.02); the remaining predictors have p-values of 0.23 or more.
Model 2
In this model, we remove the location of the property because its p-value (0.78) is the largest among the predictors.
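Dropping the predictor with the largest p-value is one step of backward elimination. The Python sketch below encodes that rule with the rounded Model 1 p-values from the output. The stopping rule (return nothing once every p-value is at most α) is an assumption about how the procedure would continue, not something the analysis states, and the function name is illustrative.

```python
# Model 1 p-values as printed in the regression output (rounded to 2 d.p.).
model1_p_values = {
    "Square.Feet.(square.foot)": 0.00,
    "Age.(year)": 0.02,
    "Distance.to.Downtown.(miles)": 0.38,
    "Num.Floors.in.Bldg": 0.53,
    "Sq.Ft.in.Bldg.(square.foot)": 0.23,
    "Location": 0.78,
}


def predictor_to_drop(p_values, alpha):
    """One backward-elimination step: return the least significant predictor,
    or None if every predictor is already significant at level alpha."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


print(predictor_to_drop(model1_p_values, alpha=0.06))
```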
SUMMARY OUTPUT
Regression Statistics |
Multiple R | 0.23
R Square | 0.05
Adjusted R Square | 0.04
Standard Error | 408399.53
Observations | 269.00

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 5.00 | 2482643849707.31 | 496528769941.46 | 2.98 | 0.01
Residual | 263.00 | 43865817185890.60 | 166790179414.03 | |
Total | 268.00 | 46348461035597.90 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 94.0% | Upper 94.0%
Intercept | 545147.35 | 108254.13 | 5.04 | 0.00 | 331992.26 | 758302.43 | 340661.75 | 749632.95
Square.Feet.(square.foot) | 8.69 | 3.06 | 2.84 | 0.00 | 2.66 | 14.71 | 2.90 | 14.47
Age.(year) | -2727.05 | 1202.04 | -2.27 | 0.02 | -5093.89 | -360.21 | -4997.62 | -456.48
Distance.to.Downtown.(miles) | -19886.75 | 16472.68 | -1.21 | 0.23 | -52321.87 | 12548.37 | -51002.66 | 11229.16
Num.Floors.in.Bldg | 1491.12 | 2205.55 | 0.68 | 0.50 | -2851.66 | 5833.90 | -2675.03 | 5657.27
Sq.Ft.in.Bldg.(square.foot) | -4.06 | 2.44 | -1.66 | 0.10 | -8.87 | 0.75 | -8.68 | 0.55
Model 2 explains about 5% of the total variation in the Lease Cost (Y) (R Square = 0.05), essentially the same as Model 1. Its adjusted multiple coefficient of determination of 0.04 is higher than Model 1's 0.03, so removing the location variable slightly improved the model once the number of predictors is taken into account.
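Both models' fit statistics can be recomputed from their ANOVA sums of squares: R² = SS_reg / SS_total, adjusted R² = 1 − (1 − R²)(n − 1)/df_resid and F = MS_reg / MS_resid, with n = 269. A Python sketch (the function name is illustrative):

```python
def fit_statistics(ss_reg, ss_resid, df_reg, df_resid):
    """R^2, adjusted R^2 and overall F from an ANOVA table."""
    ss_total = ss_reg + ss_resid
    n = df_reg + df_resid + 1  # number of observations
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (ss_reg / df_reg) / (ss_resid / df_resid)
    return r2, adj_r2, f_stat


# (SS Regression, SS Residual, df Regression, df Residual) from the outputs
anova = {
    "Model 1": (2495172721056.80, 43853288314541.10, 6, 262),
    "Model 2": (2482643849707.31, 43865817185890.60, 5, 263),
}
fits = {name: fit_statistics(*row) for name, row in anova.items()}
for name, (r2, adj_r2, f_stat) in fits.items():
    print(f"{name}: R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.2f}")
```

This gives R² ≈ 0.054 for both models (R² can only fall when a predictor is removed), while adjusted R² rises from about 0.032 to about 0.036, which is why the adjusted value is the fairer basis for comparing the two models.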
Hypothesis test
To establish whether Model 2 is significant overall, we test the following hypotheses.
H0: Model 2 is not significant (all slope coefficients are zero).
H1: Model 2 is significant (at least one slope coefficient is nonzero).
Alpha = 0.08
F(5, 263) = 2.98, p = 0.01. The p-value is smaller than α = 0.08, so we reject the null hypothesis and conclude that Model 2 is significant overall.
The significant independent variables for model 2 at alpha = 0.08 are square feet and age of the property.
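Since Model 2 has the higher adjusted R Square, it is chosen as the better model, and the multiple regression equation is:
Lease Cost = 545147.34 + 8.69 Square.Feet.(square.foot) - 2727.05 Age - 19886.75 Distance.to.Downtown + 1491.12 Num.Floors.in.Bldg - 4.06 Sq.Ft.in.Bldg.(square.foot)
For a property of 38875 square feet that is 49 years old, 4.94 miles from downtown, in a 72-floor building with 41533.18 square feet in total, the predicted lease cost is
Lease Cost = 545147.34 + 8.69 × 38875 - 2727.05 × 49 - 19886.75 × 4.94 + 1491.12 × 72 - 4.06 × 41533.18
= $589841.02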
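The prediction is the intercept plus the dot product of the Model 2 coefficients with the property's values. A Python sketch using the coefficients from the Model 2 output; the dictionary keys and the function name are illustrative. With the printed intercept of 545147.35 the result agrees with the hand calculation to within about two cents.

```python
# Model 2 coefficients as printed in the regression output.
intercept = 545147.35
coefficients = {
    "square_feet": 8.69,
    "age_years": -2727.05,
    "distance_to_downtown_miles": -19886.75,
    "num_floors_in_bldg": 1491.12,
    "sq_ft_in_bldg": -4.06,
}

# Predictor values used in the worked example.
new_property = {
    "square_feet": 38875,
    "age_years": 49,
    "distance_to_downtown_miles": 4.94,
    "num_floors_in_bldg": 72,
    "sq_ft_in_bldg": 41533.18,
}


def predict(intercept, coefficients, values):
    """Linear prediction: intercept + sum of coefficient * value."""
    return intercept + sum(coef * values[name] for name, coef in coefficients.items())


predicted = predict(intercept, coefficients, new_property)
print(f"Predicted lease cost: ${predicted:,.2f}")
```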
The coefficient for the square feet of the property is 8.69: holding the other predictors constant, each additional square foot of property is associated with an $8.69 increase in the lease cost.
Assumptions
Linearity
There is a linear relationship between the lease cost and both the age and the square feet of the property.
The scatter plot is used to establish whether there is a linear relationship between the lease cost and the square feet of the property. The plot indicates that there is a linear relationship.
Residual plots
The residual plots for all the independent variables above show similar patterns, so the variance of the residuals appears constant across all values of the independent variables.
The normal probability plot is used to establish whether the residuals follow a normal distribution.
The normal probability plot above indicates that the residuals follow a normal distribution. Therefore, the results from the regression analysis are reliable.