Introduction
This report examines whether a house's square footage is a reliable indicator of what its selling price should be. Specifically, it tests whether there is a relationship between square footage and a house's listing price. Linear regression is used because it is the appropriate approach when both variables are continuous and quantitative (Padgett, 2011). If the approach is suitable, the scatterplot should show a roughly linear pattern. The response variable is the variable that is observed or measured, while the predictor variable explains changes in the response variable (Pardoe, 2010). Square footage is therefore the predictor variable, since it is expected to explain changes in a house's listing price.
Data Collection
The sample was drawn with Excel's RAND function. RAND assigned a random number to each row of the data, the rows were sorted by that number from smallest to largest, and the first 50 rows were selected as the sample. The response variable is the listing price, and the predictor variable is the square footage. The scatterplot is shown below.
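The RAND-and-sort procedure above can be sketched in Python as follows. This is a minimal illustration, not the workbook the report actually used. The function name `rand_sort_sample`, the `seed` parameter, and the toy population are hypothetical. Attaching a uniform random key to every row, sorting by that key, and keeping the first 50 rows yields a simple random sample drawn without replacement.

```python
import random

def rand_sort_sample(rows, k=50, seed=None):
    """Mimic the Excel RAND-and-sort steps (hypothetical helper).

    Each row gets a uniform random key in [0, 1), as =RAND() does. The
    rows are then sorted by that key from smallest to largest, and the
    first k rows are kept.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # helper column of RAND() values
    keyed.sort(key=lambda pair: pair[0])           # sort by the random key only
    return [row for _, row in keyed[:k]]           # first k rows form the sample

# Toy population of (square_feet, listing_price) rows; not the report's data.
population = [(1000 + 10 * i, 100_000 + 2_000 * i) for i in range(200)]
sample = rand_sort_sample(population, k=50, seed=42)
```

Python's `random.sample(population, 50)` draws an equivalent simple random sample in one call. The explicit key-and-sort version is shown only because it mirrors the Excel steps.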
Data Analysis
Linear regression requires two continuous variables: a predictor and a response. The response variable is the outcome being studied, while the predictor variable supplies information used to explain or predict that outcome (Pardoe, 2010). The histograms of both variables are shown below.
The sample’s summary statistics are depicted below.
Statistic | Median square feet | Median listing price
Mean | 2,011 | $278,554
Median | 1,992 | $265,912
Standard deviation | 369.80 | $94,532.12
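The statistics in the summary table can be reproduced with Python's standard `statistics` module, as in the sketch below. The sketch assumes the standard deviation is the sample version (n - 1 denominator, like Excel's STDEV.S), which the report does not state. The function name `summarize` and the toy values are hypothetical. The same function also returns the range, maximum, and minimum reported in the second table.

```python
import statistics

def summarize(values):
    """Summary statistics matching the rows of the report's two tables.

    Assumes the sample standard deviation (n - 1 denominator), as Excel's
    STDEV.S computes; the report does not say which formula it used.
    """
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "standard deviation": statistics.stdev(values),  # sample SD
        "range": hi - lo,                                 # spread = max - min
        "max": hi,
        "min": lo,
    }

# Hypothetical toy square-footage values, not the report's sample.
toy_stats = summarize([1200, 1500, 1800, 2000, 2600])
```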
In terms of shape, both the median square footage and the median listing price have unimodal, right-skewed distributions. Each histogram has a single peak, and its right tail is longer than its left. Consistent with this, the mean exceeds the median for both variables (2,011 vs. 1,992 square feet, and $278,554 vs. $265,912).
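A quick numerical check of the right skew described above compares the mean with the median and computes a moment coefficient of skewness. This sketch is illustrative only, and the helper names are hypothetical. The measure g1 = m3 / m2^1.5 is one common skewness coefficient. Excel's SKEW applies a small-sample adjustment, so its values differ slightly.

```python
import statistics

def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

def looks_right_skewed(values):
    """Heuristic: a right-skewed sample usually has mean > median and g1 > 0."""
    return statistics.mean(values) > statistics.median(values) and moment_skewness(values) > 0
```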
In terms of center, the median square footage (1,992) falls in the bar with the highest frequency, between 1,729 and 2,079 square feet. The median rather than the mean is used to describe the center because the distributions are skewed (Padgett, 2011). The median listing price ($265,912) likewise falls in the tallest bar, between $213,444 and $266,695.
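Locating the histogram bar that contains the median can be sketched with Python's standard `bisect` module. The bar edges below are an assumption: equal-width bars of 350 square feet starting at the sample minimum of 1,029, which reproduces the 1,729 to 2,079 bar named above. Bars are treated as half-open intervals [left, right), and the helper name is hypothetical.

```python
import bisect

def bin_containing(value, edges):
    """Return the (left, right) edges of the histogram bar holding value.

    Bars are assumed to be half-open intervals [left, right), and edges
    must be sorted in ascending order.
    """
    i = bisect.bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError("value lies outside the histogram")
    return edges[i], edges[i + 1]

# Assumed bars: width 350 sq ft, starting at the sample minimum of 1,029.
sqft_edges = [1029 + 350 * k for k in range(7)]
median_bar = bin_containing(1992, sqft_edges)  # bar holding the median square footage
```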
Statistic | Median square feet | Median listing price
Range | 2,046 | $533,515
Max | 3,074 | $640,155
Min | 1,029 | $106,641
The range (maximum minus minimum) measures the spread of a distribution. Both ranges are wide. The largest square-footage value is roughly three times the smallest, and the highest listing price is roughly six times the lowest, so individual values can differ substantially.
The median square footage distribution has no outliers, whereas the listing price distribution does. Listing prices of $586,804 and above lie unusually far from the rest of the values and are treated as outliers.
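The report does not state which rule identified the listing-price outliers. One common criterion, assumed here rather than taken from the report, is Tukey's 1.5 × IQR rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. A minimal sketch using `statistics.quantiles` (Python 3.8+) follows, with a hypothetical helper name. Quartile conventions vary between tools, so Excel may place the fences slightly differently.

```python
import statistics

def iqr_outliers(values):
    """Flag outliers with Tukey's 1.5 x IQR fences (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # lower and upper fences
    return [v for v in values if v < low or v > high]
```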
In shape, the sample distributions of median listing price and median square footage resemble those of the population: all are unimodal and skewed right. In center, the population medians of both variables are lower than the sample medians. In spread, the population ranges are considerably wider than the sample ranges for both variables. This is expected, since every sample value comes from the population and the sample range therefore cannot exceed the population range. As in the sample, the population's median listing price distribution has outliers while its median square footage distribution does not. Given the matching shape and outlier pattern, the sample is treated as representative of nationwide housing market sales, although its centers are higher than the population's.
The Regression Model
A regression model is appropriate for these data because square footage and listing price show a linear relationship. The upward direction of the scatterplot and the line of best fit indicate a positive relationship between the two variables. The relationship is moderate in strength, since the points scatter noticeably around the line of best fit. The correlation coefficient is r = 0.65, which agrees with the scatterplot. It is positive, indicating a positive relationship, and it is closer to 0.5 than to 1, indicating a moderately strong relationship.
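The correlation coefficient reported above is presumably Pearson's r, which Excel's CORREL computes. A minimal sketch of the formula follows. The function name is hypothetical, and on Python 3.10+ `statistics.correlation` returns the same value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)                     # spread of x
    syy = sum((y - my) ** 2 for y in ys)                     # spread of y
    return sxy / math.sqrt(sxx * syy)
```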
Line of Best Fit
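The regression equation is ŷ = 165.68x - 54,636, where x is the median square footage and ŷ is the predicted median listing price in dollars. The slope means that each additional square foot is associated with an increase of about $165.68 in the predicted listing price. The intercept, -$54,636, is the predicted price at zero square feet. It has no practical meaning, because a house cannot have zero square footage and x = 0 lies far outside the sample's range of 1,029 to 3,074 square feet. The r-squared is 0.42 (consistent with 0.65² ≈ 0.42). This means square footage explains about 42% of the variation in listing price, so the model fits the sample only moderately. For a square footage of 1,000, the predicted listing price is 165.68(1,000) - 54,636 = $111,044. Because 1,000 is slightly below the sample minimum, this prediction is a mild extrapolation.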
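The slope, intercept, and r-squared above come from ordinary least squares. The sketch below shows that computation, along with the prediction for 1,000 square feet using the report's coefficients. The function names are hypothetical, and the fitting code is a generic illustration rather than the tool the report used. For simple linear regression, r-squared equals the square of the correlation coefficient, which is why r = 0.65 and r-squared = 0.42 agree.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept,
    plus the coefficient of determination r**2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

def predict(slope, intercept, x):
    """Predicted response on the fitted line at x."""
    return slope * x + intercept

# Using the report's fitted coefficients: about $111,044 at 1,000 sq ft.
price_at_1000 = predict(165.68, -54636, 1000)
```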
Conclusion
Overall, median square footage can be used to predict a house's listing price, though with moderate accuracy. There is a moderately strong positive relationship between square footage and listing price. The results met my expectation of a linear pattern. Future research could examine whether the same positive relationship holds within individual counties, for instance, Tulsa County alone.
What is the relationship between median square footage and listing price in Tulsa County?
References
Padgett, L. (2011). Practical statistical methods: A SAS programming approach. CRC Press.
Pardoe, I. (2010). Applied regression modeling: A business approach. Wiley-Interscience.