Introduction The city tax accessor needs to know the sale prices for residential homes in a Midwest city. The tax accessor wants to do so based on a number of characteristics of the houses. The problem therefore is that the accessor lacks a model that can be used to accurately estimate the prices of these homes. Based on data collected on other houses in the Midwest city which include the sales price of homes, a model can be designed that can predict the residential home sale prices. The data shows prior home sale transactions that were made during the year 2002. Each record provides information on 12 other variables. The sale price is made to be the dependent variable which is also the response variable while the rest of the variables are made to be the predictor variables. Data As earlier indicated, each record had 12 variables recorded. The response variable in this case is the sales price of the residential homes. The predictor variables that are significant in this case are finished square feet, quality, style, year built, lots size, number of bedrooms and garage size. The total number of data points that have been collected are 522.
Exploratory Analysis
The descriptive statistics for individual predictor variables are shown on the descriptive statistics table in the supporting analysis table. From the table, the mean and the standard deviation of predictor variables dare shown for the whole dataset. The means for the variables that have been used to develop a model for sales price stepwise selection are 2260 feet for the finished square feet, 3.4713 for the number of bedrooms, 24370 for the Lots Size and 2.183 for the quality. The sales prices have a maximum price of $920,000 and minimum price $84,000. The Pearson correlation coefficient of the sales price with each other predictor variables are shown in the Pearson correlation coefficient in the appendix. From the correlation table, we can see there is a strong positive linear correlation between the finished square feet and sales price. Infact, the correlation between the two is the strongest followed by the relationship with quality variable which has strong negative correlation. The reason for this is that the quality variable is categorical with values from 1 to 3with 1 being high quality and 3, low quality. Variable selection Stepwise method, as mentioned above is used in identifying and select the variables that would be used in the model. Essentially. what stepwise method does is to add or eliminate the variables based in the model based on their significance in determination of the dependent variable: home sale prices. The variables Entered/Removed table is shown in the supporting analysis section and it shows why and how variables are added or removed from the model. From the table below, a number of variables are included in the model after stepwise selection and they include, finished square feet of the house, the number of bedrooms, the year the house was built, lots size and quality. The model has a multiple R value of 0.8772 showing that it has strong positive linear relationship with the dependent variable, house sales price.
Delegate your assignment to our experts and they will do the rest.
Final Model
Multiple linear regression was run on the sample and model was built using the ‘proc reg’ in SAS. The F value was determined to be 27.89 and p-value is <.001, indicating that the model is significant. Furthermore, the model R 2 was found to be 0.7694 indicating that the independent variables which are also known as the explanatory variables are able to explain 76.94% of the variation of the dependent variable, sales price.
Answer to the main question
The model above shows a mathematical model from which an accurate estimate of the price of a residential home in the Midwest city can be determined. The model requires the accessor find out the finished square feet of the house, the year the house was built, the lots size, number of bedrooms in the house, Lots size and the quality. Once these variables are identified, they are keyed in on the model and the approximate sales prize of the house can be determined. Therefore, the model answers the question that the sales price of a house can be determined from several factors. For one, the price is determined by the how recently the house was built as newer houses fetch higher prices than old ones. Moreover, other factors play a key role as well a house with a finished square foot of the house that is the larger the house the pricey it is. The number of bedrooms however defies this notion as people tend to go for houses that have a fewer number of rooms boosting the demand for the houses and consequently the prices.
Supporting Analysis
Figure 1 Descriptive Statistics
Figure 2 Correlation table
Regression
SAS code
SAS code for converting the excel dataset into SAS code
/* Source File: me Assignemnt.xlsx */
/* Source Path: /home/u37561728 */
/* Code generated on: 4/10/19, 6:55 AM */
%web_drop_table(WORK.Residentialhomes);
FILENAME REFFILE '/home/u37561728/me Assignemnt.xlsx';
PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=WORK.Residentialhomes;
GETNAMES=YES;
SHEET="Residentialhomes";
RUN;
PROC CONTENTS DATA=WORK.Residentialhomes; RUN;
%web_open_table(WORK.Residentialhomes);
SAS code for Descriptive Statistics
proc means data=WORK.RESIDENTIALHOMES chartype mean std min max n vardef=df;
var 'Sales price'n 'Finished square feet'n 'Number of bedrooms'n
'Number of bathrooms'n 'Air conditioning'n 'Garage size'n Pool 'Year built'n
Style 'Lots Size'n;
run;
proc means data=WORK.RESIDENTIAL chartype mean std min max n vardef=df;
var 'Sales price'n 'Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n
'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style
'Lots Size'n 'Adjacent to highway'n;
run;
SAS code for Multiple Linear Regression for the dataset
proc glmselect data=WORK.RESIDENTIALHOMES
outdesign(addinputvars)=Work.reg_design plots=(criterionpanel);
class 'Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n
'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style
'Lots Size'n 'Adjacent to highway'n / param=glm;
model 'Sales price'n='Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n
'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style
'Lots Size'n 'Adjacent to highway'n / showpvalues selection=stepwise /*this shows how the variables are added and removed using the stepwise method*/
(slentry=0.05 slstay=0.05 select=sl);
run;
proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals
observedbypredicted);
where 'Finished square feet'n is not missing and 'Bedrooms No'n is not missing
and 'Bathrooms No'n is not missing and 'Air conditioning'n is not missing and
'Garage size'n is not missing and Pool is not missing and 'Year built'n is
not missing and Quality is not missing and Style is not missing and
'Lots Size'n is not missing and 'Adjacent to highway'n is not missing;
ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;
model 'Sales price'n=&_GLSMOD /;
output out=WORK.Reg_stats p=p_;
run;
quit;
proc delete data=Work.reg_design;
run;