7 Dec 2022

122

Multiple Regression Analysis: Definition, Method, Examples

Format: APA

Academic level: University

Paper type: Assignment

Words: 855

Pages: 3

Downloads: 0

Introduction The city tax accessor needs to know the sale prices for residential homes in a Midwest city. The tax accessor wants to do so based on a number of characteristics of the houses. The problem therefore is that the accessor lacks a model that can be used to accurately estimate the prices of these homes. Based on data collected on other houses in the Midwest city which include the sales price of homes, a model can be designed that can predict the residential home sale prices. The data shows prior home sale transactions that were made during the year 2002. Each record provides information on 12 other variables. The sale price is made to be the dependent variable which is also the response variable while the rest of the variables are made to be the predictor variables. Data As earlier indicated, each record had 12 variables recorded. The response variable in this case is the sales price of the residential homes. The predictor variables that are significant in this case are finished square feet, quality, style, year built, lots size, number of bedrooms and garage size. The total number of data points that have been collected are 522.

Exploratory Analysis 

The descriptive statistics for individual predictor variables are shown on the descriptive statistics table in the supporting analysis table. From the table, the mean and the standard deviation of predictor variables dare shown for the whole dataset. The means for the variables that have been used to develop a model for sales price stepwise selection are 2260 feet for the finished square feet, 3.4713 for the number of bedrooms, 24370 for the Lots Size and 2.183 for the quality. The sales prices have a maximum price of $920,000 and minimum price $84,000. The Pearson correlation coefficient of the sales price with each other predictor variables are shown in the Pearson correlation coefficient in the appendix. From the correlation table, we can see there is a strong positive linear correlation between the finished square feet and sales price. Infact, the correlation between the two is the strongest followed by the relationship with quality variable which has strong negative correlation. The reason for this is that the quality variable is categorical with values from 1 to 3with 1 being high quality and 3, low quality. Variable selection Stepwise method, as mentioned above is used in identifying and select the variables that would be used in the model. Essentially. what stepwise method does is to add or eliminate the variables based in the model based on their significance in determination of the dependent variable: home sale prices. The variables Entered/Removed table is shown in the supporting analysis section and it shows why and how variables are added or removed from the model. From the table below, a number of variables are included in the model after stepwise selection and they include, finished square feet of the house, the number of bedrooms, the year the house was built, lots size and quality. The model has a multiple R value of 0.8772 showing that it has strong positive linear relationship with the dependent variable, house sales price.

It’s time to jumpstart your paper!

Delegate your assignment to our experts and they will do the rest.

Get custom essay

Final Model 

Multiple linear regression was run on the sample and model was built using the ‘proc reg’ in SAS. The F value was determined to be 27.89 and p-value is <.001, indicating that the model is significant. Furthermore, the model R 2 was found to be 0.7694 indicating that the independent variables which are also known as the explanatory variables are able to explain 76.94% of the variation of the dependent variable, sales price.

Answer to the main question 

The model above shows a mathematical model from which an accurate estimate of the price of a residential home in the Midwest city can be determined. The model requires the accessor find out the finished square feet of the house, the year the house was built, the lots size, number of bedrooms in the house, Lots size and the quality. Once these variables are identified, they are keyed in on the model and the approximate sales prize of the house can be determined. Therefore, the model answers the question that the sales price of a house can be determined from several factors. For one, the price is determined by the how recently the house was built as newer houses fetch higher prices than old ones. Moreover, other factors play a key role as well a house with a finished square foot of the house that is the larger the house the pricey it is. The number of bedrooms however defies this notion as people tend to go for houses that have a fewer number of rooms boosting the demand for the houses and consequently the prices.

Supporting Analysis 

Figure 1 Descriptive Statistics 

Figure 2 Correlation table 

Regression

SAS code 

SAS code for converting the excel dataset into SAS code 

/* Source File: me Assignemnt.xlsx */

/* Source Path: /home/u37561728 */

/* Code generated on: 4/10/19, 6:55 AM */

%web_drop_table(WORK.Residentialhomes);

FILENAME REFFILE '/home/u37561728/me Assignemnt.xlsx';

PROC IMPORT DATAFILE=REFFILE

DBMS=XLSX

OUT=WORK.Residentialhomes;

GETNAMES=YES;

SHEET="Residentialhomes";

RUN;

PROC CONTENTS DATA=WORK.Residentialhomes; RUN;

%web_open_table(WORK.Residentialhomes);

SAS code for Descriptive Statistics 

proc means data=WORK.RESIDENTIALHOMES chartype mean std min max n vardef=df;

var 'Sales price'n 'Finished square feet'n 'Number of bedrooms'n

'Number of bathrooms'n 'Air conditioning'n 'Garage size'n Pool 'Year built'n

Style 'Lots Size'n;

run;

proc means data=WORK.RESIDENTIAL chartype mean std min max n vardef=df;

var 'Sales price'n 'Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n

'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style

'Lots Size'n 'Adjacent to highway'n;

run;

SAS code for Multiple Linear Regression for the dataset 

proc glmselect data=WORK.RESIDENTIALHOMES

outdesign(addinputvars)=Work.reg_design plots=(criterionpanel);

class 'Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n

'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style

'Lots Size'n 'Adjacent to highway'n / param=glm;

model 'Sales price'n='Finished square feet'n 'Bedrooms No'n 'Bathrooms No'n

'Air conditioning'n 'Garage size'n Pool 'Year built'n Quality Style

'Lots Size'n 'Adjacent to highway'n / showpvalues selection=stepwise /*this shows how the variables are added and removed using the stepwise method*/

(slentry=0.05 slstay=0.05 select=sl);

run;

proc reg data=Work.reg_design alpha=0.05 plots(only)=(diagnostics residuals

observedbypredicted);

where 'Finished square feet'n is not missing and 'Bedrooms No'n is not missing

and 'Bathrooms No'n is not missing and 'Air conditioning'n is not missing and

'Garage size'n is not missing and Pool is not missing and 'Year built'n is

not missing and Quality is not missing and Style is not missing and

'Lots Size'n is not missing and 'Adjacent to highway'n is not missing;

ods select DiagnosticsPanel ResidualPlot ObservedByPredicted;

model 'Sales price'n=&_GLSMOD /;

output out=WORK.Reg_stats p=p_;

run;

quit;

proc delete data=Work.reg_design;

run;

Illustration
Cite this page

Select style:

Reference

StudyBounty. (2023, September 14). Multiple Regression Analysis: Definition, Method, Examples.
https://studybounty.com/multiple-regression-analysis-definition-method-examples-assignment

illustration

Related essays

We post free essay examples for college on a regular basis. Stay in the know!

17 Sep 2023
Statistics

Scatter Diagram: How to Create a Scatter Plot in Excel

Trends in statistical data are interpreted using scatter diagrams. A scatter diagram presents each data point in two coordinates. The first point of data representation is done in correlation to the x-axis while the...

Words: 317

Pages: 2

Views: 187

17 Sep 2023
Statistics

Calculating and Reporting Healthcare Statistics

10\. The denominator is usually calculated using the formula: No. of available beds x No. of days 50 bed x 1 day =50 11\. Percentage Occupancy is calculated as: = =86.0% 12\. Percentage Occupancy is calculated...

Words: 133

Pages: 1

Views: 150

17 Sep 2023
Statistics

Survival Rate for COVID-19 Patients: A Comparative Analysis

Null: There is no difference in the survival rate of COVID-19 patients in tropical countries compared to temperate countries. Alternative: There is a difference in the survival rate of COVID-19 patients in tropical...

Words: 255

Pages: 1

Views: 251

17 Sep 2023
Statistics

5 Types of Regression Models You Should Know

Theobald et al. (2019) explore the appropriateness of various types of regression models. Despite the importance of regression in testing hypotheses, the authors were concerned that linear regression is used without...

Words: 543

Pages: 2

Views: 175

17 Sep 2023
Statistics

The Motion Picture Industry - A Comprehensive Overview

The motion picture industry is among some of the best performing industries in the country. Having over fifty major films produced each year with different performances, it is necessary to determine the success of a...

Words: 464

Pages: 2

Views: 86

17 Sep 2023
Statistics

Spearman's Rank Correlation Coefficient (Spearman's Rho)

The Spearman’s rank coefficient, sometimes called Spearman’s rho is widely used in statistics. It is a nonparametric concept used to measure statistical dependence between two variables. It employs the use of a...

Words: 590

Pages: 2

Views: 309

illustration

Running out of time?

Entrust your assignment to proficient writers and receive TOP-quality paper before the deadline is over.

Illustration