Dummy variables in regression analysis take values of 1 or 0, indicating the presence or absence of a categorical attribute that may shift the outcome of the dependent variable (Vogt, 2006). Most variables used in dummy regression have mutually exclusive categories, meaning that if a respondent falls into one category, he or she cannot fall into the other (Fox, 2015). For example, smoking as a dummy variable can be coded as smoker (1) and non-smoker (0); a respondent who is a smoker cannot also fit in the non-smoking category. One study that can be carried out with the GSS14_student_8210 data set is an assessment of the effect of respondents' sex, citizenship, and age on their income. Based on these variables, the study question can be stated as follows:
Do sex, citizenship, and age significantly predict personal income?
Description of the study variables
Age, sex, and citizenship are the independent variables in this case and are hypothesized to influence personal income, which is therefore the dependent variable and is measured on a ratio scale. Age is also measured on a ratio scale, while sex and citizenship are dummy variables measured on a nominal (categorical) scale (Warner, 2008). Sex is coded as 1 (male) and 0 (otherwise), and citizenship is coded as 1 (US citizen) and 0 (otherwise).
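The 1/0 coding described above can be sketched in a few lines of Python. The raw response labels ("Male", "US citizen", and so on), the field names, and the helper `dummy` are hypothetical placeholders chosen for illustration; they are not the actual variable names or codes in the GSS file.

```python
# Hypothetical raw records: field names and labels are placeholders,
# not the actual GSS codebook values.
respondents = [
    {"sex": "Male", "citizen": "US citizen", "age": 34},
    {"sex": "Female", "citizen": "Not a US citizen", "age": 41},
    {"sex": "Female", "citizen": "US citizen", "age": 25},
]


def dummy(value: str, coded_as_one: str) -> int:
    """Return 1 if `value` is the category coded 1, else 0 ("otherwise")."""
    return 1 if value == coded_as_one else 0


coded = [
    {
        "sex": dummy(r["sex"], "Male"),                # 1 = male, 0 = otherwise
        "citizen": dummy(r["citizen"], "US citizen"),  # 1 = US citizen, 0 = otherwise
        "age": r["age"],                               # ratio scale, kept as-is
    }
    for r in respondents
]

for row in coded:
    print(row)
# {'sex': 1, 'citizen': 1, 'age': 34}
# {'sex': 0, 'citizen': 0, 'age': 41}
# {'sex': 0, 'citizen': 1, 'age': 25}
```

With only two categories per variable, one dummy per variable suffices; the group coded 0 (female, non-US citizen) becomes the reference category against which the coefficients are interpreted.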
Findings
The model was statistically significant at the 5% significance level, F(3, 207) = 11.197, p < 0.05. This indicates that the model's multiple R is significantly greater than zero; that is, age, sex, and citizenship jointly account for a significant share of the variation in personal income. The R-squared of 0.140 indicates that 14% of the variation in personal income is explained by the included predictors (Warner, 2008).
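The reported F statistic and R-squared should agree with each other: for an OLS model with an intercept, k predictors, and d residual degrees of freedom, F = (R²/k) / ((1 − R²)/d). The short Python check below rearranges that identity to recover R² from F(3, 207) = 11.197 and, in the other direction, the F implied by R² = 0.140. The function names are illustrative only.

```python
def r_squared_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """Invert F = (R^2/df_model) / ((1 - R^2)/df_resid) to recover R^2."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def f_from_r_squared(r2: float, df_model: int, df_resid: int) -> float:
    """Overall F statistic implied by R^2 for a model with an intercept."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


r2 = r_squared_from_f(11.197, df_model=3, df_resid=207)
print(round(r2, 3))  # 0.14, matching the reported R-squared of 0.140

f_stat = f_from_r_squared(0.140, df_model=3, df_resid=207)
print(round(f_stat, 2))  # 11.23; the small gap from 11.197 comes from rounding R^2
```

The degrees of freedom also imply the number of cases in the analysis: 207 residual + 3 predictors + 1 intercept = 211 respondents.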
Table 1
Regression Model
Table 2
ANOVA Summary for the Model
The model constant was 44,841.428. Because it is the predicted value when every predictor equals zero, it represents the predicted personal income of a non-US-citizen female aged zero; since that age lies outside the range of the adult sample, the constant mainly anchors the regression equation and should not be read as the respondents' average earnings. The coefficients for citizenship and sex were negative, while the coefficient for age was positive. Other things being equal, a US citizen earns $11,275.29 less than the non-US citizens in the study, as indicated by the negative sign on the coefficient (t = -3.275, p < 0.05). Similarly, males earn $12,723.43 less than females (coded as otherwise in the study) (t = -2.998, p < 0.05). Lastly, age had a coefficient of 536.930, which indicates that each additional year of age is associated with an increase of $536.93 in income, ceteris paribus (t = 3.191, p < 0.05). The positive sign on the age coefficient indicates a positive association (Warner, 2008).
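Because the model is linear and additive, each coefficient equals the difference in predicted income between two respondents who differ only in that predictor. The Python sketch below simply re-enters the reported coefficients (the constant names and the function `predicted_income` are illustrative) and confirms this for the two dummies and for one year of age.

```python
INTERCEPT = 44_841.428
B_CITIZEN = -11_275.29   # 1 = US citizen, 0 = otherwise
B_SEX = -12_723.43       # 1 = male, 0 = otherwise
B_AGE = 536.93           # change per year of age


def predicted_income(citizen: int, sex: int, age: float) -> float:
    """Fitted income from the reported coefficients."""
    return INTERCEPT + B_CITIZEN * citizen + B_SEX * sex + B_AGE * age


# Same sex and age, different citizenship: the gap equals B_CITIZEN.
citizen_gap = predicted_income(1, 0, 40) - predicted_income(0, 0, 40)
# Same citizenship and age, different sex: the gap equals B_SEX.
sex_gap = predicted_income(0, 1, 40) - predicted_income(0, 0, 40)
# One extra year of age, everything else fixed: the gap equals B_AGE.
age_gap = predicted_income(1, 1, 41) - predicted_income(1, 1, 40)

print(round(citizen_gap, 2), round(sex_gap, 2), round(age_gap, 2))
# -11275.29 -12723.43 536.93

# With every predictor at zero (non-citizen female, age 0) the prediction is the intercept.
print(predicted_income(0, 0, 0) == INTERCEPT)  # True
```

The gaps do not depend on the values chosen for the other predictors (age 40 here), which is what "other things being equal" means for a model without interaction terms.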
Table 3
Model Coefficients
With this in mind, the model can be written as follows:
Income = 44,841.428 – 11,275.29*Citizenship – 12,723.43*Sex + 536.93*Age
Based on the above model, a female US citizen aged 25 years would have a predicted income of:
Income = 44,841.428 – 11,275.29*(1) – 12,723.43*(0) + 536.93*(25)
= 44,841.428 – 11,275.29 + 13,423.25
= $46,989.39
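The worked example can be reproduced in Python. The sketch below re-enters the reported coefficients (the dictionary `COEF` and the function `predict` are illustrative names) and also tabulates predictions for all four sex and citizenship combinations at age 25. These are point predictions from the fitted equation, not observed group means.

```python
COEF = {"intercept": 44_841.428, "citizen": -11_275.29, "sex": -12_723.43, "age": 536.93}


def predict(citizen: int, sex: int, age: float) -> float:
    """Predicted income from the fitted equation (dummies coded 1/0)."""
    return (COEF["intercept"] + COEF["citizen"] * citizen
            + COEF["sex"] * sex + COEF["age"] * age)


# Female (sex = 0), US citizen (citizen = 1), aged 25.
print(f"${predict(citizen=1, sex=0, age=25):,.2f}")  # $46,989.39

# All four sex/citizenship groups at the same age.
for citizen in (0, 1):
    for sex in (0, 1):
        label = f"{'male' if sex else 'female'}, {'US citizen' if citizen else 'non-citizen'}"
        print(f"{label:<24} ${predict(citizen, sex, 25):,.2f}")
```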
Model Diagnostics
Multiple regression rests on several basic assumptions, three of which are examined here. First, it assumes that the predictors and the dependent variable have a linear association, which can be assessed using the correlation coefficients (Lewis-Beck & Lewis-Beck, 2015). The correlation coefficients between income and each of the three predictors were significant, which supports the linearity assumption. Second, the model assumes that the predictor variables are not highly correlated with one another (no multicollinearity); this can be checked in the correlation matrix, and the basic rule of thumb is that no pair of independent variables should have a correlation above 0.80 (Lewis-Beck & Lewis-Beck, 2015). This assumption was met, since all pairs of independent variables had weak correlations below 0.50. Lastly, the assumption of constant error variance (homoscedasticity) was also met, as the error variances were statistically equal. Because the examined assumptions were met, no remedy is required.
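The correlation-matrix screen for multicollinearity can be sketched in plain Python. The predictor columns below are made-up toy values for illustration only, not the study data, and the helpers `pearson` and `flag_collinear` are hypothetical names; the 0.80 cutoff is the rule of thumb cited above.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined if either column is constant


def flag_collinear(predictors: dict[str, list[float]], cutoff: float = 0.80):
    """Return (name1, name2, r) for every predictor pair with |r| above cutoff."""
    flagged = []
    for (n1, x), (n2, y) in combinations(predictors.items(), 2):
        r = pearson(x, y)
        if abs(r) > cutoff:
            flagged.append((n1, n2, round(r, 3)))
    return flagged


# Toy predictor columns (hypothetical values, not from the GSS file).
toy = {
    "citizen": [1, 1, 0, 1, 0, 1, 1, 0],
    "sex":     [1, 0, 0, 1, 1, 0, 1, 0],
    "age":     [25, 40, 33, 58, 47, 29, 61, 36],
}

print(flag_collinear(toy))  # an empty list means no pair exceeds the cutoff
```

Pairwise correlations only catch collinearity between two predictors at a time; variance inflation factors are a common complement when there are many predictors.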
References
Fox, J. (2015). Applied regression analysis and generalized linear models. Sage Publications.
Lewis-Beck, C., & Lewis-Beck, M. (2015). Applied regression: An introduction (Vol. 22). Sage Publications.
Vogt, W. P. (2006). Quantitative research methods for professionals in education and other fields. Columbus, OH: Allyn & Bacon.
Warner, R. M. (2008). Applied statistics: From bivariate through multivariate techniques. Sage Publications.