In the following comparison we took three statistical measures from the GSS data set, which was obtained from the official GSS website. The data set contains both categorical and continuous variables, which are presented below using different graphical methods. The three results shown were created using the SPSS statistical suite. For each result, twenty recorded values were sampled from the original GSS records and compared across two distinct years, 1996 and 2016.
[Figure: histograms of HRS1, 1996 | 2016]
Comparison for Average Hours Worked Variable
The results above show that the difference between 1996 and 2016 is small for the variable HRS1 (number of hours worked per week). Because this is a continuous variable, it is represented using a histogram (Braunstein & Kimble, 2000). The mean for the 1996 sample is lower than that for 2016; however, a larger share of the 1996 sample lies close to the mean, whereas the hours per week recorded in 2016 vary widely across the sample.
Since continuous variables are based on numeric readings, a histogram is a well-suited graphical representation for the above data set. It lets us identify trends in the data from the mean and from the frequency in each interval. Whether the most frequent interval lies close to the mean can then inform how a researcher frames the null hypothesis.
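The idea behind the histogram comparison can be sketched in a few lines of Python. This is only an illustration and not the SPSS procedure used for the results above: the two samples are hypothetical values, not the GSS readings, and the helper names `histogram` and `share_near_mean`, the bin width of 10 hours, and the 5-hour tolerance are all assumptions made for the example.

```python
from statistics import mean


def histogram(values, bin_width, start=0):
    """Count how many values fall in each [lower, lower + bin_width) interval."""
    counts = {}
    for v in values:
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


def share_near_mean(values, tolerance):
    """Fraction of the sample lying within `tolerance` of the sample mean."""
    m = mean(values)
    return sum(1 for v in values if abs(v - m) <= tolerance) / len(values)


# Hypothetical weekly hours, NOT the GSS values discussed in the text.
hours_a = [38, 40, 40, 40, 42, 40, 39, 41, 40, 45]
hours_b = [20, 60, 40, 35, 55, 40, 25, 50, 45, 48]

for label, sample in (("sample A", hours_a), ("sample B", hours_b)):
    print(label, "mean:", mean(sample))
    print(label, "bins:", histogram(sample, bin_width=10))
    print(label, "share within 5 hours of mean:", share_near_mean(sample, 5))
```

In this made-up example, sample A has the slightly lower mean but is far more concentrated around it, which is the same kind of pattern described for HRS1 above.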
Comparison for Highest Degree Variable
[Figure: pie charts of highest degree, 1996 | 2016]
The above analysis of the variable degree, which is a categorical ordinal variable, shows a key difference between the two data sets. For our analysis we denoted 0 = No_Education/No_Schooled, 1 = High_School, 2 = Bachelors, 3 = Masters, 4 = PhD. Using this scale we can clearly see that the number of people with no formal education is much lower in 2016, and that PhD holders appear in the 2016 sample whereas there were none in the 1996 sample. Categorical variables are based on categories, which is why a number is assigned to the name of each degree (Banker & Morey, 1986).
Categorical variables sort observations into categories, so in essence they are best shown as shares of a larger data set, which may contain additional categorical variables. The pie chart is a reliable way of representing such shares graphically, and it is therefore used for the categorical variable here.
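A pie chart is in effect a picture of the share each category takes in the sample. The short Python sketch below shows how those shares could be computed from the numeric codes. The labels follow the coding described above, while the sample itself is hypothetical and the function name `category_shares` is an assumption made for the example; it is not the SPSS output.

```python
from collections import Counter

# Numeric codes and labels as described in the text.
DEGREE_LABELS = {
    0: "No_Education/No_Schooled",
    1: "High_School",
    2: "Bachelors",
    3: "Masters",
    4: "PhD",
}


def category_shares(codes):
    """Return the share of the sample in each degree category, in code order."""
    counts = Counter(codes)
    total = len(codes)
    return {DEGREE_LABELS[c]: counts.get(c, 0) / total for c in sorted(DEGREE_LABELS)}


# Hypothetical coded sample, NOT the GSS values discussed in the text.
codes = [0, 1, 1, 1, 2, 2, 1, 3, 1, 2]

for label, share in category_shares(codes).items():
    print(f"{label}: {share:.0%}")
```

Each share corresponds to one slice of the pie, and a category that does not occur in the sample is reported with a share of zero rather than being dropped.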
[Figure: bubble graphs of number of children, 1996 | 2016]
Number of Children Variable (Continuous Interval)
Finally, our data set also takes into account the number of children for each individual as a valid statistical metric of comparison. In both 1996 and 2016 the most common value is 2 children per person; however, in 1996 more people had exactly 2 children, so that value accounts for a greater percentage of the sample. In 2016 the number of children is more evenly spread out. In both cases there was only one outlier.
For this interval variable, the modal value is the value that occurs most often among the recorded measurements. The bubble graph is used here because it shows the frequency of every value against the interval it belongs to, which makes it an appropriate graphical representation for the above data set. Furthermore, since the variable counts children, every value is a whole number and is recorded exactly.
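The quantities a bubble graph displays, namely each value and how often it occurs, together with the modal value, can be sketched as follows. The sample is hypothetical and the function names `value_frequencies` and `modal_value` are assumptions made for the example; they do not reproduce the SPSS results above.

```python
from collections import Counter


def value_frequencies(values):
    """Map each distinct value to how often it occurs (one bubble per value)."""
    return dict(sorted(Counter(values).items()))


def modal_value(values):
    """Return the most frequent value and its share of the sample.

    If several values tie, the one that appears first in the data is returned.
    """
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Hypothetical number of children per respondent, NOT the GSS values.
children = [2, 2, 0, 1, 2, 3, 2, 2, 1, 8]

print("frequencies:", value_frequencies(children))
mode, share = modal_value(children)
print(f"mode: {mode} children ({share:.0%} of the sample)")
```

In a bubble graph each key of the frequency table would be placed on the axis and its count would set the bubble size, so a single isolated value such as the 8 in this made-up sample stands out as a small bubble far from the rest.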
References
Braunstein, S. L., & Kimble, H. J. (2000). Dense coding for continuous variables. In Quantum Information with Continuous Variables (pp. 95-103). Springer, Dordrecht.
Banker, R. D., & Morey, R. C. (1986). The use of categorical variables in data envelopment analysis. Management Science, 32(12), 1613-1627.
Hinton, P. R., McMurray, I., & Brownlow, C. (2004). SPSS explained. Routledge.