21 Nov 2022

How to Analyze Data Using Descriptive Statistics

Introduction 

The key purpose of this project is to demonstrate the student’s mastery of descriptive statistics. It involves analyzing a dataset of choice using the technology tool STATKEY. The study performs descriptive statistics and graphical analysis of one quantitative variable, one categorical variable, one quantitative and one categorical variable, two categorical variables, and two quantitative variables. The dataset chosen for this project is the “Nutrition Study” data file, which contains 315 cases and 17 variables. The variables are case ID, age, smoke, Quetelet, vitamin, calories, fat, fiber, alcohol, cholesterol, beta diet, retinol diet, beta plasma, retinol plasma, gender, vitamin use, and prior smoke. The table below describes each variable in the dataset.

Variable Name  Variable Description  Variable Type
ID  Numbers the cases; assigns each case a number  Quantitative
Age  Represents the ages of the subjects of the nutrition study; the ages range from 19 to 83  Quantitative
Smoke  Has values of yes and no  Categorical (level 2)
Quetelet  Represents Adolphe Quetelet’s indices  Quantitative
Vitamin  Represents vitamin intake by each subject of the study  Quantitative
Calories  Represents the subject’s level of calories  Quantitative
Fat  Individual calories intake per diet  Quantitative
Fiber  Represents the level of fiber intake  Quantitative
Alcohol  Represents the subject’s level of alcohol intake  Quantitative
Cholesterol  Represents the level of cholesterol intake by the subject  Quantitative
Beta Diet  Type of diet  Quantitative
Retinol Diet  Type of diet  Quantitative
Beta Plasma  Type of diet  Quantitative
Retinol Plasma  Type of diet  Quantitative
Gender  Gender of the subject under study  Categorical (level 2)
Vitamin Use  The frequency of vitamin usage  Categorical (level 3)
Prior Smoke  Smokes prior to the period of study  Quantitative
Analysis 

Analysis of One Quantitative Variable 

Analysis of Calories 

The following table shows the summary of descriptive statistics calculated using STATKEY. 

Summary Statistics 

Statistic  Value
Sample Size  315
Mean  1796.655
Standard Deviation  680.347
Minimum  445.2
Q1  1338.000
Median  1666.800
Q3  2100.450
Maximum  6662.2

The results in the table show that the calories variable in the Nutrition Study dataset has a mean of 1796.655, so the average calorie intake across all subjects under study is 1796.655. The standard deviation measures how far the data points typically deviate from the mean (Dini, 2016). The standard deviation of 680.347 means that, on average, the data points for the calories variable deviate from the mean by about 680.347 units. The lowest calorie intake is 445.2 while the highest is 6662.2, so the variable has a very wide range of 6217.0.

Dot Plot of Calories (Quantitative Variable) 

Skewness 

The data range from 445.2 to 6662.2, but most values are concentrated between 1000 and 3000. As seen in the graph, the calories data points are concentrated towards the left of the graph, with a long tail stretching to the right; this is evidence that the calories variable is skewed to the right (positively skewed). Also, the mean, 1796.655, is greater than the median, 1666.800. The few very large values in the right tail pull the mean above the median (the middle value of the data set). The fact that mean > median is, therefore, an indicator of skewness to the right.
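The comparison of mean and median can be turned into a single number. The sketch below uses Pearson's second skewness coefficient, 3 × (mean – median) / standard deviation, which is not part of the STATKEY output and is introduced here only as an illustration. A positive value corresponds to a longer right tail, a negative value to a longer left tail. The function name is hypothetical.

```python
# Summary statistics for calories, taken from the summary table.
MEAN = 1796.655
MEDIAN = 1666.800
STD_DEV = 680.347


def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / std_dev


skew = pearson_median_skewness(MEAN, MEDIAN, STD_DEV)

# A positive coefficient means the mean sits above the median (right tail).
if skew > 0:
    direction = "right (positive)"
elif skew < 0:
    direction = "left (negative)"
else:
    direction = "none"

print(f"Skewness coefficient: {skew:.3f}, direction: {direction}")
```

This coefficient is only a rough check based on three summary numbers; the dot plot remains the primary evidence for the shape of the distribution.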

Outliers 

Outliers in a data set can be identified using z-scores or by setting data limits with lower and upper fences (Dini, 2016). In this case, we use the lower and upper fence method. The fences are determined from the quartiles, using the formulas given below.

Upper Fence = Q3 + 1.5 × IQR

Lower Fence = Q1 – 1.5 × IQR

Where Q1 is the first quartile, Q3 is the third quartile, and IQR is the inter-quartile range:

IQR = Q3 – Q1

In this case:

IQR = 2100.450 – 1338.000 = 762.450

Upper Fence = 2100.450 + 1.5 × 762.450 = 3244.125

Lower Fence = 1338.000 – 1.5 × 762.450 = 194.325

The calories data are expected to fall between 194.325 and 3244.125. A data point less than 194.325 or greater than 3244.125 is considered an outlier. The minimum value is 445.2, so no value falls below the lower fence of 194.325. However, there are values greater than 3244.125, as shown in the table below.
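The fence calculation can be reproduced with a few lines of code. This is a minimal sketch that starts from the quartiles reported by STATKEY rather than from the raw data; the function names are hypothetical.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, lower, upper):
    """A value is an outlier if it lies outside the two fences."""
    return value < lower or value > upper


# Quartiles of the calories variable, from the summary statistics table.
lower_fence, upper_fence = fences(q1=1338.000, q3=2100.450)
print(f"Lower fence: {lower_fence:.3f}")
print(f"Upper fence: {upper_fence:.3f}")

# Check the two extremes of the calories variable.
minimum, maximum = 445.2, 6662.2
print("Minimum is an outlier:", is_outlier(minimum, lower_fence, upper_fence))
print("Maximum is an outlier:", is_outlier(maximum, lower_fence, upper_fence))
```

Checking only the minimum and maximum is enough to tell whether any outliers exist on each side; listing all of them requires the raw data.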

Calories Outliers 
ID  Calories Value 
62  6662.2 
75  3457.2 
77  3258.3 
95  3711 
152  4373.6 
212  3328.4 
269  3449.7 
294  3511.1 

The presence of outliers to the right of the third quartile can also be seen in the dot plot above. There are data points that stretch too far from the third quartile. 
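The z-score method mentioned earlier gives a second view of the same values. The sketch below computes z-scores for the cases flagged by the fence method, using the mean and standard deviation from the summary table. The cutoff of 3 standard deviations is an assumed convention, not something taken from the STATKEY output, and the names are hypothetical.

```python
# Mean and standard deviation of calories, from the summary statistics table.
MEAN = 1796.655
STD_DEV = 680.347

# Cases flagged by the fence method (ID -> calories), from the outlier table.
fence_outliers = {
    62: 6662.2, 75: 3457.2, 77: 3258.3, 95: 3711.0,
    152: 4373.6, 212: 3328.4, 269: 3449.7, 294: 3511.1,
}

Z_CUTOFF = 3.0  # assumed convention: |z| > 3 counts as an outlier


def z_score(value, mean=MEAN, std_dev=STD_DEV):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


z_scores = {case_id: z_score(v) for case_id, v in fence_outliers.items()}
z_outliers = sorted(cid for cid, z in z_scores.items() if abs(z) > Z_CUTOFF)

for case_id, z in sorted(z_scores.items()):
    print(f"ID {case_id}: z = {z:.2f}")
print("IDs beyond the z cutoff:", z_outliers)
```

Under this assumed cutoff only cases 62 and 152 are flagged, so the two methods do not always agree: the fence method is the stricter of the two for this variable.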

Analysis of One Categorical Variable 

Analysis of Smoke 

The smoke variable is a two-level categorical variable with the values yes and no. The following table shows the frequency (count) and relative frequency (proportion) of each value.

Summary Statistics 

Smoke  Count  Proportion
No  272  0.863 
Yes  43  0.137 
Total  315  1.000 

A large percentage of the subjects in the nutrition study are non-smokers. Those who responded YES under the smoke variable were 43 out of 315 cases, which represents 13.7% of the total cases. Non-smokers made up the remaining 86.3% of the cases; roughly nine out of every ten subjects are non-smokers. The summary statistics reveal that the study was conducted mostly on non-smokers. The figure below represents a graphical analysis of the number of smokers and non-smokers. The YES respondents are represented by the shorter bar while the NO respondents are represented by the taller bar. Bar graphs are well suited to representing categorical variables because they clearly show the levels of the variable and their frequencies.

Categorical Variable (Smoke) 
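The proportions in the frequency table follow directly from the counts. As a minimal sketch, assuming only the two counts reported for the smoke variable:

```python
# Counts for the smoke variable, from the frequency table.
counts = {"No": 272, "Yes": 43}

total = sum(counts.values())

# Relative frequency = count of the level / total number of cases.
proportions = {level: n / total for level, n in counts.items()}

for level, p in proportions.items():
    print(f"{level}: count = {counts[level]}, proportion = {p:.3f}")
print(f"Total: {total}")
```

The proportions always sum to 1, which is a quick way to check a relative frequency table for arithmetic slips.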

Analysis of One Relationship Between Two Categorical Variables 

Relationship between Gender and Smoke 

Summary Statistics 

Counts

Smoke \ Gender  Female  Male  Total
No  237  35  272
Yes  36  7  43
Total  273  42  315

Proportions of all cases

Smoke \ Gender  Female  Male  Total
No  0.752  0.111  0.863
Yes  0.114  0.022  0.137
Total  0.867  0.133  1.000

The tables above summarize the relationship between two categorical variables (Gender and Smoke). Out of the 273 female subjects in the study, 237 were non-smokers. On the other hand, 35 out of 42 male subjects were non-smokers. 7 out of 42 (16.7%) male subjects are smokers, while 36 out of 273 (13.2%) female subjects are smokers. This indicates that the percentage of smokers among males is larger than that among females. 86.7% of the cases under study were females, indicating that the sample is unbalanced by gender; the study was mostly conducted on female non-smokers. There is some association between gender and smoke, since the percentage of male smokers is higher than that of female smokers. However, we cannot draw firm conclusions from this study because of the gender imbalance: the male percentage is based on only 42 subjects. A fairer comparison would require a more balanced number of male and female subjects.
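The comparison of smoking rates by gender uses conditional proportions: the number of smokers in each gender divided by the total for that gender, not by the overall total of 315. A minimal sketch using the counts from the two-way table (the variable names are hypothetical):

```python
# Two-way table of counts: gender -> smoke level -> number of subjects.
table = {
    "Female": {"No": 237, "Yes": 36},
    "Male": {"No": 35, "Yes": 7},
}

# Proportion of smokers within each gender (conditional on gender).
smoker_rate = {}
for gender, row in table.items():
    group_total = row["No"] + row["Yes"]
    smoker_rate[gender] = row["Yes"] / group_total
    print(f"{gender}: {row['Yes']} of {group_total} smoke "
          f"({smoker_rate[gender]:.1%})")

difference = smoker_rate["Male"] - smoker_rate["Female"]
print(f"Difference (male - female): {difference:.1%}")
```

Dividing by the group total is what makes the two rates comparable even though the groups differ greatly in size.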

Analysis of One Relationship Between Categorical Variable and Quantitative Variable 

Analysis of Relationship between Cholesterol and Vitamin Use 

Summary Statistics 

Statistics  Regular  Occasional  No  Overall
Sample Size  122  82  111  315
Mean  236.691  245.443  246.599  242.461
Standard Deviation  151.098  99.628  131.330  131.992
Minimum  59.2  84  37.7  37.7
Q1  141.10  171.20  154.85  155.00
Median  194.20  227.65  211.70  206.30
Q3  283.30  308.80  333.40  308.85
Maximum  900.7  574.2  718.8  900.7

The summary statistics table above shows the relationship between the categorical variable (vitamin use) and the quantitative variable (cholesterol). The mean cholesterol for regular vitamin users is 236.691; the mean for occasional vitamin users is 245.443, while that of non-vitamin users is 246.599. The overall mean cholesterol is 242.461. Regular vitamin users have the lowest mean cholesterol, followed by occasional vitamin users, and non-vitamin users have the highest. This suggests a relationship between vitamin use and cholesterol, although the differences between the group means (about 10 units at most) are small compared with the standard deviations within each group (roughly 100 to 151).

More frequent vitamin intake is associated with lower mean cholesterol. We can describe this as a weak negative association between vitamin use and cholesterol. The relationship can also be seen by comparing each group with the overall mean: the mean cholesterol for regular vitamin use is lower than the overall average, while the means for occasional use and no vitamin use are higher than the overall mean.

Relationship between Cholesterol and Vitamin Use 

The graph above shows the relationship between the quantitative variable (cholesterol) and the categorical variable (vitamin use). The data for no vitamin use, regular vitamin use, and occasional vitamin use are all skewed to the right: in each group the mean exceeds the median, and the largest values lie far above the third quartile. The graph also suggests a high number of outliers towards the right for regular vitamin use.
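The overall mean cholesterol in the summary table can be reproduced from the group means, as long as each group is weighted by its sample size. The sketch below is a consistency check on the reported figures, not part of the STATKEY output; the names are hypothetical.

```python
# Sample size and mean cholesterol for each vitamin-use group,
# taken from the summary statistics table.
groups = {
    "Regular": (122, 236.691),
    "Occasional": (82, 245.443),
    "No": (111, 246.599),
}

total_n = sum(n for n, _ in groups.values())

# Weighted mean: sum of (group size * group mean) / total sample size.
overall_mean = sum(n * mean for n, mean in groups.values()) / total_n

print(f"Total sample size: {total_n}")
print(f"Overall mean cholesterol: {overall_mean:.3f}")

# Compare each group mean with the overall mean.
for name, (_, mean) in groups.items():
    side = "below" if mean < overall_mean else "above"
    print(f"{name}: {mean:.3f} is {side} the overall mean")
```

A simple unweighted average of the three group means would give a slightly different figure, because the groups are not the same size.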

Analysis of Relationship Between Two Quantitative Variables 

Analysis of Relationship between Beta Diet and Cholesterol Intake 

A scatter plot is a graphical representation of the relationship between two quantitative variables. In the scatter plot of beta diet against cholesterol, the points are widely scattered, so there is no strong relationship between the two variables and no clear trend. The points do, however, drift slightly in the positive direction, which indicates a slight positive relationship. The linear association between the two variables is too weak to rely on.

The weak positive relationship between beta diet and cholesterol is also seen in the value of the correlation. The results computed by the STATKEY program show that the correlation coefficient between beta diet and cholesterol is 0.116. This value is close to zero and thus represents a weak relationship: an increase in the cholesterol variable is associated with only a slight increase in the beta diet variable. From the correlation coefficient and the pattern in the scatter plot, we can conclude that there is at most a very weak positive linear relationship between the cholesterol variable and the beta diet variable.
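One way to see how weak a correlation of 0.116 is: in a simple linear regression, the square of the correlation coefficient gives the share of the variation in one variable that is accounted for by its linear relationship with the other. The sketch below applies this to the reported coefficient. The squared value is not part of the reported STATKEY results and is shown here only as an illustration.

```python
# Correlation coefficient between beta diet and cholesterol, as reported.
r = 0.116

# Coefficient of determination for a simple linear relationship.
r_squared = r ** 2

print(f"r = {r:.3f}")
print(f"r squared = {r_squared:.4f}")
print(f"Share of variation explained: {r_squared:.1%}")
print(f"Share left unexplained: {1 - r_squared:.1%}")
```

With only about 1% of the variation explained, knowing a subject's cholesterol intake tells us very little about their beta diet value.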

I expected this kind of relationship between beta diet and cholesterol. With cholesterol as the independent variable and beta diet as the dependent variable, a relationship exists only if a change in cholesterol is associated with a change in beta diet. Since the two are barely related, this kind of scatter plot was expected.

Conclusion 

As a student, this project has helped me build strong knowledge of descriptive analysis using STATKEY. Statistical analysis in STATKEY is easier than in Excel because summary statistics and graphs can be obtained without performing complex calculations. For instance, in this project, we analyzed the nutrition study file using STATKEY. The project shows how descriptive statistical analysis of both categorical and quantitative variables is possible in the STATKEY program. For quantitative variables, it is possible to calculate descriptive statistics such as the mean, standard deviation, median, quartiles, minimum, and maximum. For categorical variables, counts and proportions (relative frequencies) can be determined. In this project, I was able to analyze the relationship between two categorical variables, two quantitative variables, and a quantitative and a categorical variable. The use of summary statistics and graphs in STATKEY provides a comprehensive statistical overview of each variable and its relationship with other variables.

Nutrition study analysis is important in our daily lives because it shows how different lifestyles affect our health. To make such studies more helpful, it is important to show the relationship between the nutrition variables and lifestyle diseases. Therefore, I think a “lifestyle diseases” variable should be gathered next time to increase the depth of this analysis. Gathering this variable would make it possible to establish the relationship between various nutrition lifestyles and lifestyle diseases. Scatter plots and correlation coefficients for such relationships would show whether a certain nutrition lifestyle increases or decreases the risk of lifestyle diseases. Also, the above dataset is gender biased; a large portion of the subjects under study are females. Nutrition is a sensitive topic that cuts across all genders. It is therefore important to gather more data so that the sample has a more equal number of females and males.

References

Dini, L. (2016). EDTC 810: Statistics for Educational Research Dr. Glazer May 6, 2016. 

Tintle, N., Chance, B. L., Cobb, G. W., Rossman, A. J., Roy, S., Swanson, T., & VanderStoep, J. (2015). Introduction to Statistical Investigations: High School Binding. John Wiley.
