Abstract
Statistics is commonly divided into two branches: descriptive statistics and inferential statistics. This report is concerned with explaining descriptive statistics. Descriptive statistics fall into four major types: measures of frequency, measures of central tendency, measures of dispersion (variation) and measures of position. The measures of central tendency are the mean, median and mode; the measures of dispersion are the range, variance and standard deviation; the measures of position are percentile ranks and quartile ranks. This report focuses on the last three of these four types, using the sample dataset “Freshman 15” from Statdisk. It includes Statdisk output for two variables, body weight and Body Mass Index (BMI). The output is accompanied by charts (boxplots and histograms) that show how the data values behave.
EXPLAINING STATISTICS
Introduction
Explaining statistics to a non-statistician takes some effort. It starts with understanding the four levels of measurement: nominal, ordinal, interval and ratio. Nominal variables label categories that have no inherent order; examples are gender and blood type. Ordinal variables have categories with a meaningful order, for example the stage of cancer (I, II, III and IV). Interval variables have equal distances between adjacent values, but their zero point is arbitrary. Ratio variables also have equal distances between values, and their zero is meaningful, for example the weight of a student. Descriptive statistics can be calculated and interpreted properly once the levels of measurement are understood.
Data Levels of Measurement
Data are categorized into two major kinds: qualitative and quantitative. Quantitative data are measurements or counts expressed as numbers, whereas qualitative data describe types or categories, usually represented by a name or symbol. The two quantitative variables chosen are body weight (measured in kilograms) and BMI (Body Mass Index).
The BMI variable is treated here as being at the interval level of measurement: the data values are equally spaced, but the zero point is taken to be arbitrary. The body weight variable is at the ratio level of measurement because the ratio between two weights is meaningful; for instance, 80 kg is twice 40 kg.
Descriptive Statistics using Computations
The mean, median and midrange of the first variable, weight for September (WT SEP), are shown below:
Sample Size, n: 67
Mean: 65.0597
Median: 64
Midrange: 69.5
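The three measures of center listed above can be sketched in a few lines of Python. This is only an illustration: the list `sample_weights` is hypothetical and is not the Freshman 15 data, and `midrange` is a helper name introduced here. The last lines check the reported midrange against the minimum (42) and maximum (97) that the report gives for WT SEP.

```python
from statistics import mean, median


def midrange(values):
    """Average of the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


# Hypothetical weights in kg, for illustration only (not the Freshman 15 data).
sample_weights = [42, 56, 60, 64, 68, 71, 97]

center = {
    "mean": mean(sample_weights),
    "median": median(sample_weights),
    "midrange": midrange(sample_weights),
}
print(center)

# Check against the report: WT SEP has minimum 42 and maximum 97.
reported_midrange = midrange([42, 97])
print(reported_midrange)  # 69.5, matching the Statdisk output
```

The midrange depends only on the two extreme values, so it is the measure of center most affected by outliers.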
The range, variance and standard deviation of the variable weight for September (WT SEP) are shown below:
Variance, s^2: 127.36
Standard Deviation, s: 11.28539
Range: 55
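As a rough sketch of how these measures of spread relate to one another, the Python below computes the sample variance (dividing by n - 1), the sample standard deviation and the range for a small hypothetical list, then checks that the reported WT SEP standard deviation is the square root of the reported variance. The list `sample_weights` and the helper `value_range` are assumptions made for illustration; the raw Freshman 15 values are not reproduced here.

```python
import math
from statistics import stdev, variance


def value_range(values):
    """Difference between the largest and the smallest observation."""
    return max(values) - min(values)


# Hypothetical weights in kg, for illustration only (not the Freshman 15 data).
sample_weights = [42, 56, 60, 64, 68, 71, 97]

s2 = variance(sample_weights)    # sample variance, divides by n - 1
s = stdev(sample_weights)        # sample standard deviation
r = value_range(sample_weights)  # maximum minus minimum

# The standard deviation is the square root of the variance.
assert math.isclose(s, math.sqrt(s2))

# Consistency check on the reported WT SEP figures.
reported_variance = 127.36
reported_sd = 11.28539
sd_from_variance = math.sqrt(reported_variance)
print(sd_from_variance, reported_sd)
```

Because the variance is in squared units (kg squared for weight), the standard deviation is usually the easier figure to interpret alongside the mean.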
To check for outliers in the variable weight for September (WT SEP), a modified boxplot was used. Outliers are observed values that lie far from the other observations in a dataset. In a modified boxplot, a value is conventionally flagged as an outlier when it lies more than 1.5 times the interquartile range below the 1st quartile or above the 3rd quartile. From the dataset, two values were identified as outliers, as shown in the figure below:
These values are 97 and 94.
The mean, median and midrange of the second variable, BMI for September (BMI SEP), are shown below:
Sample Size, n: 67
Mean: 22.03
Median: 21.73
Midrange: 25.825
The range, variance and standard deviation of the variable BMI for September (BMI SEP) are shown below:
Variance, s^2: 10.94883
Standard Deviation, s: 3.308901
Range: 21.49
The modified boxplot below shows that there are three outliers in the second variable (BMI SEP):
The observations that are outliers for this variable are 36.57, 30.26 and 28.59. These observations lie far from the rest of the data.
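The outlier checks for both variables can be sketched as follows, assuming the common convention for modified boxplots in which a value is flagged when it lies more than 1.5 times the interquartile range beyond Q1 or Q3. The report does not state which rule Statdisk applied, so that convention is an assumption, and `iqr_fences` and `flag_outliers` are helper names introduced here. The quartiles are the ones reported in the five-number summaries, and only values quoted in the report are tested, not the full dataset.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values lying strictly outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


# Quartiles as reported in the five-number summaries.
wt_low, wt_high = iqr_fences(56, 71)         # 33.5 and 93.5
bmi_low, bmi_high = iqr_fences(19.78, 23.3)  # roughly 14.50 and 28.58

# WT SEP: reported minimum, median, and the two values listed as outliers.
wt_flagged = flag_outliers([42, 64, 94, 97], 56, 71)

# BMI SEP: reported minimum, median, two listed outliers, reported maximum.
bmi_flagged = flag_outliers([15.08, 21.73, 28.59, 30.26, 36.57], 19.78, 23.3)

print(wt_flagged)
print(bmi_flagged)
```

Under this assumed rule, both minimums fall inside the lower fences, so all flagged values sit on the high side of each variable.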
A five-number summary consists of the minimum value, the 1st quartile (Q1), the 2nd quartile (Q2), the 3rd quartile (Q3) and the maximum. The minimum is the lowest observed value and the maximum is the highest. The 2nd quartile (the median) is the value at the center when the observations are arranged in ascending order. The 1st quartile, on the other hand, is the median of the lower half of the dataset, while the 3rd quartile is the median of the upper half. Five-number summaries are usually used in constructing a boxplot. The summaries for the two variables, WT SEP and BMI SEP, are shown below:
Five-number summary - Weight for September
Minimum: 42
1st Quartile: 56
2nd Quartile: 64
3rd Quartile: 71
Maximum: 97
Five-number summary - BMI for September
Minimum: 15.08
1st Quartile: 19.78
2nd Quartile: 21.73
3rd Quartile: 23.3
Maximum: 36.57
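The median-of-halves description of the quartiles can be turned into a short Python sketch. It rests on one assumption that the report does not settle: when the number of observations is odd, the middle value is left out of both halves. Statistical packages, Statdisk included, may use a different quartile convention, so results on real data can differ slightly. The list `sample` is hypothetical and `five_number_summary` is a name introduced here.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum via the median-of-halves method.

    Assumption: when n is odd, the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)


# Hypothetical weights in kg, for illustration only (not the Freshman 15 data).
sample = [42, 50, 56, 60, 64, 68, 71, 80, 97]
summary = five_number_summary(sample)
print(summary)

# Interquartile range from the quartiles reported for WT SEP.
wt_iqr = 71 - 56
print(wt_iqr)  # 15
```

The box of a boxplot spans Q1 to Q3, so its width is the interquartile range computed in the last lines.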
Descriptive Statistics using Boxplots and Histograms
Boxplot of Weight in September
Boxplot of BMI SEP
Histogram of WT SEP
Histogram of BMI SEP
Discussion of Results
The measures of center in this report are described by the mean and the median. The mean and median of WT SEP are 65.0597 and 64 respectively, while the mean and median of BMI SEP are 22.03 and 21.73 respectively. The midrange of WT SEP is 69.5, while the midrange of BMI SEP is 25.825.
The measures of dispersion in this report are the range, the variance and the standard deviation. Measures of dispersion show how widely the data values are spread. For WT SEP, the variance, standard deviation and range are 127.36, 11.28539 and 55 respectively. For BMI SEP, they are 10.94883, 3.308901 and 21.49 respectively.
The measures of position in this report are the quartiles. Q1 and Q3 for WT SEP are 56 and 71 respectively, while Q1 and Q3 for BMI SEP are 19.78 and 23.3 respectively.
The WT SEP data are slightly skewed to the right. This can be seen in both the boxplot and the histogram. The boxplot suggests that the data are not normally distributed. The BMI SEP data are more strongly skewed to the right, which is also evident in the boxplot and the histogram.
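The direction of skew can also be checked informally from the reported summary values, without the plots. The sketch below compares the mean with the median and the distance from the median to each extreme. It is a heuristic chosen for illustration, not a formal skewness statistic and not something Statdisk reports, and `describe_skew` and `tail_lengths` are names introduced here.

```python
def tail_lengths(minimum, med, maximum):
    """Distances from the median to each extreme: (lower, upper)."""
    return med - minimum, maximum - med


def describe_skew(mean_value, med, minimum, maximum):
    """Informal skew indicators built from summary statistics only."""
    lower, upper = tail_lengths(minimum, med, maximum)
    return {
        "mean_minus_median": mean_value - med,  # positive hints at right skew
        "lower_tail": lower,
        "upper_tail": upper,
        "tail_ratio": upper / lower,            # above 1: longer upper tail
    }


# Values taken from the report's Statdisk output.
wt = describe_skew(65.0597, 64, 42, 97)
bmi = describe_skew(22.03, 21.73, 15.08, 36.57)

print(wt)
print(bmi)
```

For both variables the mean exceeds the median and the upper tail is longer than the lower one, which is consistent with right skew; the upper-to-lower ratio is larger for BMI SEP. Because the extremes include the outliers, this comparison is sensitive to them.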
Conclusion
The measures of center help in understanding where the data values are located. The measures of center calculated here were the midrange (the average of the minimum and the maximum), the mean (the arithmetic average) and the median (the 2nd quartile).
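Variability shows how spread out the values are across their range. The variance is obtained by squaring the difference between each observed value and the mean, summing these squared differences and dividing by n - 1; the standard deviation is the square root of the variance. Spread of the data had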
The shape of the distribution helps to explain whether the data are normally distributed or not. Data that depart from normality are often skewed. Both WT SEP and BMI SEP are skewed to the right. The shape of a distribution is affected by outliers, and both variables have outliers that contribute to the skewness. Right skewness was observed because the outliers lay beyond the upper fence, on the maximum side of the boxplot.
The challenge in understanding the two datasets is identifying the outliers. A regular boxplot does not mark outliers; a modified boxplot has to be produced to show them.