Raw data is difficult to interpret or use directly. Descriptive statistics allow a researcher to summarize and present data in a format that is easy to interpret (Ali & Bhaskar, 2016). Descriptive statistics are categorized into measures of central tendency and measures of dispersion (spread). The measures of central tendency estimate the central position of the data and include the mode, median, and mean (Manikandan, 2011). The measures of dispersion, on the other hand, estimate how far data points deviate from the measures of central tendency and help reveal the existence of outliers (Ali & Bhaskar, 2016). Examples of the measures of dispersion include standard deviation, percentiles, variance, quartiles, interquartile range, and range. When evaluating multiple variables, correlation coefficients are essential in describing how the variables are related (Benesty et al., 2009). This paper uses measures of central tendency, z-scores, and correlation to understand how major shopping areas in the community of Springdale fit into the shopping activities of local residents.
Table 1: Descriptive Statistics
Variable | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
Name | IMPEXCH | IMPQUALI | IMPPRICE | IMPVARIE | IMPHELP | IMPHOURS | IMPCLEAN | IMPBARGN
Mean | 4.9933 | 5.7333 | 5.7000 | 5.2067 | 4.8533 | 4.8800 | 4.7533 | 5.2400
Standard Error | 0.1645 | 0.1434 | 0.1525 | 0.1469 | 0.1587 | 0.1480 | 0.1554 | 0.1383
Median | 5 | 7 | 7 | 6 | 5 | 5 | 5 | 6
Mode | 7 | 7 | 7 | 7 | 7 | 6 | 7 | 7
Standard Deviation | 2.0150 | 1.7558 | 1.8672 | 1.7997 | 1.9435 | 1.8132 | 1.9035 | 1.6936
Sample Variance | 4.0604 | 3.0828 | 3.4866 | 3.2389 | 3.7770 | 3.2875 | 3.6233 | 2.8682
Kurtosis | -0.8002 | 1.0106 | 0.4618 | -0.2588 | -0.7060 | -0.6839 | -1.0102 | 0.0078
Skewness | -0.6692 | -1.4172 | -1.2970 | -0.8677 | -0.6134 | -0.5996 | -0.4332 | -0.9127
Range | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
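As an illustration of how most of the quantities in Table 1 can be computed, the following is a minimal Python sketch using only the standard library. The function name `describe` and the ratings in `sample` are hypothetical and are not the Springdale data; the paper does not state which software produced Table 1. Kurtosis and skewness are omitted because the standard library does not provide them.

```python
import math
import statistics


def describe(ratings):
    """Return summary statistics for a list of numeric ratings."""
    n = len(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": statistics.mean(ratings),
        "standard_error": sd / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(ratings),
        "mode": statistics.mode(ratings),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(ratings),
        "range": max(ratings) - min(ratings),
    }


# Hypothetical ratings on a 1-7 importance scale (an assumption, not survey data).
sample = [7, 5, 6, 7, 3, 1, 7, 4]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The sample standard deviation and sample variance use the n - 1 denominator, which is consistent with the "Sample Variance" label in Table 1.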
Table 2: Five Number Summary
Variable | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
Name | IMPEXCH | IMPQUALI | IMPPRICE | IMPVARIE | IMPHELP | IMPHOURS | IMPCLEAN | IMPBARGN
Minimum | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Q1 | 3.25 | 5 | 5 | 4 | 4 | 4 | 3 | 4
Q2 | 5 | 7 | 7 | 6 | 5 | 5 | 5 | 6
Q3 | 7 | 7 | 7 | 7 | 7 | 6 | 6 | 7
Maximum | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
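A five number summary like the one in Table 2 can be sketched in Python as follows. The paper does not state which quartile method was used; a fractional value such as Q1 = 3.25 suggests interpolation between data points, so this sketch assumes the "inclusive" interpolation method of `statistics.quantiles`. The function name `five_number_summary` and the ratings in `sample` are hypothetical.

```python
import statistics


def five_number_summary(ratings):
    """Return (minimum, Q1, Q2, Q3, maximum) for a list of numeric ratings."""
    # method="inclusive" treats the data as including its own minimum and
    # maximum and interpolates linearly between neighbouring data points.
    q1, q2, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return min(ratings), q1, q2, q3, max(ratings)


# Hypothetical ratings on a 1-7 importance scale (an assumption, not survey data).
sample = [1, 2, 4, 5, 6, 7, 7, 7]
minimum, q1, q2, q3, maximum = five_number_summary(sample)
print(minimum, q1, q2, q3, maximum)
```

Other quartile conventions exist and can give slightly different Q1 and Q3 values for the same data, so the method should be reported alongside the summary.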
Table 3: Maximum and Minimum Z-scores for Each Variable
Z-score | IMPEXCH | IMPQUALI | IMPPRICE | IMPVARIE | IMPHELP | IMPHOURS | IMPCLEAN | IMPBARGN
Maximum | 0.9958 | 0.7214 | 0.6962 | 0.9965 | 1.1046 | 1.1692 | 1.1803 | 1.0392
Minimum | -1.9818 | -2.6959 | -2.5171 | -2.3374 | -1.9827 | -2.1399 | -1.9718 | -2.5036
Before proceeding to analyze data, it is important to identify and eliminate outliers. Outliers are unusual data points that lie far from the others and can distort the outcome of data analysis (Kannan et al., 2015, p. 231). The z-score is one way of identifying outliers. It measures the number of standard deviations that a data point lies above or below the mean (p. 232). Z-scores above +3 or below -3 are considered extreme, meaning that data points more than three standard deviations above or below the mean are treated as outliers. Since no variable has a maximum z-score above +3 or a minimum z-score below -3, we can conclude that none of the variables contains outliers (Table 3). The formula for calculating the z-score is:
$z = \frac{x - \bar{x}}{s}$, where $x$ is the data point, $\bar{x}$ is the mean, and $s$ is the standard deviation (Kannan et al., 2015).
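The z-score definition above can be checked against the reported tables with a short Python sketch. The means and standard deviations are copied from Table 1, and the smallest and largest ratings (1 and 7 for every variable) from Table 2. The names `z_score`, `table1`, `extremes`, and `outliers` are illustrative. Because the inputs are rounded to four decimals, the results may differ from Table 3 in the last digit.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd


# (mean, standard deviation) per variable, copied from Table 1.
table1 = {
    "IMPEXCH": (4.9933, 2.0150),
    "IMPQUALI": (5.7333, 1.7558),
    "IMPPRICE": (5.7000, 1.8672),
    "IMPVARIE": (5.2067, 1.7997),
    "IMPHELP": (4.8533, 1.9435),
    "IMPHOURS": (4.8800, 1.8132),
    "IMPCLEAN": (4.7533, 1.9035),
    "IMPBARGN": (5.2400, 1.6936),
}

# Every variable has a minimum of 1 and a maximum of 7 (Table 2).
extremes = {
    name: (z_score(1, mean, sd), z_score(7, mean, sd))
    for name, (mean, sd) in table1.items()
}

# Flag a variable if either extreme lies more than 3 standard deviations out.
outliers = [name for name, (low, high) in extremes.items() if low < -3 or high > 3]

for name, (low, high) in extremes.items():
    print(f"{name}: min z = {low:.4f}, max z = {high:.4f}")
print("Variables with outliers:", outliers)
```

Checking only the minimum and maximum is sufficient here, because every other data point has a z-score between those two extremes.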
The variables can be arranged in order of importance based on the measures of central tendency: mean, mode, and median. The mean estimates the central position as the average of the data points. Since averaging involves summing all data points and dividing the sum by the number of data points, the size of the mean is strongly influenced by outliers if any are present (Ali & Bhaskar, 2016). The median is more appropriate when outliers are present or when some values are undetermined, because it identifies the middle value after the data points are arranged in ascending order (Manikandan, 2011). For data on a nominal scale, the mode is preferable when a single clear mode exists (Manikandan, 2011). Since there are no outliers (Table 3) and the mode does not distinguish between the variables, seven of the eight sharing a mode of 7 (Table 1), the mean is the most appropriate basis for ranking. Arranging the means in descending order, we rank the variables from the most to the least important as: quality of goods (5.7333), low prices (5.7000), a lot of bargain sales (5.2400), good variety of sizes/styles (5.2067), easy to return/exchange goods (4.9933), convenient shopping hours (4.8800), sales staff helpful/friendly (4.8533), and clean stores and surroundings (4.7533).
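The ranking described above can be reproduced with a short Python sketch that sorts the means reported in Table 1 in descending order. The names `means` and `ranking` are illustrative.

```python
# Mean importance rating per variable, copied from Table 1.
means = {
    "IMPEXCH": 4.9933,
    "IMPQUALI": 5.7333,
    "IMPPRICE": 5.7000,
    "IMPVARIE": 5.2067,
    "IMPHELP": 4.8533,
    "IMPHOURS": 4.8800,
    "IMPCLEAN": 4.7533,
    "IMPBARGN": 5.2400,
}

# Sort variable names by their mean, highest first.
ranking = sorted(means, key=means.get, reverse=True)

for position, name in enumerate(ranking, start=1):
    print(f"{position}. {name} ({means[name]:.4f})")
```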
Table 4: Correlation between Variable 19 and Variables 21-25
Variable | IMPQUALI
IMPQUALI | 1
IMPVARIE | 0.283052
IMPHELP | 0.204814
IMPHOURS | 0.306109
IMPCLEAN | 0.253291
IMPBARGN | 0.258657
The correlation coefficient measures the strength and direction of the linear relationship between two variables. Correlation coefficients fall within the range -1 to +1 (Benesty et al., 2009, p. 37). A negative correlation coefficient indicates an inverse relationship, while a positive coefficient indicates a direct relationship. An inverse relationship means that the variables move in opposite directions, while a direct relationship means that they move in the same direction. The closer a coefficient is to -1, the stronger the negative relationship, while closeness to +1 indicates a strong positive relationship (p. 38). Table 4 indicates that the importance of quality of goods is weakly and positively correlated with the importance of a variety of sizes/styles, sales staff helpful/friendly, convenient shopping hours, clean stores and surroundings, and a lot of bargain sales, with coefficients between 0.20 and 0.31. This means that respondents who rate quality of goods as important also tend, to a modest degree, to rate these other attributes as important. Correlation alone does not show that these attributes improve the quality of goods.
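To make the calculation concrete, the following is a minimal Python sketch of the Pearson correlation coefficient written from its definition. The function name `pearson_r` and the two lists of ratings are hypothetical and are not the Springdale data, so the resulting coefficient is not expected to match Table 4.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical 1-7 importance ratings from eight respondents (an assumption).
quality = [7, 6, 7, 5, 3, 7, 4, 6]
variety = [6, 5, 7, 6, 4, 5, 5, 3]

r = pearson_r(quality, variety)
print(f"r = {r:.4f}")
```

The result always lies between -1 and +1. The function is undefined when either variable has zero variance, a case this sketch does not handle.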
In conclusion, the Springdale consumer data are free from outliers under the three-standard-deviation criterion, which supports the use of the mean for ranking. Based on the ranking of the means, quality of goods is the most important attribute considered by consumers, followed by low prices, a lot of bargain sales, good variety of sizes/styles, easy to return/exchange goods, convenient shopping hours, and sales staff helpful/friendly, with clean stores and surroundings the least important. The weak positive correlations indicate that consumers who value quality of goods also tend to value a variety of sizes/styles, sales staff helpful/friendly, convenient shopping hours, clean stores and surroundings, and a lot of bargain sales, although the correlations do not establish a causal link between these attributes and the quality of goods.
References
Ali, Z., & Bhaskar, S. B. (2016). Basic statistical tools in research and data analysis. Indian Journal of Anaesthesia.
Benesty, J., Chen, J., Huang, Y., & Cohen, I. (2009). Pearson correlation coefficient. In Noise reduction in speech processing (pp. 1-4). Springer, Berlin, Heidelberg.
Kannan, K. S., Manoj, K., & Arumugam, S. (2015). Labeling methods for identifying outliers. International Journal of Statistics and Systems, 10(2), 231-238.
Manikandan, S. (2011). Measures of central tendency: Median and mode. Journal of Pharmacology and Pharmacotherapeutics.