29 Nov 2022


What is Raw Data and How Can You Use It?

Format: APA

Academic level: University

Paper type: Coursework

Words: 792

Pages: 2


Raw data is difficult to interpret or use directly. Descriptive statistics allow a researcher to summarize and present data in a format that is easy to interpret (Ali & Bhaskar, 2016). Descriptive statistics are categorized into measures of central tendency and measures of dispersion (spread). Measures of central tendency estimate the central position of the data and include the mode, median, and mean (Manikandan, 2011). Measures of dispersion, on the other hand, estimate how far data points deviate from the center and help reveal outliers (Ali & Bhaskar, 2016). Examples of measures of dispersion include the standard deviation, variance, percentiles, quartiles, interquartile range, and range. When evaluating multiple variables, correlation coefficients are essential for describing how the variables are related (Benesty et al., 2009). This paper uses measures of central tendency, z-scores, and correlation to understand how major shopping areas in the community of Springdale fit into the shopping activities of local residents.
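As a minimal sketch of the measures named above, the following Python computes them for a single variable using only the standard library. The `ratings` list is hypothetical illustration data, not the Springdale responses, and `describe` is a helper name introduced here.

```python
import math
import statistics

def describe(values):
    """Return common measures of central tendency and dispersion."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),    # every most-frequent value
        "std_dev": sd,
        "variance": statistics.variance(values),  # sample variance
        "std_error": sd / math.sqrt(n),           # standard error of the mean
        "range": max(values) - min(values),
    }

# Hypothetical importance ratings, not the survey data
ratings = [7, 6, 5, 7, 3, 4, 7, 1, 6, 5]
summary = describe(ratings)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Returning every mode rather than a single one makes it easy to see when the mode is not definitive, which matters later when choosing a measure for ranking.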

Table 1: Descriptive Statistics

Variable            18        19        20        21        22        23        24        25
                    IMPEXCH   IMPQUALI  IMPPRICE  IMPVARIE  IMPHELP   IMPHOURS  IMPCLEAN  IMPBARGN
Mean                4.9933    5.7333    5.7000    5.2067    4.8533    4.8800    4.7533    5.2400
Standard Error      0.1645    0.1434    0.1525    0.1469    0.1587    0.1480    0.1554    0.1383
Median
Mode
Standard Deviation  2.0150    1.7558    1.8672    1.7997    1.9435    1.8132    1.9035    1.6936
Sample Variance     4.0604    3.0828    3.4866    3.2389    3.7770    3.2875    3.6233    2.8682
Kurtosis            -0.8002   1.0106    0.4618    -0.2588   -0.7060   -0.6839   -1.0102   0.0078
Skewness            -0.6692   -1.4172   -1.2970   -0.8677   -0.6134   -0.5996   -0.4332   -0.9127
Range
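The standard error and standard deviation rows of Table 1 are linked by SE = s / sqrt(n), so the two rows together imply a sample size. The sketch below runs that arithmetic on the published figures. The sample size it recovers is an inference from the table and is not stated in the source.

```python
# (standard deviation, standard error) pairs copied from Table 1
table1 = {
    "IMPEXCH":  (2.0150, 0.1645),
    "IMPQUALI": (1.7558, 0.1434),
    "IMPPRICE": (1.8672, 0.1525),
    "IMPVARIE": (1.7997, 0.1469),
    "IMPHELP":  (1.9435, 0.1587),
    "IMPHOURS": (1.8132, 0.1480),
    "IMPCLEAN": (1.9035, 0.1554),
    "IMPBARGN": (1.6936, 0.1383),
}

# SE = s / sqrt(n)  rearranges to  n = (s / SE) ** 2
implied_n = {name: (sd / se) ** 2 for name, (sd, se) in table1.items()}

for name, n in implied_n.items():
    print(f"{name}: implied n = {n:.1f}")
```

Every variable gives a value that rounds to 150, which suggests the eight columns were computed from the same set of roughly 150 respondents.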


Table 2: Five Number Summary

Variable            18        19        20        21        22        23        24        25
                    IMPEXCH   IMPQUALI  IMPPRICE  IMPVARIE  IMPHELP   IMPHOURS  IMPCLEAN  IMPBARGN
Minimum
Q1                  3.25
Q2
Q3
Maximum
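A five-number summary of the kind Table 2 reports can be sketched with the standard library as follows. The source does not say which quartile convention produced its values, so `method="inclusive"` is an assumption, and other conventions give slightly different quartiles. The `ratings` list is hypothetical data and `five_number_summary` is a name introduced here.

```python
import statistics

def five_number_summary(values, method="inclusive"):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles() with n=4 returns the three cut points Q1, Q2, Q3
    q1, q2, q3 = statistics.quantiles(values, n=4, method=method)
    return (min(values), q1, q2, q3, max(values))

# Hypothetical importance ratings, not the survey data
ratings = [7, 6, 5, 7, 3, 4, 7, 1, 6, 5]
print(five_number_summary(ratings))
```

Because the inclusive method interpolates between neighboring data points, a quartile can be fractional even when every rating is a whole number.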

Table 3: Maximum and Minimum Z-scores for Each Variable

Z-score             IMPEXCH   IMPQUALI  IMPPRICE  IMPVARIE  IMPHELP   IMPHOURS  IMPCLEAN  IMPBARGN
Maximum             0.9958    0.7214    0.6962    0.9965    1.1046    1.1692    1.1803    1.0392
Minimum             -1.9818   -2.6959   -2.5171   -2.3374   -1.9827   -2.1399   -1.9718   -2.5036
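As a consistency check, each z-score in Table 3 can be converted back to a raw rating with x = mean + z × s, using the means and standard deviations from Table 1. The sketch below does this for every variable. The rating scale it recovers is an inference from the published figures, since the source does not state the scale.

```python
# (mean, standard deviation) from Table 1; (maximum z, minimum z) from Table 3
rows = {
    "IMPEXCH":  (4.9933, 2.0150, 0.9958, -1.9818),
    "IMPQUALI": (5.7333, 1.7558, 0.7214, -2.6959),
    "IMPPRICE": (5.7000, 1.8672, 0.6962, -2.5171),
    "IMPVARIE": (5.2067, 1.7997, 0.9965, -2.3374),
    "IMPHELP":  (4.8533, 1.9435, 1.1046, -1.9827),
    "IMPHOURS": (4.8800, 1.8132, 1.1692, -2.1399),
    "IMPCLEAN": (4.7533, 1.9035, 1.1803, -1.9718),
    "IMPBARGN": (5.2400, 1.6936, 1.0392, -2.5036),
}

recovered = {}
for name, (mean, sd, z_max, z_min) in rows.items():
    # x = mean + z * s is the inverse of z = (x - mean) / s
    recovered[name] = (mean + z_min * sd, mean + z_max * sd)

for name, (lowest, highest) in recovered.items():
    print(f"{name}: lowest = {lowest:.2f}, highest = {highest:.2f}")
```

All eight variables come back to a lowest rating of about 1 and a highest of about 7, so Tables 1 and 3 agree with each other and point to a seven-point importance scale.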

Before analyzing the data, it is important to identify and eliminate outliers. Outliers are unusual data points that lie far from the others and can distort the outcome of data analysis (Kannan et al., 2015, p. 231). The z-score is one way of detecting outliers. A z-score measures the number of standard deviations that a data point lies above or below the mean (Kannan et al., 2015, p. 232). Z-scores above +3 or below -3 are considered extreme, meaning that data points more than three standard deviations above or below the mean are treated as outliers. Since no variable has a maximum z-score above +3 or a minimum z-score below -3, we can conclude that none of the variables contains outliers (Table 3). The formula for calculating the z-score is:

$z = \frac{x - \bar{x}}{s}$, where $x$ is the data point, $\bar{x}$ is the mean, and $s$ is the standard deviation (Kannan et al., 2015).
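The z-score rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used to build Table 3: it uses the sample standard deviation, the function names are introduced here, and both data lists are hypothetical.

```python
import statistics

def z_scores(values):
    """Return the z-score of every data point, using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

def find_outliers(values, threshold=3.0):
    """Return the data points whose z-score lies beyond +/- threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical ratings on a bounded scale: nothing lies beyond 3 standard deviations
ratings = [7, 6, 5, 7, 3, 4, 7, 1, 6, 5]
print(find_outliers(ratings))

# Hypothetical data with one extreme value that the rule flags
skewed = [5] * 20 + [50]
print(find_outliers(skewed))
```

On a bounded rating scale with a reasonably large sample, extreme z-scores are hard to produce, which is in line with the absence of outliers reported in Table 3.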

The variables can be ranked in order of importance using a measure of central tendency: the mean, mode, or median. The mean estimates the central position as the average of the data points. Since averaging involves summing all data points and dividing the sum by the number of data points, the mean is strongly influenced by outliers when they are present (Ali & Bhaskar, 2016). The median is more appropriate when outliers are present or when some values are undetermined, because it takes the middle position after the data points are arranged in ascending order (Manikandan, 2011). With a nominal scale, the mode is preferable, provided it is definitive (Manikandan, 2011). Since there are no outliers (Table 3) and the mode is not definitive (Table 1), the mean is the most appropriate basis for ranking. Arranging the means in descending order, the variables rank from most to least important as follows: quality of goods, low prices, a lot of bargain sales, good variety of sizes/styles, easy to return/exchange goods, convenient shopping hours, helpful/friendly sales staff, and clean stores and surroundings.
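The ranking step can be reproduced directly from the means in Table 1. The short sketch below sorts the variable codes by mean in descending order. The dictionary simply restates the table and `ranking` is a name introduced here.

```python
# Means copied from Table 1, keyed by variable code
means = {
    "IMPEXCH": 4.9933,
    "IMPQUALI": 5.7333,
    "IMPPRICE": 5.7000,
    "IMPVARIE": 5.2067,
    "IMPHELP": 4.8533,
    "IMPHOURS": 4.8800,
    "IMPCLEAN": 4.7533,
    "IMPBARGN": 5.2400,
}

# Highest mean first: most important attribute at the top
ranking = sorted(means, key=means.get, reverse=True)

for position, code in enumerate(ranking, start=1):
    print(f"{position}. {code} (mean = {means[code]:.4f})")
```

The gap between the top two means (5.7333 and 5.7000) is much smaller than their standard errors, so the order of the first two attributes should be read with caution.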

Table 4: Correlation between Variable 19 and Variables 21-25

                    IMPQUALI
IMPQUALI            1
IMPVARIE            0.283052
IMPHELP             0.204814
IMPHOURS            0.306109
IMPCLEAN            0.253291
IMPBARGN            0.258657

The correlation coefficient measures the strength and direction of the linear relationship between two variables. Correlation coefficients fall within the range of -1 to +1 (Benesty et al., 2009, p. 37). A negative coefficient indicates an inverse relationship, while a positive coefficient indicates a direct relationship. An inverse relationship means that the variables move in opposite directions, while a direct relationship means that they move in the same direction. The closer a coefficient is to -1, the stronger the negative relationship, while closeness to +1 indicates a strong positive relationship (Benesty et al., 2009, p. 38). Table 4 indicates that the quality of goods is weakly and positively correlated with variety of sizes/styles, helpful/friendly sales staff, convenient shopping hours, clean stores and surroundings, and a lot of bargain sales. This means that respondents who place more importance on the quality of goods tend, to a modest degree, to place more importance on these attributes as well. Correlation alone does not show that these attributes improve the quality of goods.
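A Pearson correlation coefficient of the kind shown in Table 4 can be computed as in the sketch below. The function name `pearson_r` and the two rating lists are hypothetical and are used only to illustrate the calculation. They are not the Springdale responses.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical ratings for two attributes from the same ten respondents
quality = [7, 6, 5, 7, 3, 4, 7, 1, 6, 5]
variety = [6, 5, 5, 4, 4, 3, 7, 2, 6, 6]

r = pearson_r(quality, variety)
print(f"r = {r:.3f}, shared variance r^2 = {r * r:.3f}")
```

Squaring a coefficient gives the share of variance the two variables have in common. For the largest value in Table 4, 0.306109 squared is about 0.094, so even the strongest pairing shares under 10% of its variance.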

In conclusion, the Springdale consumer data are free from outliers, which supports the reliability of the results. Based on the ranking of the means, quality of goods is the most important attribute to consumers, followed by low prices, a lot of bargain sales, good variety of sizes/styles, easy to return/exchange goods, convenient shopping hours, and helpful/friendly sales staff, with clean stores and surroundings the least important. The weak positive correlations indicate that consumers who value the quality of goods also tend to value variety of sizes/styles, helpful/friendly sales staff, convenient shopping hours, clean stores and surroundings, and a lot of bargain sales, although correlation does not establish that these attributes improve the quality of goods.

References 

Ali, Z., & Bhaskar, S. B. (2016). Basic statistical tools in research and data analysis. Indian Journal of Anaesthesia.

Benesty, J., Chen, J., Huang, Y., & Cohen, I. (2009). Pearson correlation coefficient. In Noise reduction in speech processing (pp. 1-4). Springer, Berlin, Heidelberg. 

Kannan, K. S., Manoj, K., & Arumugam, S. (2015). Labeling methods for identifying outliers. International Journal of Statistics and Systems, 10(2), 231-238. 

Manikandan, S. (2011). Measures of central tendency: Median and mode. Journal of Pharmacology and Pharmacotherapeutics.

Cite this page

StudyBounty. (2023, September 15). What is Raw Data and How Can You Use It?.
https://studybounty.com/what-is-raw-data-and-how-can-you-use-it-coursework
