In statistics, variability describes the extent to which the values in a data set are dispersed around the mean and the extent to which the data points differ from one another. How well the mean or median represents the original data set depends on this variability: the more dispersed the data, the less representative a single central value is. A data set with high dispersion contains values that lie far above or far below the mean. Several measures are used to quantify variability, including the range, the interquartile range, the variance and the standard deviation. The range is the difference between the highest and lowest values in a data set. It is rarely used on its own because it considers only two values. The interquartile range measures the spread of the middle 50 percent of the values in a data set. Variance, on the other hand, is the average squared difference of the values from the mean. The focus of this paper is to analyze variability with the aim of understanding the standard deviation and its implications for the variability of a data set.
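As a rough illustration of the four measures named above, the sketch below computes each of them with Python's standard `statistics` module. The data values are hypothetical and chosen only to keep the arithmetic simple, and the quartiles depend on the convention used (the module's default method is assumed here), so other tools may report a slightly different interquartile range.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the highest and lowest values.
data_range = max(data) - min(data)  # 9 - 2 = 7

# Interquartile range: spread of the middle 50 percent of the values.
# quantiles(..., n=4) returns the three quartile cut points; the exact
# values depend on the quartile convention (module default assumed).
q1, _q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Variance: average squared difference from the mean (population form).
variance = statistics.pvariance(data)  # mean is 5, variance is 4

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(data)  # 2.0

print(data_range, iqr, variance, std_dev)
```

Only the range depends on just two values; the variance and standard deviation change whenever any value in the list changes.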
The standard deviation (SD) measures and summarizes the extent to which the values in a data set differ from the mean. In other words, it indicates how closely the values are clustered around the mean. It is the most widely used measure of dispersion because, unlike the range and the interquartile range, it takes every value in the data set into account. A small standard deviation indicates that the values are clustered tightly around the mean, whereas a large standard deviation indicates that the values are spread far from the mean. The standard deviation is always expressed in the same units as the mean (McPherson, 2001). It is common for the values in a data set to deviate from the mean by chance, and such data are often considered to follow a normal distribution. Data that follow a normal distribution have most of their values bunched around the mean, with fewer values that are very high or very low.
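To make the contrast between a small and a large standard deviation concrete, the sketch below compares two hypothetical data sets that share the same mean but differ in spread. The values are invented for illustration and `statistics.pstdev` is used on the assumption that each list is a whole population.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

tight_mean = statistics.mean(tight)
wide_mean = statistics.mean(wide)

# Population standard deviation of each data set.
tight_sd = statistics.pstdev(tight)  # square root of 2, about 1.41
wide_sd = statistics.pstdev(wide)    # square root of 800, about 28.28

print(f"tight: mean={tight_mean}, SD={tight_sd:.2f}")
print(f"wide:  mean={wide_mean}, SD={wide_sd:.2f}")
```

Both lists have a mean of 50, so the mean alone cannot distinguish them; the standard deviation, which is in the same units as the data, does.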
For data sets that follow a normal distribution, the standard deviation can be used to determine the proportion of values that lie within a given distance of the mean. In such distributions, about 68 percent of values lie less than 1 standard deviation (SD) from the mean, about 95 percent lie less than 2 SD from the mean, and about 99.7 percent lie less than 3 SD from the mean (McBurney & White, 2010). The information can be represented as shown in the figure below.
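These proportions can also be checked numerically. The sketch below uses `statistics.NormalDist` from Python's standard library to compute the share of a normal distribution lying within 1, 2 and 3 SD of the mean. The helper name `proportion_within` is introduced here for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

Because the rule depends only on distance from the mean measured in SD units, the same proportions hold for any normal distribution, whatever its mean and standard deviation.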
The standard deviation is calculated with one of two formulas, depending on what the data set represents. The data can be either the values for a whole population or the values for a sample drawn from that population. For example, if all the students who use the library were asked how many books they borrowed last week, the whole population has been considered, because every student has been asked. In that case the population standard deviation should be used. On the other hand, it might be impractical to ask every student about their library use, so a sample of 100 students might be used to estimate the library use of the whole student population. In that case the sample standard deviation should be used.
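The library example can be sketched in code as follows. The borrowing counts are hypothetical, and only eight values are used rather than 100 to keep the example short. Python's `statistics.pstdev` treats the list as a whole population, while `statistics.stdev` treats it as a sample.

```python
import statistics

# Hypothetical numbers of books borrowed last week by eight students.
books_borrowed = [0, 1, 1, 2, 3, 3, 4, 6]

# If these eight students are ALL the library users, use the
# population standard deviation (divides by N).
population_sd = statistics.pstdev(books_borrowed)

# If they are only a sample of the library users, use the
# sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(books_borrowed)

print(f"population SD: {population_sd:.3f}")
print(f"sample SD:     {sample_sd:.3f}")
```

For the same data the sample standard deviation is always slightly larger than the population standard deviation, because the sum of squared deviations is divided by a smaller number.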
The standard deviation (SD) of a whole population, denoted σ, is calculated using the formula below:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where x is each value in the population, µ is the mean value of the population, Σ denotes summation, and N is the number of values in the population. The formula used to calculate the standard deviation of a sample, denoted s, is shown below:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where x is each value in the sample, x̄ is the mean value of the sample, Σ denotes summation, and n − 1 is the number of values in the sample minus 1. The two formulas can be broken down into steps for easier comprehension. The first step is to find the mean of the data set. The second step is to find the squared distance of each data point from the mean. The third step is to sum the values obtained in step two. The fourth step is to divide that sum by the number of data points (N for a population, n − 1 for a sample), and the last step is to take the square root of the result.
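The five steps can be written out directly in code. The function below is a minimal sketch rather than a reference implementation; its name and its `sample` flag are introduced here for illustration, and the data in the usage lines are hypothetical.

```python
import math

def standard_deviation(values, sample=False):
    """Standard deviation computed step by step.

    With sample=False the sum is divided by N (population formula);
    with sample=True it is divided by n - 1 (sample formula).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")

    # Step 1: find the mean of the data set.
    mean = sum(values) / n

    # Step 2: find the squared distance of each data point from the mean.
    squared_distances = [(x - mean) ** 2 for x in values]

    # Step 3: sum the values obtained in step 2.
    total = sum(squared_distances)

    # Step 4: divide by the number of data points (or by n - 1 for a sample).
    divisor = n - 1 if sample else n
    average_squared_distance = total / divisor

    # Step 5: take the square root.
    return math.sqrt(average_squared_distance)

# Hypothetical data: mean is 5 and the squared distances sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))               # 2.0 (32 / 8 = 4)
print(standard_deviation(data, sample=True))  # square root of 32 / 7
```

The only difference between the two calls is the divisor in step 4, which mirrors the difference between the population and sample formulas.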
In conclusion, variability can be measured using the range, the interquartile range, the variance and the standard deviation. The range is the simplest measure of variability in a data set, though it can be misleading when the data set contains extreme values. The interquartile range reduces this problem because it considers only the variability within the middle 50 percent of the data set. The standard deviation is the most widely used measure of variability because it considers all the values in the data set and how they vary from the mean. However, care is needed when calculating the standard deviation. It must be established whether the whole population or a sample of the population is being analyzed, since this determines the appropriate formula to use.
References
McBurney, D., & White, T. L. (2010). Research methods. Belmont, CA: Wadsworth Cengage Learning.
McPherson, G. (2001). Applying and interpreting statistics: A comprehensive guide. New York, NY: Springer.