Central tendency is a branch of descriptive statistics that summarizes a whole dataset with a single representative value. There are three measures of central tendency: the mean (average), the median, and the mode. The mean is the arithmetic average, and it can be used with both discrete and continuous data (Chen, 2017). The mean ($\bar{x}$) is calculated by dividing the sum of the values in a dataset by the number of values, i.e., $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ are the observed values and $n$ is the number of values.
In Excel, the mean can be calculated using the AVERAGE function (Chen, 2017). The dataset in Table 1 of the Excel file was used to calculate the measures of central tendency and the measures of variation. It contains sample data for pre-pack meat preparation time: the time to prepare a 0.5 kg pack of mince, the time to prepare a 0.25 kg pack of sausage, and the time to prepare stuffed pork steak. The mean preparation times are shown in the table below: 1.64, 1.72, and 3.82 minutes for mince, sausage, and stuffed pork steak, respectively. On average, therefore, a butcher should expect to spend about 1.64 minutes on a pack of mince, 1.72 minutes on a pack of sausage, and 3.82 minutes on a stuffed pork steak. Mince takes the least time to prepare, followed by sausage, and stuffed pork steak takes the longest. A butcher needs to know the time required to prepare each product because it makes delivery times easier to estimate.
The second measure of central tendency considered is the median. The median is the middle value in a dataset (Chen, 2017). To determine it, the observations must first be sorted in ascending or descending order; the middle value of the ordered data is the median (for an even number of observations, it is the average of the two middle values). In Excel, the median was calculated using the MEDIAN function (Chen, 2017). The medians for mince, sausage, and stuffed pork steak were found to be 1.63, 1.77, and 3.78 minutes, respectively. Like the mean, the median provides a helpful measure of the center of the dataset, and comparing the two indicates skewness: a mean above the median suggests a right skew, and a mean below it suggests a left skew. Here the differences are small (0.01, 0.05, and 0.04 minutes), so the data are only slightly skewed. The median also informs capacity planning. If the median deviated greatly from the mean, a few unusually long or short preparation times would be distorting the average, and the butcher could not rely on it to estimate preparation time. Because the deviation is small, the butcher can use either measure to estimate how long it takes to prepare mince, sausage, and stuffed pork steak.
| Measure | Time to Prepare Mince 0.5 kg Pack (min) | Time to Prepare Sausage 0.25 kg Pack (min) | Time to Prepare Stuffed Pork Steak (min) |
|---|---|---|---|
| Mean | 1.64 | 1.72 | 3.82 |
| Median | 1.63 | 1.77 | 3.78 |
| Mode | 1.85 | 1.68 | 3.78 |

Table 1: Measures of Central Tendency
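The Excel functions described above have close equivalents in Python's standard `statistics` module. The following is a minimal sketch of the same three calculations; the values in `sample_times` are hypothetical and are used only for illustration, since the raw observations from the Excel file are not reproduced in this paper.

```python
from statistics import mean, median, multimode

# Hypothetical preparation times in minutes (illustrative only,
# NOT the observations from the Excel file).
sample_times = [1.2, 1.5, 1.5, 1.7, 1.9, 2.1]


def central_tendency(values):
    """Return the mean, median, and mode(s) of a sequence of numbers."""
    return {
        "mean": mean(values),        # counterpart of Excel's AVERAGE
        "median": median(values),    # counterpart of Excel's MEDIAN (sorts internally)
        "modes": multimode(values),  # every most-frequent value, as a list
    }


summary = central_tendency(sample_times)
print(summary)
```

`multimode` is used rather than `mode` so that a dataset with more than one most-frequent value reports all of them.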
Measures of Variation for Continuous Variable Sample Data
Measures of variability are statistical measures that represent the degree of variation in a given set of data (Chen, 2017). In other words, they represent how spread out the values are. As with central tendency, there are several measures of variation. In this paper, three are explored: the range, the variance, and the standard deviation. Each is discussed and calculated using the dataset provided.
The range is the most straightforward measure of variability. It is the difference between the largest and the smallest values in a dataset (Chen, 2017). Because it is based on only the two most extreme values, it is susceptible to outliers. In Excel, the range is obtained by first determining the minimum and maximum values using the MIN and MAX functions and then subtracting the minimum from the maximum. The ranges for the three datasets in Table 1 of the Excel file were found to be 1.5, 0.96, and 3.31 minutes for mince, sausage, and stuffed pork steak, respectively. A butcher needs to know the range in order to understand the variability of preparation time. A large range indicates wide variation in pre-pack meat preparation time. Knowing the range would prompt the butcher to identify the factors behind the variation and so reduce the time taken to prepare a product.
The second measure of variation explored is the variance. Variance is the average of the squared differences of the values from the mean (Chen, 2017). Unlike the range, variance uses every value in the dataset. To calculate it, the difference between each value and the mean is computed, squared, and summed. The sum is then divided by the number of values in the dataset for a population variance, or by one less than the number of values for a sample variance, which is what Excel's VAR.S function returns. The sample variance formula is $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean and $n$ is the number of values.
In Excel, the VAR.S function is used to calculate the sample variance. The variance of each dataset in Table 1 of the Excel file was calculated, and the results are shown in Table 2. The variance was found to be 0.11, 0.07, and 0.31 for mince, sausage, and stuffed pork steak, respectively. The results indicate that there is relatively little variability in the time taken to prepare each product, with stuffed pork steak varying the most.
The standard deviation is the most widely used measure of variability. Technically, it is obtained by taking the square root of the variance and is denoted by sigma, σ (or $s$ for a sample): $s = \sqrt{s^2}$.
The standard deviation indicates the typical deviation of the values from the mean (Chen, 2017). The standard deviation of each dataset in Table 1 of the Excel file was calculated. The standard deviation of the preparation time was found to be 0.33, 0.26, and 0.56 minutes for mince, sausage, and stuffed pork steak, respectively. A standard deviation of 0.33 for minced meat indicates that a typical preparation time is within plus or minus 0.33 minutes of the mean. For sausage it is plus or minus 0.26 minutes from the mean, and for stuffed pork steak it is plus or minus 0.56 minutes. The table below shows the standard deviation, variance, and range for the three datasets.
| Measure | Time to Prepare Mince 0.5 kg Pack (min) | Time to Prepare Sausage 0.25 kg Pack (min) | Time to Prepare Stuffed Pork Steak (min) |
|---|---|---|---|
| Standard Deviation | 0.33 | 0.26 | 0.56 |
| Variance | 0.11 | 0.07 | 0.31 |
| Minimum | 0.95 | 1.22 | 2.29 |
| Maximum | 2.45 | 2.18 | 5.6 |
| Range | 1.5 | 0.96 | 3.31 |

Table 2: Measures of Variability
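The three measures in the table above can also be reproduced outside Excel. The sketch below uses Python's standard `statistics` module and assumes the sample (n - 1) definitions, matching VAR.S; the values in `sample_times` are hypothetical and stand in for the raw observations, which are not listed in this paper.

```python
from statistics import stdev, variance

# Hypothetical preparation times in minutes (illustrative only,
# NOT the observations from the Excel file).
sample_times = [1.2, 1.5, 1.5, 1.7, 1.9, 2.1]


def variability(values):
    """Return the range, sample variance, and sample standard deviation."""
    return {
        "range": max(values) - min(values),  # MAX minus MIN
        "variance": variance(values),        # divides by n - 1, like VAR.S
        "std_dev": stdev(values),            # square root of the sample variance
    }


spread = variability(sample_times)
print(spread)
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1.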
Explain what your Discrete Random Variable is and what probability distribution can be used to describe it. Give some examples and interpretation of the usage of the probability distribution function and outline its business relevance.
A discrete variable is a variable whose value is obtained by counting (Chen, 2017); a continuous variable, by contrast, is obtained by measuring. A discrete random variable has a countable number of possible values. The random variable chosen for this exercise is the time it takes to prepare a given product, as given in Table 1 of the Excel file, which shows sample data for pre-pack meat preparation time. This data represents the time it takes the butcher to prepare minced meat, sausages, and stuffed pork steak. Strictly speaking, time is measured rather than counted, so it is discrete only in the sense that it is recorded to a fixed number of decimal places. Discrete random variables can be modeled using a variety of discrete probability distributions.
A probability distribution is a function that describes the likelihood of each possible value that a random variable can assume (Chen, 2017). The choice of probability distribution depends on the type of data. For the dataset chosen, a normal distribution can be used. The normal distribution describes many natural phenomena, such as the weight of girls, the time taken to do a task, and people's IQ scores. Since the chosen random variable, the time taken to complete a task, falls within this category, a normal distribution can be used to describe it. Decision makers can also use the normal distribution to describe uncertain variables, such as product prices and the inflation rate.
The normal distribution has two parameters: the mean and the standard deviation (Chen, 2017). With these two parameters, and z-scores from the standard normal distribution (which has a mean of 0 and a standard deviation of 1), a normal probability distribution can be graphed. The normal distribution for the minced meat data in Table 1 is shown in Figure 1. The graph suggests that the sample preparation times for minced meat are approximately normally distributed.
Figure 1: Normal Distribution-Minced Meat Preparation Time
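To illustrate how such a distribution can be used in practice, the sketch below builds a normal model from the reported mean (1.64 minutes) and standard deviation (0.33 minutes) for minced meat and reads probabilities from it. It assumes that the normal model is adequate for these data; the 2-minute threshold is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist

# Normal model using the mean and standard deviation reported for minced meat.
mince_time = NormalDist(mu=1.64, sigma=0.33)

# Probability that a pack is prepared in under 2 minutes
# (the 2-minute threshold is a hypothetical example).
p_under_2 = mince_time.cdf(2.0)

# Probability that a preparation time falls within one standard
# deviation of the mean.
p_within_1_sd = mince_time.cdf(1.64 + 0.33) - mince_time.cdf(1.64 - 0.33)

print(f"P(time < 2.0 min) = {p_under_2:.3f}")
print(f"P(within 1 SD of mean) = {p_within_1_sd:.3f}")
```

A butcher could use probabilities of this kind to judge how often an order is likely to exceed a promised preparation time.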
Either do 3.a or 3.b
Calculate and interpret a confidence interval estimate of the Continuous Random Variable. Choose an appropriate confidence level for your interval.
A confidence interval (CI) is a statistical measure used to quantify the uncertainty associated with a sample statistic. It gives a range of values within which the true population value is expected to lie. Intervals are constructed at a chosen confidence level, such as 90%, 95%, or 98%. A 95% confidence level means that if the population is sampled many times and an interval estimate is made from each sample, the resulting intervals would bracket the true population parameter in approximately 95% of the cases. For large samples, the following formula can be used to generate a 95% confidence interval: $CI = \bar{X} \pm t \frac{s}{\sqrt{n}}$
where CI is the 95% confidence interval, $\bar{X}$ is the sample mean, $t$ is the z value for a 95% CI, which is 1.96, $s$ is the sample standard deviation, and $n$ is the number of observations in the sample.
95% CI for Preparing Minced Meat
95% CI for Preparing Sausage
95% CI for Preparing Stuffed Pork Steak
The mean, standard deviation, and sample size of each dataset can then be used to construct the confidence interval in Excel.
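As a rough check on the large-sample formula, the sketch below applies it to the means and standard deviations reported earlier in this paper. The sample size is not stated in this section, so `ASSUMED_N = 50` is an assumption, taken only from the number of observations listed in the regression output; the resulting intervals should be read as illustrative rather than as the paper's reported results.

```python
from math import sqrt


def confidence_interval(mean, std_dev, n, z=1.96):
    """Large-sample CI: mean +/- z * s / sqrt(n). z = 1.96 gives a 95% level."""
    margin = z * std_dev / sqrt(n)
    return mean - margin, mean + margin


# ASSUMPTION: n = 50, borrowed from the regression output's observation count.
ASSUMED_N = 50

# (mean, standard deviation) in minutes, as reported in the tables above.
summary_stats = {
    "mince": (1.64, 0.33),
    "sausage": (1.72, 0.26),
    "stuffed pork steak": (3.82, 0.56),
}

intervals = {
    name: confidence_interval(m, s, ASSUMED_N)
    for name, (m, s) in summary_stats.items()
}

for name, (low, high) in intervals.items():
    print(f"95% CI for {name}: {low:.2f} to {high:.2f} minutes")
```

A larger sample narrows the interval, because the margin of error shrinks in proportion to the square root of n.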
Set up and conduct a hypothesis test relevant to the continuous random variable. Choose an appropriate level of significance for your test.
Show how Regression techniques can be used to extrapolate or interpolate values for your dependent variable.
Regression analysis is a statistical method used to determine the relationship between two or more variables. It provides insight that can be used to improve products and services. To conduct a regression analysis, an individual first has to gather data on the variables in question. The data is then plotted in Excel to visualize the relationship between them. For the dataset given in Table 1 of the Excel file, the independent variable is the staff ID, while the dependent variable is the time required to prepare the three products: minced meat, sausage, and stuffed pork steak.
After the relationship between the variables has been established, it can be used to construct new data points. Two methods can be used for this: interpolation and extrapolation. Interpolation constructs new data points within the range of a discrete set of known data points. Extrapolation, on the other hand, estimates new data points beyond the original set of known data points. The two methods are similar, but extrapolation is subject to greater uncertainty and carries a higher risk of producing meaningless results. Since the independent variable is categorical, each staff ID has to be assigned a numerical code before the regression analysis is performed. The table below shows the summary output for the regression analysis.
SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.198149918 |
| R Square | 0.03926339 |
| Adjusted R Square | 0.019248044 |
| Standard Error | 1.023329324 |
| Observations | 50 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 2.054261 | 2.054261 | 1.961664 | 0.167767022 |
| Residual | 48 | 50.26574 | 1.047203 | | |
| Total | 49 | 52.32 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 3.465875544 | 0.746618 | 4.642103 | 2.7E-05 | 1.964700299 | 4.967051 | 1.9647 | 4.967051 |
| X Variable 1 | -0.627370073 | 0.447931 | -1.40059 | 0.167767 | -1.527996401 | 0.273256 | -1.528 | 0.273256 |

Table 3: Regression Analysis Summary Output.
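The fitted line from the summary output can be used to estimate the dependent variable for a new value of the independent variable. The sketch below takes the intercept and slope from the output above and labels each estimate as an interpolation or an extrapolation. The observed range of the independent variable is not reported in this paper, so `X_MIN` and `X_MAX` are hypothetical placeholders, as are the two query values.

```python
# Coefficients taken from the regression summary output.
INTERCEPT = 3.465875544
SLOPE = -0.627370073  # coefficient of "X Variable 1"

# HYPOTHETICAL observed range of the independent variable
# (the actual range is not reported in the paper).
X_MIN, X_MAX = 1.0, 2.0


def predict_time(x, x_min=X_MIN, x_max=X_MAX):
    """Estimate y from the fitted line and say whether x is inside the data range."""
    y = INTERCEPT + SLOPE * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


y_in, kind_in = predict_time(1.5)    # inside the assumed range
y_out, kind_out = predict_time(3.0)  # outside the assumed range

print(f"x = 1.5: y = {y_in:.3f} ({kind_in})")
print(f"x = 3.0: y = {y_out:.3f} ({kind_out})")
```

Any estimate from this line should be treated with caution: the output reports an R Square of about 0.039 and a slope p-value of about 0.168, so the independent variable explains little of the variation in the dependent variable, and extrapolated values are less reliable still.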
Conclusion
There are different measures of central tendency and measures of variability that can be used to analyze datasets. In this paper, the mean, median, variance, standard deviation, and range were explored and discussed. These tools are very useful when analyzing data in order to make informed decisions. Measures of central tendency describe the center of the data, while measures of variability describe how spread out the values are. The paper further explored the use of a confidence interval, which is useful for finding the range within which the true population parameter is likely to lie. Lastly, the paper provided a summary of the regression analysis.
References
Chen, F. (2017). Measures of central tendency & variability + normal distribution. [Online]. Retrieved from: http://faculty.ecnu.edu.cn/picture/article/220/87/47/75140ffb4236b6622d6e6c3deb4c/37ba4b4a-73ff-4873-8cfd-10f82e3a22c1.pdf. Accessed 14th July 2019.