Part 1-a-b
Descriptive statistics of weekly average sales (thousands)
Statistic | Mango blue | Papaya red
MIN | 12 | 11
MAX | 23 | 26
RANGE | 11 | 15
Q1 | 14.8 | 14
Q3 | 20.3 | 22.3
MEDIAN | 17 | 17.5
IQR | 5.5 | 8.25
MEAN | 17.6 | 17.6
MODE | 17 | 14
STANDARD DEVIATION | 3.52 | 4.59
Part 1-c
Part 1-d
Part 1-e
The data represent average weekly sales, in thousands, for mango blue and papaya red. Daily sales of both products were collected and compiled into weekly averages for ease of analysis. Descriptive statistics were then calculated for the weekly average sales of each product. Histograms of the average sales for both products were generated and combined for comparison, and box plots were also generated to support the analysis. The purpose of the analysis is to determine whether the average weekly sales of mango blue and papaya red differ, and how the sales of each product vary over time.
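The averaging and summary steps described above can be sketched in Python with the standard library `statistics` module. This is a minimal sketch, not the original workflow (which used Excel). It assumes the daily series starts on the first day of a week with no missing days, uses the inclusive quartile method (the same method as Excel's QUARTILE.INC) and the sample standard deviation (like Excel's STDEV.S). Which quartile and standard deviation conventions the original table used is not stated, so results may differ slightly. The function names `weekly_averages` and `describe` are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def weekly_averages(daily_sales: Sequence[float], days_per_week: int = 7) -> List[float]:
    """Collapse consecutive blocks of daily sales into weekly means.

    Assumes the series starts on the first day of a week and has no gaps;
    a final partial week is averaged over the days it has.
    """
    return [
        statistics.fmean(daily_sales[i:i + days_per_week])
        for i in range(0, len(daily_sales), days_per_week)
    ]


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics in the same order as the Part 1 table."""
    data = sorted(values)
    # "inclusive" interpolation matches Excel's QUARTILE.INC
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "MIN": data[0],
        "MAX": data[-1],
        "RANGE": data[-1] - data[0],
        "Q1": q1,
        "Q3": q3,
        "MEDIAN": median,
        "IQR": q3 - q1,
        "MEAN": statistics.fmean(data),
        # with tied modes this returns the smallest, because data is sorted
        "MODE": statistics.mode(data),
        # sample standard deviation (n - 1 denominator), like Excel's STDEV.S
        "STANDARD DEVIATION": statistics.stdev(data),
    }
```

Calling `describe(weekly_averages(daily))` on 182 daily sales figures (26 weeks) for one product would produce one column of the table.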
The descriptive statistics summarize the general characteristics of both products. From the table, papaya red has a larger range than mango blue (15 versus 11). Both products have the same mean (17.6). However, papaya red has a higher standard deviation (4.59, compared with 3.52 for mango blue), indicating that its individual weekly average sales values vary more widely around the mean than those of mango blue.
The mango blue sales bar graph shows three peaks of 23,000 in sales, in weeks 8, 21 and 24. Papaya red, on the other hand, has a single peak in week 14, when average sales reached 26,000. The histogram shows that papaya red has more weeks with high average sales than mango blue (its upper quartile is 22.3 compared with 20.3), but it also has some weeks with very low sales, which pull its overall mean for the 26 weeks down to the same value as that of mango blue (17.6). The box plots show the distribution of the weekly sales values for both products. The comparison plot shows that mango blue has a slightly higher minimum (12 versus 11) but a lower maximum (23 versus 26) than papaya red, so its values are more tightly grouped. The median for papaya red is slightly higher than that of mango blue (17.5 versus 17). Furthermore, based on the quartiles in the table, in both distributions the median lies closer to Q1 than to Q3 and the mean is slightly above the median, which points to a slight positive skew for both products.
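The skew reading of the box plots can be cross-checked numerically from the quartiles in the table. The sketch below uses Bowley's quartile skewness coefficient, (Q3 + Q1 - 2 × median) / (Q3 - Q1), which was not part of the original analysis; a positive value means the upper half of the box is longer than the lower half. The function name `quartile_skewness` is hypothetical, and the quartiles are copied from the table as rounded there.

```python
def quartile_skewness(q1: float, median: float, q3: float) -> float:
    """Bowley (quartile) skewness: positive when the upper half of the box is longer."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Quartiles (Q1, median, Q3) copied from the Part 1 table, weekly average sales in thousands
QUARTILES = {
    "Mango blue": (14.8, 17.0, 20.3),
    "Papaya red": (14.0, 17.5, 22.3),
}

for product, (q1, median, q3) in QUARTILES.items():
    print(
        f"{product}: lower half {median - q1:.1f}, upper half {q3 - median:.1f}, "
        f"Bowley skewness {quartile_skewness(q1, median, q3):+.2f}"
    )
```

Both coefficients come out positive (about +0.20 for mango blue and +0.16 for papaya red), consistent with the medians sitting closer to Q1 than to Q3.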
Part 2-a
The trendline on the scatter plot indicates a weak negative correlation between the dosage of the trial drug and the patients' muscle recovery times. There are also outliers, for example the point at a dosage of 27 micrograms with a recovery time of 16 minutes. This outlier does not change the overall conclusion, since most of the points indicate that an increase in dosage reduces the recovery time.
Part 2-b
Pearson's correlation coefficient r is -0.4705, which further supports the results of the scatter plot above. It was calculated in Excel with the CORREL function applied to the dosage of the trial drug and the patient muscle recovery times. The value indicates a weak negative correlation between the two variables, which confirms that recovery times tend to be slightly longer at low dosages than at high dosages. The CORREL function calculates the correlation coefficient using the equation below.
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} $$
Here x is the recovery time in minutes, y is the dosage administered, and x̄ and ȳ are their sample means. If r is close to +1, there is a strong positive correlation; if it is close to -1, there is a strong negative correlation; values near 0 indicate little or no linear correlation.
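The CORREL formula above translates directly into code. This is a minimal sketch in plain Python, assuming two equal-length lists of paired observations; the trial data are not reproduced in this document, so any numbers passed to the hypothetical `pearson_r` function are placeholders. Because r is symmetric in x and y, it does not matter which variable is treated as x.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient, the quantity Excel's CORREL returns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # numerator: sum of products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # denominator: square root of the product of the sums of squared deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```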
Part 2-c
The regression analysis further confirms the relationship between the dosage of the trial drug and the patients' muscle recovery times. It gives the linear equation of the trendline, which expresses the dependent variable as a linear function of the independent variable, and from which the gradient and the y-intercept are obtained. The gradient is -0.3189 and the y-intercept is 20.27566, which agrees with the trendline on the scatter plot. The residual plot shows that the data can reasonably be described by the regression line. Since each residual is the difference between an observed point and the trendline at the same x value, the outlier at a recovery time of 16 minutes stands out on the plot. The remaining residuals, however, show no clear pattern, so a linear regression line is appropriate.
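The gradient and intercept from Excel's linear regression are ordinary least-squares estimates, and a residual plot shows observed minus fitted values. A minimal plain-Python sketch of both follows, assuming paired lists of x and y values; the function names `fit_line` and `residuals` are hypothetical, and the original analysis used Excel rather than this code.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares gradient and intercept of y = gradient * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    gradient = sxy / sxx
    # the fitted line always passes through the point of means
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def residuals(x: Sequence[float], y: Sequence[float],
              gradient: float, intercept: float) -> List[float]:
    """Observed minus fitted value at each x; a large one flags a possible outlier."""
    return [yi - (gradient * xi + intercept) for xi, yi in zip(x, y)]
```

Applied to the trial data with x and y assigned as in the original analysis, `fit_line` would be expected to reproduce the reported gradient and intercept, and plotting `residuals` against x gives the residual plot.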
Part 2-d
The relationship between the dosage of the trial drug in micrograms and the patient muscle recovery times is negative, with an increase in dosage reducing recovery times. This is further supported by Pearson's correlation coefficient, r = -0.4705. The regression analysis gives the linear equation y = -0.3189x + 20.27566, where y is the dosage in micrograms and x is the patient muscle recovery time. The equation shows that the trendline has a negative gradient (-0.3189) and a y-intercept of 20.27566.
Part 3-a
From the frequency distribution table, the modal class for the heights of the 500 men (measured in centimeters) contains 95 adult men. A histogram and a frequency polygon were drawn from the frequency distribution table. The frequency polygon is bell-shaped and symmetrical, and the histogram shows that the distribution of heights of the 500 adult men is unimodal. This suggests that the population is approximately normally distributed.
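Building the frequency distribution and reading off the modal class can be sketched as follows. This assumes equal-width classes that include their lower bound and exclude their upper bound; the class `start` and `width` are hypothetical parameters, since the classes used in the original table are not shown here, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

ClassBounds = Tuple[float, float]


def frequency_table(values: Sequence[float], start: float, width: float) -> Dict[ClassBounds, int]:
    """Count values into classes [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((v - start) // width) for v in values)
    first, last = min(counts), max(counts)
    # include empty classes between the lowest and highest occupied ones
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in range(first, last + 1)
    }


def modal_class(table: Dict[ClassBounds, int]) -> ClassBounds:
    """The class with the highest frequency (the first one if several tie)."""
    return max(table, key=table.get)
```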
Part 3-b
A systematic sampling method was used to obtain the sample. Since a sample of 20 was needed from a population of 500 adult men, the sampling interval was found by dividing 500 by 20, giving 25. Every 25th value was therefore picked from the population of 500 to create the sample. Systematic sampling spreads the sample evenly across the population, and it is easier to conduct than simple random sampling. However, this method might coincide with a hidden periodic pattern in the population, which could bias the sample and make it non-random (Sharma, 2017).
Sample statistic | Value (cm)
mean | 186.72
standard deviation | 2.20
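The systematic sampling procedure and the two sample statistics above can be sketched as below. The interval is k = N // n, which is 500 // 20 = 25 here. Starting from a random position within the first interval is a common convention that is assumed rather than stated in the text; `systematic_sample` and `sample_summary` are hypothetical helper names, and the sample standard deviation uses the n - 1 denominator, which the text does not specify.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def systematic_sample(population: Sequence[float], n: int,
                      start: Optional[int] = None) -> List[float]:
    """Take every k-th element, k = len(population) // n, from a start in [0, k)."""
    k = len(population) // n
    if start is None:
        start = random.randrange(k)  # random start within the first interval
    return list(population[start::k])[:n]


def sample_summary(sample: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    return statistics.fmean(sample), statistics.stdev(sample)
```

With a list of the 500 heights, `sample_summary(systematic_sample(heights, 20))` returns the mean and standard deviation of one systematic sample.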
Part 3-c
95% confidence interval
Therefore, we can be 95% confident that the population mean lies between and .
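As a worked sketch, the 95% interval follows from the sample statistics in Part 3-b (x̄ = 186.72 cm, s = 2.20 cm, n = 20). The critical value used in the original is not stated, so both the normal value z = 1.96 and the Student t value for 19 degrees of freedom (about 2.093, the usual choice when the population standard deviation is unknown and n is small) are shown as assumptions.

```latex
\text{CI} = \bar{x} \pm c\,\frac{s}{\sqrt{n}}, \qquad \frac{s}{\sqrt{n}} = \frac{2.20}{\sqrt{20}} \approx 0.492
\\
c = z_{0.025} = 1.96: \quad 186.72 \pm 0.96 \;\Rightarrow\; [185.76,\ 187.68]
\\
c = t_{0.025,\,19} \approx 2.093: \quad 186.72 \pm 1.03 \;\Rightarrow\; [185.69,\ 187.75]
```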
Part 3-d
99% confidence interval
Therefore, we can be 99% confident that the population mean lies between and .
The 99% confidence interval gives a higher level of confidence about where the population mean is likely to be, with only a 1% probability that an interval constructed in this way misses it. However, it is a wider interval, with a range of . For the 95% confidence interval, there is a 5% probability that the interval does not contain the population mean, which is higher than 1%. However, the interval is narrower, with a range of .
References
Sharma, G. (2017). Pros and cons of different sampling techniques. International Journal of Applied Research, 3(7), 749-752.