This paper presents a statistical analysis of the humidity in New York City, based on data collected between 20th December 2019 and 29th December 2019. The data is shown in the table below:
Date | Humidity |
20/12/2019 | 60% |
21/12/2019 | 60% |
22/12/2019 | 66% |
23/12/2019 | 60% |
24/12/2019 | 62% |
25/12/2019 | 50% |
26/12/2019 | 82% |
27/12/2019 | 80% |
28/12/2019 | 86% |
29/12/2019 | 62% |
Line Chart
A line chart, also referred to as a line graph, shows how a value changes over time. In this case, the line chart plots the humidity in New York City over ten consecutive days (from 20th December 2019 to 29th December 2019). The line graph has two components: the vertical y-axis and the horizontal x-axis. The x-axis is the independent axis; its values do not depend on anything else. Here, the x-axis presents the date, which progresses regardless of external factors. The y-axis is the dependent axis, and here it represents the humidity level on each day. The points are plotted and then connected with line segments to form a line. The line chart shows a fluctuating trend in the humidity levels, with the highest value, 86%, registered on 28th December ("Line Graph - Everything You Need to Know about Line Graphs", 2019).
Column Chart
The column chart is another primary chart type that gives a visual display of how a value changes over a period of time. As in the line chart, values are measured along the vertical axis, but each data point is drawn as a column instead of a point on a line. It is easy to compare the heights of the columns, which makes column charts an excellent way of showing change over time. Column charts work particularly well when there are only a few data points, as is the case here. The chart gives a clearer representation of how the humidity levels fluctuate over the ten days, with the highest value, 86%, registered on 28th December. A column chart is simple and versatile, easy to read, and easy to annotate with data labels where they make sense (ExcelJet, 2019).
Calculations
Mean
The mean is the average of a given set of values. To calculate it, add up all the numbers in the set and divide the sum by the count of numbers (MathsisFun, 2019). Let us now find the mean humidity registered in New York City over the ten days:
Add the numbers: 60 + 60 + 66 + 60 + 62 + 50 + 82 + 80 + 86 + 62 = 668
Divide by the count of numbers (in this case there are 10 numbers): 668/10 = 66.8
The mean humidity is 66.8%
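The calculation above can be checked with a short Python sketch. The list `humidity` simply restates the table values, and the helper name `mean` is chosen here for illustration; it is not part of the original analysis.

```python
# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


total = sum(humidity)           # step 1: add the numbers
mean_humidity = mean(humidity)  # step 2: divide by the count (10)

print(f"Sum = {total}, count = {len(humidity)}, mean = {mean_humidity}%")
```

Python's standard library also offers `statistics.mean`, which should give the same result for this data.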
Median
The statistical median is the middle figure of a set of numbers arranged in ascending or descending order. For an even number of values, the median is the mean of the two center numbers. For an odd number of values, the median is the value at the center of the set (AAA Math, 2019). Let us find the median humidity registered in New York City over the ten days:
Arrange the numbers in the set in ascending order: 50, 60, 60, 60, 62, 62, 66, 80, 82, 86.
There is an even number of values so we get the mean of the two numbers at the center. Add the two numbers at the middle and divide by 2: (62+62)/2 = 62
The median humidity value is 62%
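As a rough check of the steps above, the following Python sketch sorts the readings and picks the middle value, averaging the two center values when the count is even. The function name `median` is an illustrative choice.

```python
# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]


def median(values):
    """Return the middle value of the sorted data.

    For an even count, this is the mean of the two center values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ordered_humidity = sorted(humidity)
median_humidity = median(humidity)

print(ordered_humidity)
print(f"Median = {median_humidity}%")
```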
Mode
The mode of a data set is the value that appears most frequently within the set. The easiest way to find the mode is to arrange the values in order (ascending or descending) and then count the number of times each value appears. The value that appears the most times is the mode (Virtual Nerd, 2019). Let us find the mode of the humidity registered in New York City over the ten days:
Arrange the numbers in the set in ascending order: 50, 60, 60, 60, 62, 62, 66, 80, 82, 86.
Value | Number of appearances |
50 | 1 |
60 | 3 |
62 | 2 |
66 | 1 |
80 | 1 |
82 | 1 |
86 | 1 |
Total | 10 |
The mode of the set is 60%, which appears three times
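The frequency table above can be reproduced with a small Python sketch using `collections.Counter` from the standard library. The variable names are illustrative. Note that if two values were tied for the highest count, `most_common(1)` would report only one of them.

```python
from collections import Counter

# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]

# Count how many times each value appears.
counts = Counter(humidity)

# Print the frequency table in ascending order of value.
for value in sorted(counts):
    print(f"{value} | {counts[value]}")

# The mode is the value with the highest count.
mode_value, mode_count = counts.most_common(1)[0]
print(f"Mode = {mode_value}% (appears {mode_count} times)")
```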
Range
The range of a given data set is the difference between the highest and lowest values within the set (MathGoodies, 2019). Let us find the range of the humidity registered in New York City over the ten days:
Arrange the numbers in the set in ascending order: 50, 60, 60, 60, 62, 62, 66, 80, 82, 86.
The smallest value is 50 and the largest value is 86.
The range is given by: Highest – Lowest = 86 – 50 = 36
The range of the data set is 36 percentage points.
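The same result can be obtained with a brief Python sketch that takes the largest and smallest readings directly, without sorting first. The variable names are illustrative.

```python
# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]

lowest = min(humidity)
highest = max(humidity)

# Range = highest value - lowest value.
humidity_range = highest - lowest

print(f"Range = {highest} - {lowest} = {humidity_range}")
```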
Standard Deviation
The standard deviation of a given set of values shows how spread out the values are, and it is represented by the symbol, σ.
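The formula for the population standard deviation is given by:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where N is the number of values, x_i is each value in the set, and μ is the mean of the set.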
In simple words, first work out the mean of the data set; then, for each number in the set, subtract the mean and square the result. Next, work out the mean of these squared differences and take its square root (MathisFun Advanced, 2019). Let us find the standard deviation of the humidity registered in New York City over the ten days:
Step One: Find the mean of the values
Add the numbers: 60 + 60 + 66 + 60 + 62 + 50 + 82 + 80 + 86 + 62 = 668
Divide by the count of numbers (in this case there are 10 numbers): 668/10 = 66.8
The mean humidity, μ = 66.8%
Step Two: Subtract the mean from each number and square the result
(x_i - μ)²
(60 - 66.8)² = (-6.8)² = 46.24
(60 - 66.8)² = (-6.8)² = 46.24
(66 - 66.8)² = (-0.8)² = 0.64
(60 - 66.8)² = (-6.8)² = 46.24
(62 - 66.8)² = (-4.8)² = 23.04
(50 - 66.8)² = (-16.8)² = 282.24
(82 - 66.8)² = (15.2)² = 231.04
(80 - 66.8)² = (13.2)² = 174.24
(86 - 66.8)² = (19.2)² = 368.64
(62 - 66.8)² = (-4.8)² = 23.04
Step Three: Work out the mean of the squared differences
46.24 + 46.24 + 46.24 + 0.64 + 23.04 + 282.24 + 231.04 + 174.24 + 368.64 + 23.04 = 1241.60
1241.60 / 10
= 124.16
Step Four: Take the square root of the figure obtained
√124.16
= 11.143
The standard deviation is approximately 11.143
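The four steps above can be sketched in Python as follows. This is a minimal illustration that divides by the number of values N (the population form used in this paper); the function name `population_std` is an assumption made here for clarity. Dividing by N - 1 instead would give the sample standard deviation, which is slightly larger.

```python
import math

# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]


def population_std(values):
    """Return (mean, variance, standard deviation), dividing by N."""
    n = len(values)
    mu = sum(values) / n                            # step 1: mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # step 2: squared differences
    variance = sum(squared_diffs) / n               # step 3: mean of the squares
    sigma = math.sqrt(variance)                     # step 4: square root
    return mu, variance, sigma


mu, variance, sigma = population_std(humidity)
print(f"mean = {mu:.1f}, variance = {variance:.2f}, sigma = {sigma:.3f}")
```

The standard library function `statistics.pstdev` computes the same population standard deviation and can serve as a cross-check.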
Linear Forecasting Model
y = mx + c
In this case, m is the gradient and c is the y-intercept. Since the values fluctuate, a linear trendline is fitted to the line chart to determine these values. On the Excel chart, right-click the data series and select the option 'Add Trendline'. In the trendline options, select 'Display Equation on chart'. This gives the straight line as
y = 0.0196x + 0.56
Here x is the day number (x = 1 for 20th December 2019) and y is the humidity expressed as a fraction.
m is the slope of the line; it shows how much the humidity changes per day, and its value is 0.0196.
c is the point at which the line cuts the y-axis; it is the constant in the straight-line equation, and its value is 0.56.
Forecast for Day 15
x = 15
y = mx + c
y = (0.0196*15) + 0.56
= 0.854
= 85.4%
Forecast for Day 20
x = 20
y = mx + c
y = (0.0196*20) + 0.56
= 0.952
= 95.2%
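The trendline coefficients and the two forecasts can be checked with a short Python sketch. It assumes that the days are numbered 1 to 10 (day 1 being 20th December 2019) and that humidity is expressed as a fraction; under those assumptions an ordinary least-squares fit over all ten readings gives a slope of about 0.0196 and an intercept of about 0.56. The function names `fit_line` and `forecast` are illustrative, and `forecast` uses the rounded coefficients quoted in the text by default.

```python
# Daily humidity readings (%) from the table, 20-29 December 2019.
humidity = [60, 60, 66, 60, 62, 50, 82, 80, 86, 62]

# Assumption: days are numbered 1..10 and humidity is a fraction (60% -> 0.60).
days = list(range(1, len(humidity) + 1))
fractions = [h / 100 for h in humidity]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c


def forecast(day, slope=0.0196, intercept=0.56):
    """Evaluate y = slope*day + intercept with the rounded coefficients."""
    return slope * day + intercept


m, c = fit_line(days, fractions)
print(f"fitted slope = {m:.4f}, fitted intercept = {c:.2f}")

for day in (15, 20):
    print(f"Day {day}: {forecast(day) * 100:.1f}%")
```

Using the unrounded slope instead of 0.0196 would shift the forecasts slightly, so the rounded values are kept here to match the worked figures.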
Bibliography
AAA Math (2019). Median. [online] Aaamath.com. Available at: https://www.aaamath.com/g5_418x2.htm [Accessed 31 Dec. 2019].
ExcelJet (2019). Column chart | ExcelJet. [online] Exceljet.net. Available at: https://exceljet.net/chart-type/column-chart [Accessed 31 Dec. 2019].
Line Graph - Everything You Need to Know About Line Graphs (2019). [online] Smartdraw.com. Available at: https://www.smartdraw.com/line-graph/ [Accessed 31 Dec. 2019].
MathGoodies (2019). The Range of a Set of Data. [online] Mathgoodies.com. Available at: https://www.mathgoodies.com/lessons/vol8/range [Accessed 31 Dec. 2019].
MathsisFun (2019). How to Calculate the Mean Value. [online] Mathsisfun.com. Available at: https://www.mathsisfun.com/mean.html [Accessed 31 Dec. 2019].
MathisFun Advanced (2019). Standard Deviation Formulas. [online] Mathsisfun.com. Available at: https://www.mathsisfun.com/data/standard-deviation-formulas.html [Accessed 31 Dec. 2019].
Virtual Nerd (2019). How Do You Find the Mode of a Data Set? | Virtual Nerd. [online] Virtualnerd.com. Available at: https://virtualnerd.com/middle-math/probability-statistics/mean-median-mode-range/mode-data-set [Accessed 31 Dec. 2019].