Introduction
Sports rank as a critical societal endeavor. Among the various sports, baseball is widely recognized, serves as a source of community pride, and brings socio-economic benefits. In baseball games, winning is pivotal. It marks numerous practice hours coming to fruition, boosts team and personal confidence, and is economically beneficial to owners. Since winning is a vital measure of validation, this study examines the contribution of earned run average (ERA) to the wins (W) subsequently reported by the respective teams, with the aim of supporting data-driven conclusions.
Research Question
Does a pitcher’s ERA predict the number of wins the team has?
By definition, “Earned run average represents the number of earned runs a pitcher allows per nine innings” (Conor, 2019). As part of this research question, this report focuses on the hypotheses below, which are tested through inferential statistics:
H0: A pitcher’s ERA does not predict the number of wins the team has (null hypothesis)
Ha: A pitcher’s ERA predicts the number of wins the team has (alternative hypothesis)
Study Design
The sample comprised 463 players, with the variables of interest being the wins recorded and the earned run average (ERA) for each pitcher. The study relied on secondary data, obtained online from Fan Graphs, with statistics ranging from 1-1-2010 to 12-18-2020. Because the research sought specific outcomes, purposeful sampling was used; this approach aims to increase the researcher's understanding by relying on samples that give the best opportunity for extensive learning (Merriam, 2015). In this case, the Fan Graphs data covered a comprehensive sample (N = 463) and was sufficient to meet the research aims.
Exploring the Data
In this study, the main variables of interest are wins (W) and earned run average (ERA), extracted from the Fan Graphs data, which also contained additional variables. The following table summarizes the data, produced with Excel's “Descriptive Statistics” tool.
Table 1 Descriptive Statistics for Wins (W) and Earned Run Average (ERA)
Statistic | W | ERA
Mean | 41.24190065 | 3.944406048
Standard Error | 1.298628738 | 0.029282767
Median | 32 | 3.98
Mode | 32 | 3.98
Standard Deviation | 27.94315919 | 0.630090029
Sample Variance | 780.8201453 | 0.397013445
Kurtosis | 3.342755718 | -0.127024955
Skewness | 1.698960955 | -0.085807499
Range | 158 | 3.43
Minimum | 8 | 2.17
Maximum | 166 | 5.6
Sum | 19095 | 1826.26
Count | 463 | 463
Confidence Level (95.0%) | 2.551950943 | 0.057543917
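Several entries in Table 1 are tied together by simple identities (mean = sum / count, standard error = SD / √N, variance = SD², range = maximum − minimum), which gives a quick way to sanity-check the Excel output. The sketch below uses only the values reported in the table; the dictionary layout and the `derived` helper are illustrative assumptions, not part of the original analysis.

```python
import math

# Values copied from Table 1 (Excel "Descriptive Statistics" output).
n = 463
w = {"sd": 27.94315919, "sum": 19095, "min": 8, "max": 166}
era = {"sd": 0.630090029, "sum": 1826.26, "min": 2.17, "max": 5.6}

def derived(stats, n):
    """Recompute mean, standard error, variance and range from reported values."""
    return {
        "mean": stats["sum"] / n,
        "standard_error": stats["sd"] / math.sqrt(n),
        "variance": stats["sd"] ** 2,
        "range": stats["max"] - stats["min"],
    }

w_check = derived(w, n)
era_check = derived(era, n)
print(w_check)
print(era_check)
```

The recomputed figures should agree with the corresponding rows of Table 1 up to rounding.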
Table 1 offers useful statistics for the sampled players (N = 463) from 1-1-2010 to 12-18-2020 from Fan Graphs. For wins (W), the median is 32 (M = 41.24, SD = 27.94), and over the same period the range was 158 wins, with a minimum of 8 and a maximum of 166. The skewness statistic of 1.698960955 for wins (W) indicates that the data are not bell-shaped, i.e., wins are right-skewed rather than normally distributed, as shown in the box plot below (Figure 1), with the median (X) positioned more toward the top whisker. The Figure 1 box plot also shows some outliers in the dataset from 1-1-2010 to 12-18-2020.
Figure 1 Box plot for Wins (W) from 2010-2020
For ERA, the median is 3.98 (M = 3.94, SD = 0.63), with a range of 3.43, from 2.17 (minimum) to 5.6 (maximum). Whether the data are approximately normally distributed can be judged from the skewness statistic, -0.085807499. This value is close to 0, indicating a distribution much closer to normal than that of wins (W). The box plot below (Figure 2) shows this, as the median (X) lies in the middle, with no noticeable outliers in the dataset.
Figure 2 Box plot for ERA from 2010-2020
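The skewness comparison above can be reproduced outside Excel. The following is a minimal sketch, assuming the Descriptive Statistics tool reports the same adjusted sample skewness as Excel's SKEW function; the function name and the two small samples are made up for illustration and are not taken from the Fan Graphs data.

```python
import math

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed

# Made-up illustrative samples, NOT taken from the Fan Graphs data.
symmetric = [2, 3, 4, 5, 6]
right_skewed = [8, 10, 12, 15, 20, 40, 90]

print(sample_skewness(symmetric))     # symmetric sample: skewness of zero
print(sample_skewness(right_skewed))  # long right tail: positive skewness
```

A value near zero, as reported for ERA, points to a roughly symmetric distribution, while a clearly positive value, as reported for wins, points to a long right tail.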
Results
To answer this study's research question (Does a pitcher’s ERA predict the number of wins the team has?) and draw conclusions on the stated hypotheses, regression analysis was applied. Drawing conclusions about significance requires defining a reference alpha (Salkind, 2016); in this study, α = 0.05 was adopted.
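For readers who want to see what a simple linear regression computes, the sketch below fits a least-squares line with one predictor and reports the slope, intercept, and R². It is only an illustration of the technique: the study itself used Excel's regression tool, the `fit_line` helper is hypothetical, and the five data points are made up rather than drawn from the Fan Graphs sample.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

# Made-up illustrative values, NOT the Fan Graphs data.
toy_era = [2.5, 3.0, 3.5, 4.0, 4.5]
toy_wins = [60, 50, 45, 35, 30]

slope, intercept, r_squared = fit_line(toy_era, toy_wins)
print(slope, intercept, r_squared)
```

A negative slope in such a fit means that higher ERA values go with fewer wins, which is the direction of association examined in this study.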
Table 2 Summary Excel Output
SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.133128
R Square | 0.017723
Adjusted R Square | 0.015592
Standard Error | 27.72445
Observations | 463

ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 6393.406 | 6393.406 | 8.317759 | 0.00411
Residual | 461 | 354345.5 | 768.6453 | |
Total | 462 | 360738.9 | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 64.52947 | 8.176754 | 7.89182 | 2.19E-14 | 48.46114 | 80.5978 | 48.46114 | 80.5978
ERA | -5.90395 | 2.047102 | -2.88405 | 0.00411 | -9.92676 | -1.88114 | -9.92676 | -1.88114
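The quantities in Table 2 are linked by standard regression identities: R² = SS regression / SS total, F = MS regression / MS residual, t = coefficient / standard error, and, with a single predictor, t² = F. The sketch below recomputes them from the reported sums of squares and coefficient as a consistency check; the variable names are illustrative assumptions, and only values printed in Table 2 are used.

```python
import math

# Values copied from Table 2 (Excel regression output).
ss_regression, ss_residual, ss_total = 6393.406, 354345.5, 360738.9
df_regression, df_residual = 1, 461
coef_era, se_era = -5.90395, 2.047102
n_obs = 463

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
r_squared = ss_regression / ss_total
# Adjusted R squared penalizes R squared for the number of predictors (one here).
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
f_stat = ms_regression / ms_residual
t_stat = coef_era / se_era
standard_error = math.sqrt(ms_residual)  # standard error of the regression
multiple_r = math.sqrt(r_squared)

print(r_squared, adjusted_r_squared, multiple_r)
print(f_stat, t_stat, t_stat ** 2, standard_error)
```

Agreement between these recomputed values and the table, up to rounding, indicates that the summary output is internally consistent.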
The predefined alpha serves as the threshold for deciding whether to reject or fail to reject H0, depending on whether the p-value is less than or greater than it (Salkind, 2016). In this case, the result is “reject H0”, as the Significance F value (0.00411) is well below 0.05. From the summary above (Table 2), the model shows that the effect of ERA on wins is statistically significant (F(1, 461) = 8.317759, p = 0.00411, R² = 0.017723), although the R² value means ERA accounts for only about 1.8% of the variation in wins. Likewise, ERA as the predictor variable is significant in predicting wins (W) (t = -2.88405, p = 0.00411). A scatter plot is also a useful tool for depicting association (Salkind, 2016); as shown in Figure 3, the association is negative.
Figure 3 Scatterplot of Wins (W) and ERA for data from 1-1-2010 to 12-18-2020
As shown in Figure 3, wins (W) and the corresponding ERA values are negatively correlated, with the fitted equation (y = wins, x = ERA):
y = -5.9039x + 64.529
Conclusion
The analysis answers the research question, as shown by the statistical significance of the regression results. ERA significantly predicts the wins (W) that baseball teams record. These results agree with those of Conor (2019). To increase wins, teams need to work toward lowering their ERA, as the association is negative.
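The fitted line reported in the Results, y = -5.9039x + 64.529, can be turned into a small worked example. The sketch below uses the rounded coefficients from that equation and the mean ERA from Table 1; the function name and the ERA values of 3.0 and 4.0 are illustrative assumptions. Given the low R² (0.017723), such predictions describe an average tendency and should not be read as precise forecasts for any individual pitcher.

```python
# Rounded coefficients from the fitted equation y = -5.9039x + 64.529.
SLOPE = -5.9039
INTERCEPT = 64.529

def predicted_wins(era):
    """Wins predicted by the fitted line for a given ERA."""
    return SLOPE * era + INTERCEPT

mean_era = 3.944406048  # mean ERA from Table 1
# A least-squares line passes through the point of means, so the prediction
# at the mean ERA should land close to the mean of W in Table 1 (41.24).
at_mean = predicted_wins(mean_era)
print(round(at_mean, 2))

# Difference in predicted wins between an ERA of 3.0 and an ERA of 4.0
# (illustrative values): a one-run drop in ERA.
gain = predicted_wins(3.0) - predicted_wins(4.0)
print(round(gain, 4))
```

Under this model, lowering ERA by one run is associated with roughly 5.9 more wins over the 2010-2020 period covered by the data.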
Further Study
First, this study focused only on ERA and its effect on wins (W), with a single independent variable. Other important variables exist, and multiple regression could have been a better approach.
Second, ERA is not the only factor that affects winning in baseball. The collected data contain other variables. Hence, further studies can examine additional variables, e.g., fielding, batting measures, wins above replacement (WAR), and exit velocity (EV), among other baseball statistics. Doing so could yield better findings and help improve decisions.
References
Conor, W. (2019). Batting, Pitching, or Fielding: What’s Most Important in Today’s MLB? Samford University. https://www.samford.edu/sports-analytics/fans/2019/Batting-Pitching-or-Fielding-Whats-Most-Important-in-Todays-MLB
Fan Graphs (2020). Data: Leaderboards. https://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y&type=8&season=2020&month=0&season1=2010&ind=0&team=0&rost=0&age=0&filter=&players=0&startdate=2010-01-01&enddate=2020-12-31&sort=2,d
Merriam, S. B. (2015). Qualitative research: A guide to design and implementation (4th ed.). San Francisco, CA: Jossey-Bass Publishers.
Salkind, N. J. (2016). Statistics for people who (think they) hate statistics. Thousand Oaks: SAGE Publications.