Introduction
Linear regression and correlation analysis are useful tools for determining whether two sets of data are related and whether that relationship is statistically significant. This project uses them to investigate the performance of the 2016-2017 NBA season leaders. In particular, it seeks to establish whether there is a relationship between three-point field goals made per game and points per game. It also tests whether there is a significant relationship between conference and points per game, and finally measures the correlation between points per game and three-point field goals made per game.
The variables in the dataset are Players, Teams, PTS (points per game), 3PM (three-point field goals made per game), and Conference.
Data Gathering
Data gathering is a vital step that must come before any analysis. The data were obtained from the ESPN basketball website: http://www.espn.com/nba/seasonleaders/_/league/nba/page/4
PTS and 3PM are quantitative variables because they are numeric, while Conference (Eastern or Western) is categorical. The observations are independent of each other. To avoid bias, a simple random sample of 99 NBA players was taken without replacement, so that each player in the larger population had an equal chance of being chosen. Because the sample contains more than 30 observations, the central limit theorem was relied on to treat the sampling distribution of the sample means as approximately normal. This does not mean that the data themselves follow a normal distribution with mean 0 and standard deviation 1.
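A simple random sample without replacement, as described above, can be sketched with Python's standard `random.sample`, which draws distinct items so that every member of the population has the same chance of selection. This is only an illustration: the population of 300 player identifiers and the seed are hypothetical placeholders, not the actual ESPN list.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items without replacement, each item equally likely."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population of player identifiers (placeholders, not real data).
population = [f"player_{i}" for i in range(1, 301)]
sample = simple_random_sample(population, 99, seed=2017)
print(len(sample), len(set(sample)))  # prints "99 99": no player is drawn twice
```

Sampling without replacement is what keeps each player from appearing more than once, which supports the independence assumption stated above.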
First Research Question
Is there a relationship between three-point field goals made per game (3PM) and points per game (PTS)? To answer this question, a two-sample t-test was performed, a side-by-side box plot was drawn to display the variation in the samples, and a linear regression analysis was carried out.
t-Test: Two-Sample Assuming Unequal Variances (PTS)
Statistic | Variable 1 | Variable 2
Mean | 14.79069767 | 13.26785714
Variance | 35.72372093 | 22.97422078
Observations | 43 | 56
Hypothesized Mean Difference | 0 |
df | 79 |
t Stat | 1.366979058 |
P(T<=t) one-tail | 0.087754632 |
t Critical one-tail | 1.664371409 |
P(T<=t) two-tail | 0.175509265 |
t Critical two-tail | 1.99045021 |
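The t Stat and df above can be checked by hand with Welch's formulas for a two-sample t-test with unequal variances. This is a minimal sketch that works from the summary statistics only, not from the original spreadsheet. It uses a first-group size of 43 because n1 = 43 and n2 = 56 add up to the 99-player sample and reproduce the reported t Stat (1.367) and df (79). The function name `welch_t_test` is hypothetical.

```python
import math

def welch_t_test(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from sample means, sample variances, and sample sizes."""
    se1 = var1 / n1  # squared standard error of the first mean
    se2 = var2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2 - hypothesized_diff) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the PTS table (n1 = 43 assumed, see the lead-in).
t, df = welch_t_test(14.79069767, 35.72372093, 43, 13.26785714, 22.97422078, 56)
print(f"t = {t:.4f}, df = {df:.1f}")
```

If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the corresponding two-tailed p-value. The last digits may differ slightly from the table if the spreadsheet rounds df before computing it.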
The box plot reveals an outlier in the points data (31.6 points per game), and the overall mean is 13.9. The two-sample t-test gives a two-tailed p-value of 0.175509265. This is greater than the 0.05 significance level, so the null hypothesis of equal means is not rejected: mean points per game do not differ significantly between the two samples. The regression analysis addresses the relationship between 3PM and PTS directly. Its R² value measures how closely the data fit the regression line, that is, the proportion of the variation in the response y that is explained by the predictor x. An R² of 0.2 is low, meaning the linear relationship between the response and the predictor is weak.
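The regression step can be sketched with the ordinary least squares formulas for a single predictor. This is a minimal illustration that assumes PTS is the response and 3PM the predictor. The function name and the five data points are hypothetical and are not values from the sample.

```python
def simple_linear_regression(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in y per unit of x
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    # R^2 = 1 - SS_res / SS_tot: proportion of the variation in y explained by x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical (3PM, PTS) pairs for illustration only; not taken from the sample.
three_pm = [0.4, 1.1, 1.8, 2.5, 3.2]
pts = [11.0, 9.5, 16.0, 12.5, 21.0]
a, b, r2 = simple_linear_regression(three_pm, pts)
print(f"PTS = {a:.2f} + {b:.2f} * 3PM, R^2 = {r2:.2f}")
```

An R² near 0 means the fitted line explains little of the spread in PTS. An R² near 1 means the points lie close to the line.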
Second Research Question
Is there a significant relationship between conference and points per game? To answer this question, a two-sample t-test assuming unequal variances was performed, with the following results.
t-Test: Two-Sample Assuming Unequal Variances (3PM)
Statistic | Variable 1 | Variable 2
Mean | 1.158139535 | 1.1125
Variance | 1.055348837 | 0.68075
Observations | 43 | 56
Hypothesized Mean Difference | 0 |
df | 79 |
t Stat | 0.238238879 |
P(T<=t) one-tail | 0.406156343 |
t Critical one-tail | 1.664371409 |
P(T<=t) two-tail | 0.812312687 |
t Critical two-tail | 1.99045021 |
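The two-tailed p-value is 0.812312687. This is greater than the 0.05 significance level, so the null hypothesis is not rejected. Because this table compares three-point field goals made per game (3PM), it shows that mean 3PM does not differ significantly between the two conference groups. The comparison for points per game is the PTS t-test above, which uses the same two group sizes and is also not significant (two-tailed p = 0.1755). There is therefore no evidence of a significant relationship between conference and points per game.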
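The decision rule used in both t-tests can be written out explicitly. Reject the null hypothesis when the two-tailed p-value is below α = 0.05, or equivalently when |t Stat| exceeds the two-tailed critical value. Because the t distribution is symmetric, the two-tailed p-value is twice the one-tailed one (2 × 0.0878 ≈ 0.1755). The sketch below applies both forms of the rule to the reported values for PTS and 3PM. The function name is hypothetical.

```python
ALPHA = 0.05  # significance level used throughout this report

def two_tailed_decision(p_two_tail, t_stat, t_crit_two_tail, alpha=ALPHA):
    """Apply the p-value rule and the critical-value rule; they should agree."""
    reject_by_p = p_two_tail < alpha
    reject_by_t = abs(t_stat) > t_crit_two_tail
    if reject_by_p != reject_by_t:
        raise ValueError("p-value and critical-value rules disagree; check the inputs")
    return "reject H0" if reject_by_p else "fail to reject H0"

# (two-tailed p, t Stat, t Critical two-tail) as reported in the two t-test tables.
reported = {
    "PTS": (0.175509265, 1.366979058, 1.99045021),
    "3PM": (0.812312687, 0.238238879, 1.99045021),
}
for name, (p, t, t_crit) in reported.items():
    print(name, two_tailed_decision(p, t, t_crit))
```

Both rules lead to the same conclusion for each table: the null hypothesis is not rejected.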
Third Research Question
Is there a correlation between points per game (PTS) and three-point field goals made per game (3PM)? To answer this question, a correlation analysis was conducted, giving the following correlation matrix.
Variable | PTS | 3PM
PTS | 1 |
3PM | 0.498609078 | 1
Correlation measures the extent to which two variables vary together, and it can be positive or negative. Here the correlation between PTS and 3PM is positive (r = 0.4986): players who make more three-point field goals per game tend to score more points per game. With r at about 0.5, the relationship is weak to moderate rather than strong.
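The Pearson correlation coefficient in the matrix can be computed from paired observations, as sketched below. The function name is hypothetical, and the check pairs are toy data rather than the sample. With a single predictor, the regression R² equals r². The reported r of 0.4986 therefore corresponds to r² ≈ 0.25, which is in line with the low R² reported for the regression.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Reported correlation between PTS and 3PM; squaring it gives the share of
# variance in one variable that a linear fit on the other explains.
r_reported = 0.498609078
print(f"r^2 = {r_reported ** 2:.3f}")  # prints "r^2 = 0.249"
```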
Conclusion
The analysis suggests that conference is not significantly associated with either points per game or three-point field goals made per game in this NBA sample. Neither two-sample t-test was significant at the 0.05 level, so there is no evidence of a significant relationship between conference and points per game. The correlation analysis showed a weak-to-moderate positive relationship (r = 0.4986) between PTS and 3PM, that is, between points per game and three-point field goals made per game. The regression R² of about 0.2 indicates that 3PM explains only a small share of the variation in PTS.