Correlation analysis is a statistical technique used to evaluate the strength of the relationship between variables. A strong correlation implies a strong relationship between the variables, while a weak correlation implies that they are barely related. The correlation coefficient r ranges from -1 to 1. A positive value indicates a positive relationship, in which an increase in one variable tends to be accompanied by an increase in the other (Gerstman, 2015). A negative value indicates a negative relationship, in which an increase in one variable tends to be accompanied by a decrease in the other. A value of zero means that there is no linear correlation between the two variables. Correlation is different from causation: correlation measures the association between variables, whereas causation concerns the causal effect of a particular event or action (BUSPH, 2013).
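To make the definition concrete, here is a minimal Python sketch of the Pearson correlation coefficient. The text does not name a specific coefficient, so using Pearson's r is an assumption (it is the most common one), and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```

The sign of the result comes only from the numerator `sxy`, which is why r is positive when the variables move together and negative when they move in opposite directions.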
To evaluate the significance of a correlation coefficient, we carry out either a t test or a z test, depending on the number of data pairs being evaluated: a t test is used when there are fewer than 30 pairs, and a z test when there are 30 or more. As noted, correlation analysis describes numerically the strength of the relationship between two variables, and the resulting coefficient always takes a value between -1 and 1. Consider the illustration below:
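The selection rule can be sketched as a small Python function. The function name is hypothetical, and the exact formulas are assumptions: the t statistic is the standard r*sqrt(n-2)/sqrt(1-r^2) form, and for the z test this sketch uses Fisher's r-to-z transformation, one common large-sample test. The cutoff of 30 pairs follows the rule stated above.

```python
import math


def correlation_test_statistic(r, n):
    """Return ("t", value) when n < 30, else ("z", value), for H0: rho = 0.

    Assumed forms: t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 degrees of freedom,
    and z = atanh(r)*sqrt(n-3) (Fisher's transformation).
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 30:
        if n < 3:
            raise ValueError("the t test needs at least 3 pairs")
        # Small sample: t statistic, compared against a t table with n - 2 df.
        return "t", r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Large sample: atanh(r) is approximately normal with SE 1/sqrt(n - 3).
    return "z", math.atanh(r) * math.sqrt(n - 3)


kind, value = correlation_test_statistic(0.5, 20)
print(kind, round(value, 3))  # t 2.449
```

Returning the name of the test alongside the value keeps the caller from comparing a t statistic against a normal-table critical value by mistake.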
A value close to 1 (or -1) indicates a strong correlation between the two variables being evaluated, while a value close to zero indicates little or no linear correlation. As noted, to test for significance, that is, to determine whether the observed correlation reflects a real relationship rather than chance, we perform a t test or a z test using the following hypotheses:
H0: The population correlation is zero (ρ = 0).
Ha: The population correlation is nonzero (ρ ≠ 0).
We then compute the t statistic (when there are fewer than 30 data pairs) or the z statistic (when there are 30 or more) and use it to determine whether the relationship is significant. The formulas for t and z, respectively, are given below:
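A common form of each statistic is shown here, where r is the sample correlation coefficient and n is the number of data pairs. The z form is an assumption: it uses Fisher's r-to-z transformation, and some textbooks use a different large-sample z formula.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{df} = n-2

z = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right)\sqrt{n-3}
```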
Once the value of t or z is obtained, it is compared with the tabulated (critical) value of t (with n - 2 degrees of freedom) or z to reach a conclusion. For instance, consider a study evaluating the relationship between a team's win percentage and the number of runs it scored. The following data were obtained (n = 36 pairs):
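For the z test, the tabulated critical value can be computed rather than looked up. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the function names and the default 5% two-tailed significance level are assumptions. The standard library has no t distribution, so t critical values would still come from a t table or a library such as SciPy.

```python
from statistics import NormalDist


def z_critical_value(alpha=0.05):
    """Two-tailed critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_h0(z, alpha=0.05):
    """Reject H0 (rho = 0) when |z| exceeds the two-tailed critical value."""
    return abs(z) > z_critical_value(alpha)


print(round(z_critical_value(0.05), 2))  # 1.96
print(round(z_critical_value(0.01), 2))  # 2.58
```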
Win Percentage | Runs Scored |
0.292 | 582 |
0.395 | 630 |
0.395 | 735 |
0.534 | 750 |
0.46 | 689 |
0.556 | 757 |
0.574 | 796 |
0.543 | 726 |
0.586 | 787 |
0.5 | 751 |
0.528 | 743 |
0.457 | 821 |
0.543 | 887 |
0.586 | 822 |
0.438 | 723 |
0.444 | 827 |
0.265 | 591 |
0.342 | 575 |
0.407 | 724 |
0.487 | 823 |
0.429 | 747 |
0.401 | 722 |
0.488 | 784 |
0.327 | 783 |
0.417 | 654 |
0.461 | 652 |
0.525 | 899 |
0.463 | 791 |
0.519 | 817 |
0.488 | 750 |
0.364 | 617 |
0.543 | 703 |
0.605 | 896 |
0.537 | 798 |
0.522 | 729 |
0.642 | 829 |
A correlation analysis of these data gives a correlation coefficient of r ≈ 0.7, which indicates a fairly strong positive relationship. Because there are more than 30 data pairs (n = 36), we evaluate the significance of this result with a z statistic.
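Plugging the reported values into a z statistic gives a quick check of the conclusion. A minimal Python sketch, assuming the z test uses Fisher's r-to-z transformation and a 5% two-tailed significance level (the text specifies neither); the exact value depends on the unrounded r.

```python
import math
from statistics import NormalDist

r, n = 0.70, 36  # correlation coefficient and number of pairs from the text

# Fisher r-to-z transformation, scaled by its standard error 1/sqrt(n - 3).
z = math.atanh(r) * math.sqrt(n - 3)
# Two-tailed p-value under H0: rho = 0 (lower tail doubled for precision).
p_value = 2 * NormalDist().cdf(-abs(z))
critical = NormalDist().inv_cdf(0.975)  # assumed 5% two-tailed level

print(f"z = {z:.2f}, critical value = {critical:.2f}, p = {p_value:.1e}")
print("reject H0" if abs(z) > critical else "fail to reject H0")
```

With these inputs z is about 4.98, well above the critical value of 1.96, so H0 is rejected.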
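Comparing the resulting z value with the critical value from the standard normal table (for example, 1.96 for a two-tailed test at the 5% significance level), we reject H0 and conclude that the two variables have a nonzero correlation.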
References
Boston University School of Public Health (BUSPH). (2013). Introduction to correlation and regression analysis. In Multivariable methods [Online learning module]. http://sphweb.bumc.bu.edu/otlt/MPHModules/BS/BS704_Multivariable/BS704_Multivariable5.html
Gerstman, B. B. (2015). Basic biostatistics: Statistics for public health practice (2nd ed.). Burlington, MA: Jones & Bartlett Learning.