Linear regression is a statistical method that examines the relationship between one or more independent variables, denoted x, and a dependent variable, denoted y (Brid, 2018). The dependent variable should be continuous, meaning that it can take any value within a range, or at least approximately so. The independent variables can take any form, continuous or categorical. Although linear regression cannot establish causation by itself, the independent variable is typically assumed to affect the dependent variable.
By its nature, linear regression only captures linear relationships between the independent and dependent variables; that is, it assumes a straight-line relationship between them. However, this assumption is sometimes incorrect. For instance, the relationship between age and income is curved: income typically rises in early adulthood, flattens later in adulthood, and decreases as individuals retire. One can tell whether this is an issue by plotting the variables against each other.
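When a plot is not at hand, the residuals from a straight-line fit give the same signal. The following is a minimal sketch in Python, assuming made-up age and income figures that follow the curved pattern described above; the data and the helper name fit_line are illustrative only.

```python
# Hypothetical (age, income in $1000s) data that rises, flattens, then declines.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
incomes = [80 - 0.05 * (age - 50) ** 2 for age in ages]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(ages, incomes)

# Residual = observed income minus the income predicted by the straight line.
residuals = [y - (intercept + slope * x) for x, y in zip(ages, incomes)]

for age, residual in zip(ages, residuals):
    print(f"age {age}: residual {residual:+.2f}")
```

With these figures the residuals are negative at both ends of the age range and positive in the middle. A systematic pattern of this kind, rather than a random scatter around zero, suggests that a straight line is the wrong shape for the data.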
Linear regression models the relationship between the independent variables and the mean of the dependent variable. For instance, if one looks at the relationship between a child's birth weight and maternal characteristics such as age, linear regression estimates the average weight of children born to mothers of different ages. Sometimes, however, one needs to look at the extremes of the dependent variable, for example very low birth weights rather than the average.
Just as the mean is not a complete description of a single variable, linear regression is not a complete description of the relationships among variables. This issue can be mitigated by using quantile regression.
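To make the idea concrete, the following is a minimal sketch of quantile regression in Python. It relies on the fact that, for a single predictor, a line minimizing the quantile ("pinball") loss can be found among the lines passing through two of the data points, so a brute-force search over pairs is enough for a small sample. The data and the function names pinball_loss and quantile_line are assumptions made for illustration; statistical packages use far more efficient solvers.

```python
from itertools import combinations


def pinball_loss(intercept, slope, points, tau):
    """Quantile loss: points above the line cost tau per unit, points below cost 1 - tau."""
    total = 0.0
    for x, y in points:
        residual = y - (intercept + slope * x)
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total


def quantile_line(points, tau):
    """Try the line through every pair of points and keep the one with the lowest loss."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b * x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = pinball_loss(intercept, slope, points, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: the spread of y grows as x grows.
points = [(x, 2 * x + x * e) for x in range(1, 7) for e in (-1, -0.5, 0, 0.5, 1)]

for tau in (0.1, 0.5, 0.9):
    intercept, slope = quantile_line(points, tau)
    print(f"tau={tau}: y = {intercept:.2f} + {slope:.2f} * x")
```

In this made-up sample the fitted slope differs across the 10th, 50th, and 90th percentiles, which is exactly the kind of information that a single regression on the mean cannot show.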
1. Linear Regression is Sensitive to Outliers
Outliers are data points that are unusual or surprising. They can be univariate (based on one variable) or multivariate. If one is looking at income and age, univariate outliers would be things like a person who is 118 years old, or one who made $15 million the previous year. A multivariate outlier would be an 18-year-old who made $15 million the previous year. In this situation, the age is not extreme on its own; it is the combination that is unusual, since very few 18-year-olds make that much. Outliers can have a large effect on the regression. One can address this problem by using the influence statistics provided by statistical software to identify such points.
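The idea behind influence statistics can be sketched by hand: refit the regression with each observation left out in turn and see how much the slope moves. The Python example below is a simplified leave-one-out illustration under assumed data, not the exact statistic reported by any particular package; the figures and the helper name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical (age, income in $1000s) data; the last point is the
# 18-year-old who made $15 million.
ages = [25, 30, 35, 40, 45, 50, 55, 18]
incomes = [40, 48, 55, 61, 66, 70, 73, 15000]

_, full_slope = fit_line(ages, incomes)

# For each observation, refit without it and record how far the slope moves.
slope_shift = []
for i in range(len(ages)):
    xs = ages[:i] + ages[i + 1:]
    ys = incomes[:i] + incomes[i + 1:]
    _, slope_without = fit_line(xs, ys)
    slope_shift.append(abs(full_slope - slope_without))

most_influential = max(range(len(ages)), key=lambda i: slope_shift[i])
print(f"slope with all points: {full_slope:.2f}")
print(f"most influential point: age {ages[most_influential]}, "
      f"income {incomes[most_influential]}")
```

Here the single unusual observation moves the slope far more than any other point, which is the behavior an influence statistic is designed to flag.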
2. Data Must Be Independent
Linear regression assumes that the data are independent. This means that one subject's score has no relationship to another's. This is often, but not always, sensible. Two cases where it does not make sense are clustering in space and clustering in time.
A classic instance of clustering in space is student test scores when one has students from different classes, grades, schools, and school districts. Students in the same class tend to be similar in many ways; for example, they usually come from the same neighborhoods and have the same teachers. Therefore, their scores are not independent.
Instances of clustering in time are any studies that measure the same subjects multiple times. For example, in a study of diet and weight, one might measure each individual several times. These data are not independent, since what a person weighs on one occasion is related to what he or she weighs on another. One way to handle this is to use multilevel models.
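One way to gauge how strongly clustered data depart from independence is the intraclass correlation, the share of total variance that lies between clusters. The Python sketch below uses the one-way ANOVA estimate for equally sized groups; the class names, the scores, and the function name intraclass_correlation are assumptions made for illustration.

```python
# Hypothetical test scores for three classes of four students each.
scores_by_class = {
    "class_a": [72, 75, 71, 74],
    "class_b": [85, 88, 84, 87],
    "class_c": [60, 63, 59, 62],
}


def intraclass_correlation(groups):
    """One-way ANOVA estimate of the intraclass correlation for equal-sized groups."""
    k = len(groups[0])  # observations per group
    g = len(groups)     # number of groups
    if any(len(group) != k for group in groups):
        raise ValueError("this sketch assumes equally sized groups")

    grand_mean = sum(sum(group) for group in groups) / (g * k)
    group_means = [sum(group) / k for group in groups]

    # Mean squares between groups and within groups.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (g - 1)
    ms_within = sum(
        (value - m) ** 2
        for group, m in zip(groups, group_means)
        for value in group
    ) / (g * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


icc = intraclass_correlation(list(scores_by_class.values()))
print(f"intraclass correlation: {icc:.3f}")
```

A value near zero means that cluster membership tells one little about the scores, so treating them as independent does little harm. A value near one, as with these made-up scores, means that students in the same class are far more alike than students in different classes, and a multilevel model is the more appropriate choice.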
References
Brid, S. Rajesh. (2018). Introduction to Machine Learning and Regression? Medium. Retrieved 24 August 2019, from https://medium.com/greyatom/linear-regression-is-a-line-reasonable-b1c8f94d03d7