Correlation and linear regression look similar, but they are not the same thing. Both describe the relationship between two variables. Here is how they differ:
- Linear regression finds the model (the best-fitting line) that best predicts the value of Y from the value of X. The measure of fit for a regression model is the coefficient of determination (r²): the proportion of the variability in the dependent variable (Y) that is explained by the independent variable (X). Correlation, by contrast, is measured by the correlation coefficient (r), which tells how strongly one variable tends to change when the other changes. The value of r lies between -1 and +1. When r is 0, there is no linear relationship. When r > 0, one variable tends to go up as the other goes up. When r < 0, one variable tends to go up as the other goes down.
- With correlation, we don’t have to think about cause and effect. It doesn’t matter whether we calculate the correlation between (x and y) or (y and x); in both cases the value of r comes out the same. In linear regression, however, the decision of which variable you call “X” and which you call “Y” matters a lot: Y is the response variable we want to predict, and X is the explanatory variable we use to predict it. If we swap them, the fitted line (its slope and intercept) changes, because regressing Y on X minimizes errors in Y while regressing X on Y minimizes errors in X. With a single X and a single Y, r² itself stays the same, since it is just the square of r.
- Also, linear regression splits the variability in the response variable into two parts: the part that can be explained by variation in the explanatory variable (the fraction r²) and the part that remains unexplained (the fraction 1 - r²). Correlation has no concept of an “unexplained part”.
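
The points above can be checked numerically. The following is a minimal sketch using NumPy, and the data values are made up purely for illustration (they do not come from any real dataset). It computes r in both directions, fits a line in both directions, and splits the variability of Y into explained and unexplained fractions.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.5, 3.5, 7.0, 6.5, 9.0])

# Correlation is symmetric: r(x, y) equals r(y, x).
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression is not symmetric: fit Y on X, then X on Y.
slope_y_on_x, intercept_y_on_x = np.polyfit(x, y, 1)
slope_x_on_y, intercept_x_on_y = np.polyfit(y, x, 1)

# Coefficient of determination for the Y-on-X fit.
predicted = slope_y_on_x * x + intercept_y_on_x
ss_residual = np.sum((y - predicted) ** 2)   # unexplained variation
ss_total = np.sum((y - y.mean()) ** 2)       # total variation in Y
r_squared = 1.0 - ss_residual / ss_total

explained_fraction = r_squared
unexplained_fraction = ss_residual / ss_total

print("r(x, y) =", r_xy, " r(y, x) =", r_yx)
print("slope of Y on X =", slope_y_on_x)
print("slope of X on Y =", slope_x_on_y, "(not 1 / slope of Y on X)")
print("r squared =", r_squared, " r * r =", r_xy ** 2)
print("explained =", explained_fraction, " unexplained =", unexplained_fraction)
```

When run, the two correlation values should agree, r² should match the square of r, and the X-on-Y line should differ from the inverse of the Y-on-X line unless the points lie exactly on a straight line.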