Regression and Correlation

 

No, this doesn't mean reverting to childhood.

--------------------------------------------------------------------------------------

Index of Page Topics

Some Definitions

Statistics

Correlation Coefficient

Linear Regression

Regression and Correlation

Problem Solving

Autocorrelation

Statistics

Probability

------------------------------------------------

Some Definitions

Our reference to regression is: the relationship between the mean value of a random variable and the corresponding values of one or more independent variables.

This is a standard dictionary definition. Correlation is then what we take to be the measure of the relationship, and the correlation coefficient gives the quantitative value of the correlation. Autocorrelation, in turn, is self-referencing correlation: the correlation of a variable with itself, typically a series compared against a shifted copy of itself. The definitions themselves depend on what it means for variables to be related -- the subject matter of mathematics.

Variables in a functional relationship can be interpreted to be either dependent or independent, occasioned by practical circumstances -- i.e., by the kind of information available and what has to be calculated from it. If, say, the function relating the variables x and y is

y = 3x + 2

and the natural expectation in the circumstances is to measure x to compute y, then y is properly the dependent variable. The value of y depends on the value of x. Otherwise, if it's more natural to get data on y, then x can become the dependent variable, and we might write

x = (y - 2)/3

to express the dependence of x on y. The value of x depends on y.
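As a minimal sketch of the two readings of the same relationship, the Python below writes each direction as its own function. The function names are hypothetical, chosen only for illustration.

```python
def y_from_x(x):
    """Treat y as the dependent variable: y = 3x + 2."""
    return 3 * x + 2


def x_from_y(y):
    """Treat x as the dependent variable: x = (y - 2)/3."""
    return (y - 2) / 3


# Going from x to y and back recovers the starting value.
x = 4
y = y_from_x(x)
print(y, x_from_y(y))
```

Because the relationship is deterministic, either function can be recovered exactly from the other.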

In many circumstances it isn't reasonable to expect the different variables in a relationship to be either dependent or independent. The variables themselves may depend on other variables and data may be available on both of them at the same time. The choice then becomes more or less arbitrary, and so it makes more sense to ask how one variable correlates with the other.

If the variables happen to be x and y, and we have a situation involving a population of paired (x, y) values, such as the height and weight of National Hockey League players, samples will yield random values. We will also have two regression relationships, namely the regression of height on weight (the mean value of height for each given value of weight), and the regression of weight on height (the mean value of weight for each given value of height).


----------------------------------------------

The Correlation Coefficient

The correlation coefficient is a measure of the degree of the linear relationship between two variables and can have any value from -1 to +1, inclusive. The value is positive if the variables are positively correlated, and negative if they are negatively correlated, which is to say the variables tend to change together or tend to move in opposite directions, respectively. The more closely the variables move together, the closer the coefficient is to +1; the more consistently they move opposite to each other, the closer it is to -1. A value near 0 indicates little or no linear relationship.

For the variables x and y the correlation coefficient, r, may be defined as the covariance of x and y divided by the product of the standard deviations of x and y, or:

r = c_xy / (s_x s_y).

Covariance, itself, is the sum of the products of the deviations of the paired x and y values from their respective means, X and Y, divided by (n - 1), where n is the number of pairs. That is,

c_xy = Σ (x - X)(y - Y) / (n - 1).
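The two formulas above can be sketched in plain Python as follows. The helper names and the five data pairs are made up for illustration; the standard deviation is taken as the square root of the covariance of a variable with itself, which uses the same (n - 1) divisor.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(values):
    """Sample standard deviation: square root of the variance."""
    return sqrt(covariance(values, values))


def correlation(xs, ys):
    """r = covariance of x and y over the product of their standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Made-up paired values, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print("covariance:", covariance(xs, ys))
print("r:", correlation(xs, ys))
```

For these made-up values the covariance is 1.5 and r comes out near 0.77, a fairly strong positive correlation. Data lying exactly on a rising straight line would give r = +1, and on a falling line r = -1.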

In his book, An Introduction to Linear Regression and Correlation, Allen Edwards shows the linear relationship between variables x and y for four of the infinitely many conditions that might exist.


----------------------------------------------

Linear Regression

To say that the regression relationship is linear is to say that the regression line is a straight line. Here we deal only with the linear relationship.

We see here that when the relationship between the variables is deterministic, we can use the defining formula to calculate the dependent variable directly from given empirical values for the independent variable, merely by substituting in the formula, which for the linear relationship has the form

y = a + bx.

When, however, the relationship is statistical, and paired variables are involved, we first have to decide which variable to use as the dependent variable and only then predict its value based on the statistical sample of values. In this case, if the paired values are x and y, we can first choose to let y be the dependent variable. The regression equation for predicting y is then:

y' = a + b_y x,

where

b_y = c_xy / s_x^2

and

a = Y - b_y X,

keeping in mind that X and Y are the means of x and y.
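A minimal sketch of this calculation in Python follows, assuming the slope is the covariance of x and y divided by the variance of x and the intercept is the mean of y minus the slope times the mean of x, as given above. The function names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def regress_y_on_x(xs, ys):
    """Return (a, b_y) for the prediction line y' = a + b_y * x."""
    b_y = covariance(xs, ys) / covariance(xs, xs)  # covariance over variance of x
    a = mean(ys) - b_y * mean(xs)                  # line passes through (X, Y)
    return a, b_y


# Made-up paired values, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b_y = regress_y_on_x(xs, ys)
print("intercept a:", a)      # approximately 2.2
print("slope b_y:", b_y)      # approximately 0.6
print("predicted y at x = 6:", a + b_y * 6)
```

Note that the fitted line always passes through the point of means (X, Y), which follows directly from the formula for a.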

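Then with x considered to be the dependent variable, we have the same relationships, but with x and y interchanged: the regression equation for predicting x is x' = a' + b_x y, where b_x = c_xy / s_y^2 and a' = X - b_x Y. In general this is a different line from the one for predicting y.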
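To illustrate that the two regressions are generally different lines, the sketch below fits both to the same made-up data. It assumes the x-on-y formulas are obtained by interchanging x and y in the y-on-x formulas; the function name and return layout are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def regress(dependent, independent):
    """Return (intercept, slope) for predicting `dependent` from `independent`."""
    slope = covariance(independent, dependent) / covariance(independent, independent)
    intercept = mean(dependent) - slope * mean(independent)
    return intercept, slope


# Made-up paired values, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_y, b_y = regress(ys, xs)   # y' = a_y + b_y * x
a_x, b_x = regress(xs, ys)   # x' = a_x + b_x * y

print("y on x:", a_y, b_y)
print("x on y:", a_x, b_x)

# Algebraically solving the y-on-x line for x gives slope 1 / b_y,
# which is not the x-on-y slope unless the correlation is perfect.
print("1 / b_y:", 1 / b_y, "versus b_x:", b_x)
```

The product of the two slopes equals the square of the correlation coefficient, since (c_xy / s_x^2)(c_xy / s_y^2) = r^2. The two lines coincide only when r is +1 or -1.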
