Measures of Correlation

[Correlation Coefficients (Wikipedia) - Several sets of (x, y) points, with the Pearson correlation coefficient of x and y for each set. The correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0 but in that case, the correlation coefficient is undefined because the variance of Y is zero.]


- Overview

The most common techniques for studying the relationship between two quantitative variables are correlation and linear regression. Correlation quantifies the strength of the linear relationship between a pair of variables, whereas regression expresses this relationship in the form of an equation. 

For example, for patients attending an accident and emergency department (A&E), we can use correlation and regression to determine whether a relationship exists between age and urea levels, and whether urea levels can be predicted at a given age. 
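As a minimal sketch of the regression half of this idea, the following fits a straight line by least squares and uses it to predict a value. The ages and urea levels are made-up illustrative numbers, not real patient data, and the names fit_line, ages, and urea are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical illustrative values (age in years, urea in mmol/L)
ages = [20, 30, 40, 50, 60]
urea = [4.0, 4.5, 5.5, 6.0, 7.0]

slope, intercept = fit_line(ages, urea)
predicted_at_45 = intercept + slope * 45
print(f"urea = {intercept:.3f} + {slope:.3f} * age")
print(f"predicted urea at age 45: {predicted_at_45:.3f}")
```

The fitted equation is what regression adds beyond correlation: correlation says how closely the points follow a line, while the equation says which line.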

In statistics, correlation is any statistical relationship, whether causal or not, between two random variables or bivariate data. 

Here are some types of correlation: 

  • Positive correlation: When two variables move in the same direction. For example, when one variable increases, the other also increases.
  • Negative correlation: When two variables consistently move in opposite directions. For example, when one variable increases, the other decreases.
  • No correlation: When there is no linear relationship between the values of two variables. The correlation coefficient is 0 or close to 0.
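The three cases can be illustrated with a small sketch. It assumes the Pearson correlation coefficient is the measure in use, and the data sets are invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases as x increases
falling = [10, 8, 6, 4, 2]    # decreases as x increases
unrelated = [2, 1, 3, 1, 2]   # no linear trend in x

for label, y in [("positive", rising), ("negative", falling), ("none", unrelated)]:
    print(label, round(pearson_r(x, y), 3))
```

The rising and falling sets lie exactly on a line, so they give the extreme values of the coefficient; real data usually falls somewhere in between.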

 

- Interpreting A Correlation Coefficient

A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another.

The correlation coefficient always has a value between -1 and 1, and you can think of it as a general indicator of the strength of the relationship between variables. 

The sign of the correlation coefficient is positive for positive relationships and negative for negative relationships. A positive correlation indicates that increases in the value of one score tend to be accompanied by increases in the other. A negative correlation indicates that increases in one are accompanied by decreases in the other. 

The sign of the coefficient reflects whether the variables change in the same direction or in opposite directions: a positive value indicates that the variables change in the same direction, and a negative value indicates that they change in opposite directions.

The absolute value of a number is equal to the unsigned number. The absolute value of the correlation coefficient tells you the size of the correlation: the larger the absolute value, the stronger the correlation.
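A short sketch of reading a coefficient this way: the sign gives the direction and the absolute value gives the size. The helper names describe and stronger are hypothetical, and no strength labels are assigned because such thresholds vary between research areas.

```python
def describe(r):
    """Split a correlation coefficient into (direction, size)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)

def stronger(r1, r2):
    """Return whichever coefficient indicates the stronger correlation."""
    return r1 if abs(r1) >= abs(r2) else r2

print(describe(-0.8))        # direction and size of a negative coefficient
print(stronger(-0.8, 0.5))   # compared by absolute value, not by sign
```

Here -0.8 counts as a stronger correlation than 0.5, even though it is the smaller number, because only the absolute value measures strength.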

There are many different guidelines for interpreting correlation coefficients because results can vary significantly between research areas. You can use the above table as a general guide for interpreting the strength of a correlation based on its correlation coefficient value.

 

- The Steps To Calculate a Correlation Coefficient

Correlation analysis estimates the direction and strength of the linear association between two variables. The sign of the correlation coefficient indicates the direction of the association. 

To calculate a correlation coefficient, you can follow these steps:

  1. Determine your data sets
  2. Calculate the standardized value for your x variables
  3. Calculate the standardized value for your y variables
  4. Add up your x variables and your y variables
  5. Multiply the corresponding x and y values and add them together
  6. Square each x variable and y variable and add them together
  7. Divide the sum and determine the correlation coefficient
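One possible reading of the standardized-value route in these steps is sketched below. It assumes that "standardized value" means the z-score (value minus mean, divided by the sample standard deviation) and that the final division is by n - 1. Both are assumptions rather than something the steps state, and the function names and data are illustrative.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample std dev."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - mean) / sd for v in values]

def correlation_from_z(xs, ys):
    """Pearson r as the sum of products of z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length data sets with at least 2 points")
    zx = z_scores(xs)                            # standardize the x values
    zy = z_scores(ys)                            # standardize the y values
    total = sum(a * b for a, b in zip(zx, zy))   # multiply pairs and add up
    return total / (len(xs) - 1)                 # divide the sum

# Illustrative paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_from_z(xs, ys)
print(round(r, 4))
```

Steps that involve raw sums and sums of squares belong to the other common route, which works from the totals ΣX, ΣY, ΣXY, ΣX², and ΣY² instead of standardized values.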

 


- Correlation in Machine Learning

Here are some measures of correlation in machine learning:

  • Correlation coefficient: Measures how well two variables are related. It ranges from -1 to +1 and is denoted by the letter r.
  • Rank correlation coefficient: Measures the degree of similarity between two rankings of the same items. It can also be used to assess the significance of the relation between them.
  • Spearman's Rank Correlation: A statistical measure of the strength and direction of the monotonic relationship between two continuous variables.
  • Kendall rank correlation coefficient: Used to estimate a rank-based measure of association. This test may be used if the data do not necessarily come from a bivariate normal distribution.
  • Coefficient of determination: The proportion of the variance in the dependent variable that is predictable from the independent variable(s). It ranges between 0 for no correlation and +1 for complete correlation.
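To make the difference between these measures concrete, here is a minimal sketch that computes Spearman's rank correlation as the Pearson coefficient of the ranks (with tied values sharing their average rank) and compares it with Pearson's r on data that is monotonic but not linear. It also squares r, which equals the coefficient of determination in the special case of a simple linear regression with one predictor. The function names and data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # always increasing, but not a straight line

r = pearson_r(x, y)
rho = spearman_rho(x, y)
r_squared = r ** 2  # coefficient of determination for a one-predictor linear fit
print(round(r, 4), round(rho, 4), round(r_squared, 4))
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient reaches its maximum, while Pearson's r stays below 1 because the points do not lie on a straight line.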

 

- The Formula for Calculating the Correlation Coefficient

The correlation coefficient is a normalized measurement of how two variables in a data set are linearly related. It is measured on a scale that varies from +1 through 0 to -1. 

Complete correlation between two variables is expressed by either +1 or -1. 

The formula for calculating the correlation coefficient is:


 ρ(X,Y) = cov(X,Y) / (σX · σY)
 
In this formula, cov is the covariance, σX is the standard deviation of X, and σY is the standard deviation of Y. 
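The covariance form can be sketched directly in code. This version assumes population statistics (dividing by n); using n - 1 for both the covariance and the standard deviations gives the same result, because the divisor cancels. The function names and data are illustrative.

```python
import math

def covariance(xs, ys):
    """Population covariance: the mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def std_dev(values):
    """Population standard deviation: square root of cov(values, values)."""
    return math.sqrt(covariance(values, values))

def rho(xs, ys):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Illustrative paired data
xs = [2, 4, 6, 8]
ys = [1, 3, 5, 11]
print(round(rho(xs, ys), 4))
```

Dividing by the two standard deviations is what normalizes the covariance, so the result does not depend on the units in which X and Y are measured.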

The correlation coefficient can also be calculated using the formula:


Correlation(r) = [NΣXY - (ΣX)(ΣY)] / Sqrt([NΣX² - (ΣX)²][NΣY² - (ΣY)²])

In this formula, N is the number of paired values, X and Y are the first and second scores, ΣXY is the sum of the products of the first and second scores, ΣX is the sum of the first scores, ΣY is the sum of the second scores, and ΣX² and ΣY² are the sums of the squared first and second scores. 
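As a worked illustration of the sums-based formula, the sketch below computes each total and then combines them, treating the whole expression NΣXY - (ΣX)(ΣY) as the numerator. The data set and the function name correlation_from_sums are illustrative.

```python
import math

def correlation_from_sums(xs, ys):
    """Pearson r from the raw sums N, ΣX, ΣY, ΣXY, ΣX² and ΣY²."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Illustrative data: N = 5, ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55, ΣY² = 86
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Numerator:   5 * 66 - 15 * 20 = 30
# Denominator: sqrt((5 * 55 - 15**2) * (5 * 86 - 20**2)) = sqrt(50 * 30)
r = correlation_from_sums(xs, ys)
print(round(r, 4))
```

This form needs only running totals, which makes it convenient for hand calculation, although subtracting large, nearly equal totals can lose floating-point precision on some data.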

 

- The Limitations of Correlation

Correlation analysis has several limitations, including:
  • Causation: Correlation analysis can't be used to determine cause and effect relationships. It can only support the possibility of a relationship.
  • Third variables: Correlation analysis doesn't account for other variables that may be affecting the variables being studied.
  • Nonlinearity: Correlation analysis can't accurately describe curvilinear relationships.
  • Time: Correlations can change over time, and correlation analysis doesn't account for correlation trends over time.
  • External factors: Correlation analysis doesn't account for external factors that may influence the variables being studied.
  • Risk concentration: Relying solely on correlation for diversification can lead to a concentration of risk.
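The nonlinearity limitation can be seen in a small sketch: below, y is completely determined by x, yet the Pearson coefficient is zero because the relationship is a symmetric curve rather than a line. The data is constructed for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # y depends entirely on x, but along a U-shaped curve

r = pearson_r(x, y)
print(r)
```

A coefficient near zero therefore rules out only a linear relationship, which is one reason to plot the data before interpreting the number.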

 

 



