Correlation
Meaning, Types and the Scatter Diagram
In economics, two quantities often move together — for example, as the price of a good rises, its demand falls; as income rises, saving rises. Correlation measures the relationship between two variables: whether they move together, and how closely. It does not prove that one causes the other; it only shows that they are related.
Correlation can be of these types:
- Positive correlation — the two variables move in the same direction (both rise together or both fall together), e.g. income and consumption.
- Negative correlation — the two move in opposite directions (one rises as the other falls), e.g. price and demand.
- Zero (no) correlation — no clear relationship.
The simplest way to see correlation is a scatter diagram — we plot each pair of values as a dot on a graph (one variable on each axis). The pattern of dots reveals the relationship: dots rising from left to right show positive correlation; dots falling from left to right show negative correlation; scattered dots with no pattern show no correlation. The more closely the dots cluster around a straight line, the stronger the correlation.
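To make the idea of a scatter diagram concrete, here is a minimal text-only sketch in Python. The function name ascii_scatter and the income and consumption figures are assumptions invented for illustration; they do not come from any real data set.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Return a text scatter diagram: X runs left to right, Y runs bottom to top."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid row is printed at the top, so flip the row index
        # to place large Y values high on the diagram.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical income/consumption pairs, invented for illustration only.
income = [10, 20, 30, 40, 50]
consumption = [8, 15, 24, 30, 41]
print(ascii_scatter(income, consumption))
```

With these made-up figures the stars climb from the lower-left corner to the upper-right corner, which is the pattern described above for positive correlation.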
Price and demand move in opposite directions.
- As price rises, demand usually falls.
- Opposite directions → negative correlation.
Look at the slope of the dots.
- Dots rising from the lower-left to the upper-right show positive correlation.
Correlation is not causation.
- Correlation only shows that two variables are related or move together.
- It does not prove cause and effect.
Key Points
- Correlation measures the relationship between two variables (not causation).
- Types: positive (same direction), negative (opposite), zero (no relation).
- Scatter diagram: dots rising → positive; falling → negative; no pattern → none. Closer to a line = stronger.
Karl Pearson's Coefficient of Correlation
A scatter diagram shows the direction of correlation but not its exact strength. For an exact numerical measure we use Karl Pearson's coefficient of correlation (r). Using deviations from the mean (dx = X − X̄, dy = Y − Ȳ), the formula is:
r = Σ(dx·dy) ÷ √(Σdx² × Σdy²)
The value of r always lies between −1 and +1:
- r = +1 → perfect positive correlation.
- r = −1 → perfect negative correlation.
- r = 0 → no (linear) correlation.
- Values near ±1 mean strong correlation; values near 0 mean weak correlation.
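Why r can never leave this range follows from the Cauchy-Schwarz inequality applied to the deviations. A short sketch in LaTeX notation, using the same dx and dy as the formula above and assuming that neither Σdx² nor Σdy² is zero:

```latex
\[
r = \frac{\sum dx\,dy}{\sqrt{\sum dx^{2}\,\sum dy^{2}}},
\qquad
\Bigl(\sum dx\,dy\Bigr)^{2} \le \sum dx^{2}\,\sum dy^{2}
\;\Longrightarrow\; r^{2} \le 1
\;\Longrightarrow\; -1 \le r \le +1 .
\]
```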
Worked example. Find r for X: 10, 20, 30, 40, 50 and Y: 20, 40, 60, 80, 100.
- Means: X̄ = 150÷5 = 30; Ȳ = 300÷5 = 60.
- dx: −20, −10, 0, 10, 20. dy: −40, −20, 0, 20, 40.
- Σ(dx·dy) = 800 + 200 + 0 + 200 + 800 = 2000.
- Σdx² = 400 + 100 + 0 + 100 + 400 = 1000; Σdy² = 1600 + 400 + 0 + 400 + 1600 = 4000.
- r = 2000 ÷ √(1000 × 4000) = 2000 ÷ √4000000 = 2000 ÷ 2000 = +1.
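The steps above can be checked with a small Python sketch. The function name pearson_r is an assumption chosen for illustration; the code simply follows the deviation formula given earlier and is not a substitute for a statistics library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # dx = X - mean of X
    dy = [y - mean_y for y in ys]          # dy = Y - mean of Y
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one series is constant")
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# The series from the worked example.
X = [10, 20, 30, 40, 50]
Y = [20, 40, 60, 80, 100]
print(pearson_r(X, Y))  # 1.0
```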
So X and Y have perfect positive correlation: here Y is always exactly twice X. Pearson's r is the most widely used measure, but it assumes a linear relationship and, because it is computed from deviations about the mean, it is affected by extreme values (unlike the median).
r is bounded.
- r lies between −1 and +1.
- r = −1 means perfect negative correlation.
Apply Pearson's formula.
- With Σ(dx·dy) = 120 and sums of squared deviations of 100 and 144: r = 120 ÷ √(100 × 144) = 120 ÷ √14400 = 120 ÷ 120 = +1.
Compare with the limits.
- r = 0.15 is positive but close to 0.
- So it is a weak positive correlation.
Key Points
- Karl Pearson's r = Σ(dx·dy) ÷ √(Σdx² × Σdy²); dx = X−X̄, dy = Y−Ȳ.
- Always between −1 and +1: +1 perfect positive, −1 perfect negative, 0 none.
- Assumes a linear relation; affected by extreme values.
Spearman's Rank Correlation and Interpretation
Sometimes data cannot be measured in exact numbers but can be ranked — for example, the ranking of contestants by two judges, or items rated by beauty, honesty or efficiency. For such qualitative or ranked data we use Spearman's rank correlation coefficient (R):
R = 1 − [ 6ΣD² ÷ N(N² − 1) ]
where D is the difference between the two ranks of each item and N is the number of items. Like Pearson's r, R also lies between −1 and +1.
Worked example. Two judges rank 5 paintings as follows — Judge A: 1, 2, 3, 4, 5 and Judge B: 2, 1, 4, 3, 5. Find the rank correlation.
- Differences D = A − B: −1, 1, −1, 1, 0.
- D²: 1, 1, 1, 1, 0; ΣD² = 4. N = 5, so N(N² − 1) = 5 × (25 − 1) = 5 × 24 = 120.
- R = 1 − (6 × 4) ÷ 120 = 1 − 24÷120 = 1 − 0.2 = +0.8.
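The same arithmetic can be sketched in Python. The function name spearman_rank is an assumption for illustration, and the sketch assumes there are no tied ranks, since the simple formula above is exact only in that case.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for two lists of ranks.

    Assumes no tied ranks; ties need a corrected formula not covered here.
    """
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two rank lists of equal length with at least 2 items")
    n = len(ranks_a)
    # D is the difference between the two ranks of each item.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# The two judges' rankings from the worked example.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(round(spearman_rank(judge_a, judge_b), 2))  # 0.8
```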
So the two judges' rankings have a high positive agreement (R = +0.8). Interpreting a correlation coefficient: a value close to +1 means a strong direct relationship, close to −1 a strong inverse relationship, and close to 0 a weak or no relationship. But always remember the golden caution: correlation is not causation — a high correlation between two things (like ice-cream sales and drowning) may be due to a third common factor (summer heat), not because one causes the other.
Think about the type of data.
- Use Spearman's rank correlation when data are qualitative or given as ranks (e.g. rankings by judges, or of beauty or honesty).
Apply the rank-correlation formula.
- With N = 5 and ΣD² = 10: N(N² − 1) = 5 × 24 = 120.
- R = 1 − (6 × 10)÷120 = 1 − 60÷120 = 1 − 0.5 = 0.5.
Watch for a hidden third factor.
- Ice-cream sales do not cause drowning; both rise because of a third factor, hot weather.
- Correlation is not causation.
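A hidden third factor can be illustrated with a small Python sketch. Every number below is made up purely for illustration (it is not real data), and the names pearson_r, temperature, wobble, ice_cream_sales and drownings are assumptions. Both series are built from temperature alone, so neither can be causing the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Invented figures, for illustration only.
temperature = [20, 22, 24, 26, 28, 30, 32, 34]
wobble = [1, -1, 2, -2, 1, -1, 2, -2]  # small arbitrary disturbance

# Each series depends on temperature (plus the disturbance), not on the other.
ice_cream_sales = [10 * t + 5 * w for t, w in zip(temperature, wobble)]
drownings = [0.5 * t - w for t, w in zip(temperature, wobble)]

r = pearson_r(ice_cream_sales, drownings)
print(round(r, 2))
```

In this sketch r comes out strongly positive even though the code never lets ice-cream sales influence drownings, which is the sense in which a high correlation can be produced entirely by a common cause.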
Key Points
- Spearman's R = 1 − [6ΣD² ÷ N(N²−1)]; used for ranked / qualitative data; lies between −1 and +1.
- D = difference of the two ranks; N = number of items.
- Interpret: near +1 strong direct, near −1 strong inverse, near 0 weak/none.
- Golden rule: correlation is not causation (beware a hidden third factor).