Variance and Standard Deviation

Mean deviation uses absolute values, which are awkward to handle algebraically. Statisticians usually square the deviations instead: squaring also removes the signs, but it produces a smooth, differentiable measure that is far easier to manipulate algebraically. This leads to the two most widely used measures of spread, variance and standard deviation.

Variance $\sigma^2$ is the mean of the squared deviations from the mean. For raw (ungrouped) data of $n$ values:

$$\sigma^2=\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Standard deviation $\sigma$ is the positive square root of the variance. The square root brings the measure back to the same units as the original data, which is why it is the quantity actually quoted:

$$\sigma=\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
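
As a minimal sketch, the two definitions translate almost word for word into Python. The function names `population_variance` and `population_std` and the sample values are illustrative assumptions, not part of the lesson; both functions divide by $n$, as in the formulas above.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))

data = [3, 7, 7, 19]  # hypothetical observations with mean 9
print(population_variance(data))  # 36.0
print(population_std(data))       # 6.0
```

The deviations here are $-6, -2, -2, 10$, so the squared deviations sum to $144$ and the variance is $144/4=36$.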

For a frequency distribution with frequencies $f_i$ and $N=\sum f_i$, where $x_i$ denotes the distinct values (discrete case) or the class marks (continuous case):

$$\sigma=\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$$

Expanding the square gives an equivalent computational form that avoids subtracting $\bar{x}$ from every value, which is especially convenient when $\bar{x}$ is not a round number:

$$\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\left(\dfrac{\sum f_i x_i}{N}\right)^2$$
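
For reference, the expansion can be written out step by step. It uses only $\sum f_i=N$ and $\sum f_i x_i=N\bar{x}$, both of which follow from the definitions above:

$$\begin{aligned}
\sigma^2 &= \dfrac{1}{N}\sum f_i(x_i-\bar{x})^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-\dfrac{2\bar{x}}{N}\sum f_i x_i+\dfrac{\bar{x}^2}{N}\sum f_i \\
&= \dfrac{1}{N}\sum f_i x_i^2-2\bar{x}^2+\bar{x}^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-\left(\dfrac{\sum f_i x_i}{N}\right)^2
\end{aligned}$$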

When the numbers are large, the step-deviation (short-cut) method rescales the data first. Choose an assumed mean $A$ and class width $h$, and set $y_i=\dfrac{x_i-A}{h}$. Then:

$$\sigma=h\sqrt{\dfrac{1}{N}\sum f_i y_i^2-\left(\dfrac{\sum f_i y_i}{N}\right)^2}$$
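
A short derivation shows where the factor $h$ comes from and why $A$ drops out. The symbols $\sigma_x$ and $\sigma_y$ (the standard deviations of the $x_i$ and of the $y_i$) are introduced here only for this sketch. Since $x_i=A+hy_i$:

$$\bar{x}=A+h\bar{y},\qquad x_i-\bar{x}=h(y_i-\bar{y})$$

$$\sigma_x^2=\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2=h^2\cdot\dfrac{1}{N}\sum f_i(y_i-\bar{y})^2=h^2\sigma_y^2$$

Taking the positive square root gives $\sigma_x=h\,\sigma_y$, and applying the computational form to the $y_i$ gives the formula above.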

The table below contrasts the data types and the form you would use:

| Data type | $x_i$ means | Standard deviation |
|---|---|---|
| Ungrouped | each observation | $\sqrt{\dfrac{1}{n}\sum(x_i-\bar{x})^2}$ |
| Discrete frequency | distinct values | $\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$ |
| Continuous frequency | class marks | $\sqrt{\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2}$ |

Deeper Insight (why we square instead of taking absolute values): Squaring deviations does the same sign-removing job as the absolute value, but it brings three advantages that make variance the standard measure of spread. First, the function $\sum(x_i-a)^2$ is smooth everywhere (it has a derivative at every point), whereas $\sum|x_i-a|$ has sharp corners; this smoothness lets us minimise it with calculus and prove that the minimising value is exactly the mean $\bar{x}$. Second, squaring gives more weight to large deviations: an observation twice as far from the mean contributes four times as much, so the measure responds to outliers in a predictable way. Third, variances add for independent quantities, a property absolute deviations do not have, and this additivity underpins the role of the standard deviation in the normal distribution, error analysis and the inferential statistics you will meet later. Taking the square root at the end is not cosmetic: it restores the original units (rupees, kilograms, marks), so a standard deviation can be read on the same axis as the data itself.
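
The first advantage can be checked numerically. The sketch below uses hypothetical data and a simple grid search (not a method from the lesson) to find the centre $a$ that minimises each kind of total deviation. The sum of squares is minimised at the mean, while the sum of absolute values is minimised at the median, a standard result not derived in this section.

```python
data = [2, 4, 6, 8, 30]  # hypothetical observations; mean = 10, median = 6

def sum_squared_dev(a):
    """Total squared deviation of the data from a candidate centre a."""
    return sum((x - a) ** 2 for x in data)

def sum_absolute_dev(a):
    """Total absolute deviation of the data from a candidate centre a."""
    return sum(abs(x - a) for x in data)

# Candidate centres 0.0, 0.1, ..., 30.0
candidates = [i / 10 for i in range(301)]

best_for_squares = min(candidates, key=sum_squared_dev)
best_for_absolute = min(candidates, key=sum_absolute_dev)

print(best_for_squares)   # 10.0, the mean
print(best_for_absolute)  # 6.0, the median
```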

[Figure: Two bar distributions with equal means but different standard deviations: small SD (clustered) versus large SD (spread).]

Worked Example 1
Find the variance and standard deviation of: $2, 4, 6, 8, 10$.
Solution
  1. $n=5$. Mean $\bar{x}=\dfrac{2+4+6+8+10}{5}=\dfrac{30}{5}=6$.
  2. Deviations $(x_i-6)$: $-4, -2, 0, 2, 4$. Squares: $16, 4, 0, 4, 16$.
  3. $\sum(x_i-\bar{x})^2 = 16+4+0+4+16=40$.
  4. Variance $\sigma^2=\dfrac{40}{5}=8$.
  5. Standard deviation $\sigma=\sqrt{8}=2\sqrt{2}\approx 2.83$.

Answer: $\sigma^2=8$, $\sigma=2\sqrt{2}\approx 2.83$.
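
This result can be cross-checked with Python's standard `statistics` module. The sketch uses `pvariance` and `pstdev`, the population versions that divide by $n$; the similarly named `variance` and `stdev` divide by $n-1$ and would give a different answer.

```python
import statistics

data = [2, 4, 6, 8, 10]

variance = statistics.pvariance(data)  # population variance: divides by n
std_dev = statistics.pstdev(data)      # population standard deviation

print(f"{variance:.2f}")  # 8.00
print(f"{std_dev:.2f}")   # 2.83
```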

Worked Example 2
Find the standard deviation of the first $n$ natural numbers, then evaluate it for $n=10$.
Solution
  1. Using $\sum x_i^2=\dfrac{n(n+1)(2n+1)}{6}$ and $\bar{x}=\dfrac{n+1}{2}$.
  2. $\sigma^2=\dfrac{1}{n}\sum x_i^2-\bar{x}^2=\dfrac{(n+1)(2n+1)}{6}-\dfrac{(n+1)^2}{4}=\dfrac{n^2-1}{12}$.
  3. So $\sigma=\sqrt{\dfrac{n^2-1}{12}}$.
  4. For $n=10$: $\sigma^2=\dfrac{100-1}{12}=\dfrac{99}{12}=8.25$, so $\sigma=\sqrt{8.25}\approx 2.87$.

Answer: $\sigma=\sqrt{\dfrac{n^2-1}{12}}$; for $n=10$, $\sigma\approx 2.87$.
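
A quick numerical check of the closed form, as a sketch: the helper names `std_first_n` and `std_direct` are hypothetical, and the second one recomputes the standard deviation from the definition for comparison.

```python
import math

def std_first_n(n):
    """Closed form: sqrt((n^2 - 1) / 12)."""
    return math.sqrt((n * n - 1) / 12)

def std_direct(n):
    """Standard deviation of 1, 2, ..., n straight from the definition."""
    values = range(1, n + 1)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

# The two computations agree for several values of n
for n in (1, 2, 10, 100):
    assert math.isclose(std_first_n(n), std_direct(n), abs_tol=1e-12)

print(round(std_first_n(10), 2))  # 2.87
```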

Worked Example 3
Find the variance of the discrete distribution: $x_i = 4, 8, 11, 17, 20$ with frequencies $f_i = 3, 5, 9, 5, 3$.
Solution
  1. $N=\sum f_i=3+5+9+5+3=25$.
  2. $\sum f_i x_i = 4(3)+8(5)+11(9)+17(5)+20(3)=12+40+99+85+60=296$, so $\bar{x}=\dfrac{296}{25}=11.84$.
  3. $\sum f_i x_i^2 = 16(3)+64(5)+121(9)+289(5)+400(3)=48+320+1089+1445+1200=4102$.
  4. $\sigma^2=\dfrac{\sum f_i x_i^2}{N}-\bar{x}^2=\dfrac{4102}{25}-(11.84)^2=164.08-140.1856=23.8944$.

Answer: Variance $\sigma^2\approx 23.89$ (and $\sigma\approx 4.89$).
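
The same computation as a short Python sketch; the function name `frequency_variance` is an illustrative choice. It applies the computational form $\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2$ to the data of this example.

```python
def frequency_variance(values, freqs):
    """Variance of a frequency distribution via (1/N) * sum(f * x^2) - mean^2."""
    total = sum(freqs)  # N
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    mean_square = sum(f * x * x for x, f in zip(values, freqs)) / total
    return mean_square - mean ** 2

x = [4, 8, 11, 17, 20]
f = [3, 5, 9, 5, 3]

var = frequency_variance(x, f)
print(round(var, 4))         # 23.8944
print(round(var ** 0.5, 2))  # 4.89
```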

Worked Example 4
Find the standard deviation for the continuous distribution: classes $0\text{-}10, 10\text{-}20, 20\text{-}30, 30\text{-}40, 40\text{-}50$ with frequencies $5, 8, 15, 16, 6$.
Solution
  1. Class marks $x_i$: $5, 15, 25, 35, 45$; $N=50$.
  2. $\sum f_i x_i = 25+120+375+560+270=1350$, so $\bar{x}=\dfrac{1350}{50}=27$.
  3. $\sum f_i x_i^2 = 25(5)+225(8)+625(15)+1225(16)+2025(6)=125+1800+9375+19600+12150=43050$.
  4. $\sigma^2=\dfrac{43050}{50}-27^2=861-729=132$.
  5. $\sigma=\sqrt{132}\approx 11.49$.

Answer: $\sigma^2=132$, $\sigma=\sqrt{132}\approx 11.49$.

Worked Example 5
Using the step-deviation method, find the standard deviation: classes $0\text{-}10, 10\text{-}20, 20\text{-}30, 30\text{-}40, 40\text{-}50$ with frequencies $5, 8, 15, 16, 6$. (Take $A=25$, $h=10$.)
Solution
  1. Class marks $x_i$: $5, 15, 25, 35, 45$. With $A=25$, $h=10$, $y_i=\dfrac{x_i-25}{10}$ gives $-2, -1, 0, 1, 2$; $N=50$.
  2. $\sum f_i y_i = (-2)(5)+(-1)(8)+0(15)+1(16)+2(6)=-10-8+0+16+12=10$.
  3. $\sum f_i y_i^2 = 4(5)+1(8)+0(15)+1(16)+4(6)=20+8+0+16+24=68$.
  4. $\sigma=h\sqrt{\dfrac{\sum f_i y_i^2}{N}-\left(\dfrac{\sum f_i y_i}{N}\right)^2}=10\sqrt{\dfrac{68}{50}-\left(\dfrac{10}{50}\right)^2}$.
  5. $=10\sqrt{1.36-0.04}=10\sqrt{1.32}\approx 10(1.1489)\approx 11.49$.

Answer: $\sigma\approx 11.49$ — matching the direct method of Example 4.
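
To confirm that the two methods agree, the sketch below computes the standard deviation of this distribution both ways. Variable names such as `direct` and `step` are illustrative.

```python
import math

marks = [5, 15, 25, 35, 45]  # class marks
freqs = [5, 8, 15, 16, 6]
N = sum(freqs)

# Direct method: (1/N) * sum(f * x^2) - mean^2
mean = sum(f * x for x, f in zip(marks, freqs)) / N
direct = math.sqrt(sum(f * x * x for x, f in zip(marks, freqs)) / N - mean ** 2)

# Step-deviation method with assumed mean A and class width h
A, h = 25, 10
y = [(x - A) / h for x in marks]
mean_y = sum(f * yi for yi, f in zip(y, freqs)) / N
step = h * math.sqrt(sum(f * yi * yi for yi, f in zip(y, freqs)) / N - mean_y ** 2)

print(round(direct, 2), round(step, 2))  # 11.49 11.49
```

The step-deviation version works with the small integers $-2,\dots,2$ instead of squares up to $2025$, which is the whole point of the method when computing by hand.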

Worked Example 6
The mean of $5$ observations is $4.4$ and their variance is $8.24$. If three of the observations are $1, 2, 6$, find the other two.
Solution
  1. Let the unknowns be $a$ and $b$. Mean: $\dfrac{1+2+6+a+b}{5}=4.4\Rightarrow 9+a+b=22\Rightarrow a+b=13$.
  2. Variance: $\dfrac{1}{5}\sum x_i^2-\bar{x}^2=8.24\Rightarrow \dfrac{1}{5}\sum x_i^2=8.24+19.36=27.6$, so $\sum x_i^2=138$.
  3. $1^2+2^2+6^2+a^2+b^2=138\Rightarrow 41+a^2+b^2=138\Rightarrow a^2+b^2=97$.
  4. From $a+b=13$: $a^2+b^2=(a+b)^2-2ab=169-2ab=97\Rightarrow ab=36$.
  5. Solving $a+b=13$, $ab=36$ gives $a=4$, $b=9$ (roots of $t^2-13t+36=0$).

Answer: The other two observations are $4$ and $9$.
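
The algebra above can be retraced in a few lines of Python. This is only a sketch of the same steps (sum, sum of squares, product, then the quadratic), with variable names chosen for illustration; floating-point arithmetic is used, so the results are rounded before printing.

```python
import math

known = [1, 2, 6]
n, mean, variance = 5, 4.4, 8.24

s = n * mean - sum(known)                                    # a + b
q = n * (variance + mean ** 2) - sum(x * x for x in known)   # a^2 + b^2
p = (s * s - q) / 2                                          # ab

# a and b are the roots of t^2 - s*t + p = 0
root = math.sqrt(s * s - 4 * p)
a, b = (s - root) / 2, (s + root) / 2
print(round(a, 6), round(b, 6))  # 4.0 9.0

# Check: the completed data set reproduces the given mean and variance
full = known + [a, b]
check_mean = sum(full) / n
check_var = sum(x * x for x in full) / n - check_mean ** 2
print(round(check_mean, 2), round(check_var, 2))  # 4.4 8.24
```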

Key Points

  • Variance is the mean squared deviation from the mean: $\sigma^2=\dfrac{1}{n}\sum(x_i-\bar{x})^2$.
  • Standard deviation $\sigma=\sqrt{\sigma^2}$ restores the original units, so it is the spread we actually quote.
  • For frequency data, weight by $f_i$ and use $\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2$ as a fast computational form.
  • The step-deviation method $y_i=\dfrac{x_i-A}{h}$ gives $\sigma=h\sqrt{\dfrac{\sum f_iy_i^2}{N}-\left(\dfrac{\sum f_iy_i}{N}\right)^2}$.
  • We square (not absolute-value) deviations because variance is smooth, additive and weights large deviations more heavily.

Check your understanding

Q1. The variance is:
Explanation: The mean of the squared deviations from the mean.
Q2. The standard deviation equals:
Explanation: SD $=\sqrt{\text{variance}}$.
Q3. The variance of $2, 4, 6$ (mean $4$) is:
Explanation: $\tfrac{4+0+4}{3}=\tfrac{8}{3}$.
Q4. The standard deviation is always:
Explanation: Non-negative, because it is the square root of a non-negative number.