
Variance and Standard Deviation

Mean deviation uses absolute values, which are awkward to handle algebraically. Statisticians prefer to square the deviations instead — squaring also removes signs, but produces a smooth, differentiable measure that behaves beautifully under further analysis. This leads to the two most important measures of spread in all of statistics.

Variance $\sigma^2$ is the mean of the squared deviations from the mean. For raw (ungrouped) data of $n$ values:

$$\sigma^2=\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Standard deviation $\sigma$ is the positive square root of the variance. The square root brings the measure back to the same units as the original data, which is why it is the quantity actually quoted:

$$\sigma=\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
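As a minimal sketch of the two definitions above, the Python snippet below computes the variance and standard deviation of raw data straight from the formulas. The function names and the sample data are illustrative assumptions, not part of the lesson; the standard library's `statistics.pvariance` appears only as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_sd(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


# Hypothetical data chosen so the arithmetic is easy to follow by hand:
# mean = 9, squared deviations = 36, 4, 4, 100.
data = [3, 7, 7, 19]
variance = population_variance(data)
sd = population_sd(data)

print(variance, sd)
# Cross-check against the standard library's population variance.
print(statistics.pvariance(data))
```

Note that these formulas divide by $n$; the sample variance met in inferential statistics divides by $n-1$ and is a different quantity.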

For a frequency distribution (discrete or continuous, with values or class marks $x_i$, frequencies $f_i$, and $N=\sum f_i$):

$$\sigma=\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$$

Expanding the square gives an equivalent computational form that avoids subtracting $\bar{x}$ from every value, which is the form the syllabus highlights:

$$\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\left(\dfrac{\sum f_i x_i}{N}\right)^2$$

This is just $\sigma^2=\overline{x^2}-(\bar{x})^2$ — "the mean of the squares minus the square of the mean" — and it is usually the fastest route by hand because you build two tidy columns ($f_i x_i$ and $f_i x_i^2$) and read off the answer.
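For readers who want the expansion spelled out, one way to write it, using only the symbols already defined together with $\bar{x}=\dfrac{1}{N}\sum f_i x_i$ and $\sum f_i=N$, is:

$$\begin{aligned}
\sigma^2 &= \dfrac{1}{N}\sum f_i(x_i-\bar{x})^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-2\bar{x}\cdot\dfrac{1}{N}\sum f_i x_i+\bar{x}^2\cdot\dfrac{1}{N}\sum f_i \\
&= \dfrac{1}{N}\sum f_i x_i^2-2\bar{x}^2+\bar{x}^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2
\end{aligned}$$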

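When the numbers are large, the step-deviation (short-cut) method rescales the data first. Choose an assumed mean $A$ (usually a class mark near the middle of the data), take $h$ to be the common class width, and set $y_i=\dfrac{x_i-A}{h}$. When the class marks are equally spaced and $A$ is one of them, the $y_i$ are small integers, so the arithmetic is far lighter: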

$$\sigma=h\sqrt{\dfrac{1}{N}\sum f_i y_i^2-\left(\dfrac{\sum f_i y_i}{N}\right)^2}$$

The factor $h$ outside the root undoes the scaling, and the assumed mean $A$ never appears in the final answer — a reminder that shifting every value by a constant leaves the standard deviation unchanged, while scaling every value by $k$ multiplies the standard deviation by $|k|$. The table below contrasts the data types and the form you would use:

Data type	What $x_i$ denotes	Standard deviation $\sigma$
Ungrouped	each observation	$\sqrt{\dfrac{1}{n}\sum(x_i-\bar{x})^2}$
Discrete frequency	distinct values	$\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$
Continuous frequency	class marks	$\sqrt{\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2}$

Deeper Insight — why we square instead of taking absolute values: Squaring deviations does the same sign-removing job as the absolute value, but it earns three decisive advantages that make variance the cornerstone of statistics. First, the function $\sum(x_i-a)^2$ is smooth everywhere — it has a derivative at every point — whereas $\sum|x_i-a|$ has sharp corners; this smoothness lets us minimise it cleanly and prove that the minimising value is exactly the mean $\bar{x}$. Second, squaring deliberately gives more weight to large deviations: an observation twice as far from the mean contributes four times as much, so the measure is sensitive to outliers in a controlled, predictable way. Third, variances add for independent quantities, a property absolute deviations simply do not have, and that additivity is what makes the standard deviation the natural scale for the normal distribution and the entire edifice of inferential statistics you will meet later. Taking the square root at the end is not cosmetic — it restores the original units (rupees, kilograms, marks), so a standard deviation is something you can actually interpret on the same axis as the data itself.
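The first claim can be checked in a couple of lines. Writing $S(a)=\sum_{i=1}^{n}(x_i-a)^2$ (the name $S$ is introduced here only for this sketch) and differentiating with respect to $a$:

$$S'(a)=-2\sum_{i=1}^{n}(x_i-a)=-2\left(\sum_{i=1}^{n}x_i-na\right)=0\;\Longrightarrow\;a=\dfrac{1}{n}\sum_{i=1}^{n}x_i=\bar{x}$$

Since $S''(a)=2n>0$, this stationary point is a minimum, and the minimum value is $S(\bar{x})=n\sigma^2$.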

Solved Examples

Worked Example

Find the variance and standard deviation of: $2, 4, 6, 8, 10$.

Solution

$n=5$. Mean $\bar{x}=\dfrac{2+4+6+8+10}{5}=\dfrac{30}{5}=6$.
Deviations $(x_i-6)$: $-4, -2, 0, 2, 4$. Squares: $16, 4, 0, 4, 16$.
$\sum(x_i-\bar{x})^2 = 16+4+0+4+16=40$.
Variance $\sigma^2=\dfrac{40}{5}=8$.
Standard deviation $\sigma=\sqrt{8}=2\sqrt{2}\approx 2.83$.

Answer: $\sigma^2=8$, $\sigma=2\sqrt{2}\approx 2.83$.

Worked Example

Find the standard deviation of the first $n$ natural numbers, then evaluate it for $n=10$.

Solution

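Use $\sum x_i=\dfrac{n(n+1)}{2}$, so $\bar{x}=\dfrac{n+1}{2}$, and $\sum x_i^2=\dfrac{n(n+1)(2n+1)}{6}$.
$\sigma^2=\dfrac{1}{n}\sum x_i^2-\bar{x}^2=\dfrac{(n+1)(2n+1)}{6}-\dfrac{(n+1)^2}{4}=\dfrac{(n+1)\left[2(2n+1)-3(n+1)\right]}{12}=\dfrac{(n+1)(n-1)}{12}=\dfrac{n^2-1}{12}$.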
So $\sigma=\sqrt{\dfrac{n^2-1}{12}}$.
For $n=10$: $\sigma^2=\dfrac{100-1}{12}=\dfrac{99}{12}=8.25$, so $\sigma=\sqrt{8.25}\approx 2.87$.

Answer: $\sigma=\sqrt{\dfrac{n^2-1}{12}}$; for $n=10$, $\sigma\approx 2.87$.
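A quick way to gain confidence in the closed form is to compare it with a brute-force computation for a few values of $n$. The sketch below does that with exact fractions; the function name `variance_first_n` and the choice of test values are assumptions made for illustration.

```python
from fractions import Fraction


def variance_first_n(n):
    """Population variance of 1, 2, ..., n computed directly from the definition."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n


# Compare the direct computation with the closed form (n^2 - 1) / 12.
for n in (1, 2, 5, 10, 50):
    assert variance_first_n(n) == Fraction(n * n - 1, 12)

variance_10 = variance_first_n(10)
print(variance_10, float(variance_10) ** 0.5)
```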

Worked Example

Find the variance of the discrete distribution: $x_i = 4, 8, 11, 17, 20$ with frequencies $f_i = 3, 5, 9, 5, 3$.

Solution

$N=\sum f_i=3+5+9+5+3=25$.
$\sum f_i x_i = 4(3)+8(5)+11(9)+17(5)+20(3)=12+40+99+85+60=296$, so $\bar{x}=\dfrac{296}{25}=11.84$.
$\sum f_i x_i^2 = 16(3)+64(5)+121(9)+289(5)+400(3)=48+320+1089+1445+1200=4102$.
$\sigma^2=\dfrac{\sum f_i x_i^2}{N}-\bar{x}^2=\dfrac{4102}{25}-(11.84)^2=164.08-140.1856=23.8944$.

Answer: Variance $\sigma^2\approx 23.89$ (and $\sigma\approx 4.89$).
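The same distribution can be run through both the definitional form and the computational form to confirm that they agree. This is a sketch under the assumption that exact rational arithmetic is wanted; the function names are hypothetical.

```python
from fractions import Fraction


def variance_definition(xs, fs):
    """(1/N) * sum of f_i (x_i - mean)^2, in exact rational arithmetic."""
    total = sum(fs)
    mean = Fraction(sum(f * x for x, f in zip(xs, fs)), total)
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / total


def variance_computational(xs, fs):
    """(1/N) * sum of f_i x_i^2, minus the square of the mean."""
    total = sum(fs)
    mean = Fraction(sum(f * x for x, f in zip(xs, fs)), total)
    mean_of_squares = Fraction(sum(f * x * x for x, f in zip(xs, fs)), total)
    return mean_of_squares - mean ** 2


# Values and frequencies from the worked example above.
xs = [4, 8, 11, 17, 20]
fs = [3, 5, 9, 5, 3]
v_def = variance_definition(xs, fs)
v_comp = variance_computational(xs, fs)
print(float(v_def), float(v_comp))
```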

Worked Example

Find the standard deviation for the continuous distribution: classes $0\text{-}10, 10\text{-}20, 20\text{-}30, 30\text{-}40, 40\text{-}50$ with frequencies $5, 8, 15, 16, 6$.

Solution

Class marks $x_i$: $5, 15, 25, 35, 45$; $N=50$.
$\sum f_i x_i = 25+120+375+560+270=1350$, so $\bar{x}=\dfrac{1350}{50}=27$.
$\sum f_i x_i^2 = 25(5)+225(8)+625(15)+1225(16)+2025(6)=125+1800+9375+19600+12150=43050$.
$\sigma^2=\dfrac{43050}{50}-27^2=861-729=132$.
$\sigma=\sqrt{132}\approx 11.49$.

Answer: $\sigma^2=132$, $\sigma=\sqrt{132}\approx 11.49$.

Worked Example

Using the step-deviation method, find the standard deviation: classes $0\text{-}10, 10\text{-}20, 20\text{-}30, 30\text{-}40, 40\text{-}50$ with frequencies $5, 8, 15, 16, 6$. (Take $A=25$, $h=10$.)

Solution

Class marks $x_i$: $5, 15, 25, 35, 45$. With $A=25$, $h=10$, $y_i=\dfrac{x_i-25}{10}$ gives $-2, -1, 0, 1, 2$; $N=50$.
$\sum f_i y_i = (-2)(5)+(-1)(8)+0(15)+1(16)+2(6)=-10-8+0+16+12=10$.
$\sum f_i y_i^2 = 4(5)+1(8)+0(15)+1(16)+4(6)=20+8+0+16+24=68$.
$\sigma=h\sqrt{\dfrac{\sum f_i y_i^2}{N}-\left(\dfrac{\sum f_i y_i}{N}\right)^2}=10\sqrt{\dfrac{68}{50}-\left(\dfrac{10}{50}\right)^2}$.
$=10\sqrt{1.36-0.04}=10\sqrt{1.32}\approx 10(1.1489)\approx 11.49$.

Answer: $\sigma\approx 11.49$, matching the direct computation in the previous worked example, which uses the same distribution.
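The agreement between the direct and step-deviation methods can also be checked numerically. The sketch below assumes the class marks have already been computed; the function names and the second assumed mean ($A=35$) are illustrative choices, included to show that the answer does not depend on $A$.

```python
import math


def sd_direct(marks, freqs):
    """Standard deviation from sum(f x^2)/N - mean^2 on the class marks."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / total
    mean_sq = sum(f * x * x for x, f in zip(marks, freqs)) / total
    return math.sqrt(mean_sq - mean ** 2)


def sd_step_deviation(marks, freqs, assumed_mean, width):
    """Standard deviation via y = (x - A) / h, rescaled by h at the end."""
    total = sum(freqs)
    ys = [(x - assumed_mean) / width for x in marks]
    mean_y = sum(f * y for y, f in zip(ys, freqs)) / total
    mean_y_sq = sum(f * y * y for y, f in zip(ys, freqs)) / total
    return width * math.sqrt(mean_y_sq - mean_y ** 2)


marks = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]
direct = sd_direct(marks, freqs)
step = sd_step_deviation(marks, freqs, assumed_mean=25, width=10)
# A different assumed mean should give the same answer.
step_other = sd_step_deviation(marks, freqs, assumed_mean=35, width=10)
print(round(direct, 2), round(step, 2), round(step_other, 2))
```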

Worked Example

The mean of $5$ observations is $4.4$ and their variance is $8.24$. If three of the observations are $1, 2, 6$, find the other two.

Solution

Let the unknowns be $a$ and $b$. Mean: $\dfrac{1+2+6+a+b}{5}=4.4\Rightarrow 9+a+b=22\Rightarrow a+b=13$.
Variance: $\dfrac{1}{5}\sum x_i^2-\bar{x}^2=8.24\Rightarrow \dfrac{1}{5}\sum x_i^2=8.24+19.36=27.6$, so $\sum x_i^2=138$.
$1^2+2^2+6^2+a^2+b^2=138\Rightarrow 41+a^2+b^2=138\Rightarrow a^2+b^2=97$.
From $a+b=13$: $a^2+b^2=(a+b)^2-2ab=169-2ab=97\Rightarrow ab=36$.
Solving $a+b=13$, $ab=36$ gives $a=4$, $b=9$ (roots of $t^2-13t+36=0$).

Answer: The other two observations are $4$ and $9$.
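The same chain of steps (sum, sum of squares, product, quadratic) can be written as a small function. This is a sketch only: the name `two_missing_values` and its parameters are hypothetical, and it assumes exactly two observations are missing.

```python
import math
from fractions import Fraction


def two_missing_values(n, mean, variance, known):
    """Recover two unknown observations a <= b from n, the mean, the variance,
    and the known observations."""
    s = n * mean - sum(known)                                   # a + b
    q = n * (variance + mean ** 2) - sum(k * k for k in known)  # a^2 + b^2
    p = (s * s - q) / 2                                         # ab
    disc = s * s - 4 * p              # discriminant of t^2 - s t + p = 0
    if disc < 0:
        raise ValueError("no real pair of values fits these statistics")
    root = math.sqrt(disc)
    return (s - root) / 2, (s + root) / 2


# Fractions keep 4.4 and 8.24 exact until the final square root.
a, b = two_missing_values(5, Fraction("4.4"), Fraction("8.24"), [1, 2, 6])
print(a, b)
```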

Worked Example

Find the variance and standard deviation of the ungrouped data $6, 8, 10, 12, 14, 16, 18, 20, 22, 24$ using the $\overline{x^2}-(\bar{x})^2$ form.

Solution

$n=10$. $\sum x_i=6+8+10+12+14+16+18+20+22+24=150$, so $\bar{x}=15$.
$\sum x_i^2 = 36+64+100+144+196+256+324+400+484+576=2580$.
$\sigma^2=\dfrac{\sum x_i^2}{n}-\bar{x}^2=\dfrac{2580}{10}-15^2=258-225=33$.
$\sigma=\sqrt{33}\approx 5.74$.

Answer: $\sigma^2=33$, $\sigma=\sqrt{33}\approx 5.74$.

Worked Example

Find the standard deviation of the discrete distribution: $x_i = 3, 5, 7, 9, 11$ with frequencies $f_i = 5, 8, 12, 9, 6$, using $\sigma^2=\dfrac{\sum f_i x_i^2}{N}-\bar{x}^2$.

Solution

$N=5+8+12+9+6=40$.
$\sum f_i x_i = 3(5)+5(8)+7(12)+9(9)+11(6)=15+40+84+81+66=286$, so $\bar{x}=\dfrac{286}{40}=7.15$.
$\sum f_i x_i^2 = 9(5)+25(8)+49(12)+81(9)+121(6)=45+200+588+729+726=2288$.
$\sigma^2=\dfrac{2288}{40}-(7.15)^2=57.2-51.1225=6.0775$.
$\sigma=\sqrt{6.0775}\approx 2.47$.

Answer: $\sigma^2\approx 6.08$, $\sigma\approx 2.47$.

Worked Example

The variance of $20$ observations is $5$. If each observation is multiplied by $2$, find the variance of the new data.

Solution

Scaling every value by $k$ multiplies the standard deviation by $|k|$, hence the variance by $k^2$.
Here $k=2$, so the new variance $=2^2\times 5=4\times 5=20$.

Answer: The new variance is $20$. (Adding a constant would have left it unchanged.)
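The scaling and shifting rules are easy to see on a small data set. The values below are hypothetical (they are not the $20$ observations of the example, which are not given); only the ratios between the three variances matter.

```python
import statistics

# Hypothetical data; any list of numbers would show the same pattern.
data = [2, 4, 6, 8, 10]

original = statistics.pvariance(data)
scaled = statistics.pvariance([2 * x for x in data])    # every value doubled
shifted = statistics.pvariance([x + 7 for x in data])   # every value moved by 7

print(original, scaled, shifted)
```

Doubling the data multiplies the variance by $2^2=4$, while adding $7$ to every value leaves it unchanged.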

Worked Example

The mean and standard deviation of $100$ observations were found to be $40$ and $5.1$. It was later discovered that one observation had been wrongly recorded as $40$; the correct value was $50$. Find the corrected variance. (Standard NCERT-type figures.)

Solution

From $\bar{x}=40$ over $100$ values, $\sum x_i=100\times 40=4000$.
Since $\sigma^2=(5.1)^2=26.01=\dfrac{\sum x_i^2}{100}-40^2$, we get $\sum x_i^2=100(26.01+1600)=162601$.
Replace $40$ by $50$: corrected $\sum x_i=4000-40+50=4010$, so corrected $\bar{x}=40.1$.
Corrected $\sum x_i^2=162601-40^2+50^2=162601-1600+2500=163501$.
Corrected $\sigma^2=\dfrac{163501}{100}-(40.1)^2=1635.01-1608.01=27$.

Answer: The corrected variance is $27$ (corrected mean $=40.1$).
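The correction procedure generalises to any single misrecorded value: rebuild $\sum x_i$ and $\sum x_i^2$ from the reported statistics, swap the wrong value for the right one, and recompute. A minimal sketch follows; the function name and parameter names are assumptions for illustration.

```python
from fractions import Fraction


def corrected_mean_and_variance(n, mean, sd, wrong, right):
    """Recompute mean and population variance after replacing one misrecorded value."""
    total = n * mean                       # sum of x_i as originally recorded
    total_sq = n * (sd ** 2 + mean ** 2)   # sum of x_i^2 as originally recorded
    total += right - wrong
    total_sq += right ** 2 - wrong ** 2
    new_mean = total / n
    new_variance = total_sq / n - new_mean ** 2
    return new_mean, new_variance


# Fractions keep 5.1 exact, so no rounding noise creeps into the result.
new_mean, new_variance = corrected_mean_and_variance(
    n=100, mean=Fraction(40), sd=Fraction("5.1"), wrong=40, right=50
)
print(float(new_mean), float(new_variance))
```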

Worked Example

Find the variance and standard deviation for the continuous distribution: classes $30\text{-}40, 40\text{-}50, 50\text{-}60, 60\text{-}70, 70\text{-}80$ with frequencies $3, 7, 12, 15, 8$. Use the step-deviation method with $A=55$, $h=10$.

Solution

Class marks $x_i$: $35, 45, 55, 65, 75$; $y_i=\dfrac{x_i-55}{10}$ gives $-2, -1, 0, 1, 2$. $N=3+7+12+15+8=45$.
$\sum f_i y_i = (-2)(3)+(-1)(7)+0(12)+1(15)+2(8)=-6-7+0+15+16=18$.
$\sum f_i y_i^2 = 4(3)+1(7)+0(12)+1(15)+4(8)=12+7+0+15+32=66$.
$\sigma^2=h^2\left[\dfrac{\sum f_i y_i^2}{N}-\left(\dfrac{\sum f_i y_i}{N}\right)^2\right]=100\left[\dfrac{66}{45}-\left(\dfrac{18}{45}\right)^2\right]$.
$=100\,[1.4667-0.16]=100(1.3067)=130.67$, so $\sigma=\sqrt{130.67}\approx 11.43$.

Answer: $\sigma^2\approx 130.67$, $\sigma\approx 11.43$.

Key Points

Variance is the mean squared deviation from the mean: $\sigma^2=\dfrac{1}{n}\sum(x_i-\bar{x})^2$.
Standard deviation $\sigma=\sqrt{\sigma^2}$ restores the original units, so it is the spread we actually quote.
For frequency data, weight by $f_i$ and use the computational form $\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\left(\dfrac{\sum f_i x_i}{N}\right)^2$ — "mean of squares minus square of the mean".
The step-deviation method $y_i=\dfrac{x_i-A}{h}$ gives $\sigma=h\sqrt{\dfrac{\sum f_iy_i^2}{N}-\left(\dfrac{\sum f_iy_i}{N}\right)^2}$, cutting the arithmetic for large numbers.
Shifting every value by a constant leaves $\sigma$ unchanged; scaling every value by $k$ multiplies $\sigma$ by $|k|$ and the variance by $k^2$.
We square (not absolute-value) deviations because variance is smooth, additive and weights large deviations more heavily.
Variance and standard deviation are never negative, and are zero only when every observation equals the mean.

Quick Check — 4 MCQs


Q1. The variance is:

Explanation: Mean of squared deviations.

Q2. The standard deviation equals:

Explanation: SD $=\sqrt{\text{variance}}$.

Q3. The variance of $2,4,6$ (mean $4$) is:

Explanation: $\tfrac{4+0+4}{3}=\tfrac83$.

Q4. The standard deviation is always:

Explanation: A square root of a non-negative number.
