Statistics
Measures of Central Tendency
The mean is the average, the median the middle value, and the mode the most frequent value. For moderately skewed distributions the empirical relation Mode = 3 Median − 2 Mean holds approximately.
- Mean, median and mode summarise the centre.
- Mode = 3 Median − 2 Mean (empirical).
Range and Mean Deviation
In Class 10 you summarised data with a single central value — mean, median or mode. But two classes can share the same average and still be wildly different: one steady, the other scattered. Measures of dispersion answer the question the average ignores — how spread out are the values?
The crudest measure is the range, the gap between the largest and smallest observation:
$$\text{Range}=\text{maximum value}-\text{minimum value}$$
Range is quick but fragile — it depends only on the two extreme values and ignores everything in between, so a single outlier can blow it up. A more honest measure looks at how far every value sits from the centre. That is the idea behind mean deviation: average the distances of all observations from a central value $a$ (the mean or the median). Distances are taken as absolute values so that values below and above the centre do not cancel.
Mean deviation for ungrouped (raw) data about a central value $a$:
$$\text{M.D.}(a)=\dfrac{1}{n}\sum|x_i-a|$$
Taking $a=\bar{x}$ gives mean deviation about the mean; taking $a=M$ (the median) gives mean deviation about the median.
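As a rough illustration, the definition translates directly into a few lines of Python. This is a minimal sketch: the function name `mean_deviation` and the sample list are hypothetical choices made for the example, while `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average absolute distance of the values from a chosen centre a."""
    return sum(abs(x - centre) for x in values) / len(values)

# Illustrative sample (an assumption for this sketch).
data = [6, 7, 10, 12, 13, 4, 8, 12]

md_about_mean = mean_deviation(data, mean(data))      # a = mean
md_about_median = mean_deviation(data, median(data))  # a = median

print("M.D. about mean  :", md_about_mean)
print("M.D. about median:", md_about_median)
```

Passing the centre as an argument keeps a single function for both versions of the measure.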
For grouped data with values (or class marks) $x_i$ and frequencies $f_i$, where $N=\sum f_i$:
$$\text{M.D.}(a)=\dfrac{1}{N}\sum f_i|x_i-a|$$
For a continuous frequency distribution, $x_i$ is the class mark (midpoint) of each class. The summary below collects the dispersion measures of this chapter:
| Measure | What it captures | Uses every value? |
|---|---|---|
| Range | Total spread (extremes only) | No |
| Mean deviation | Average distance from centre | Yes |
| Variance / S.D. | Average squared distance from mean | Yes |
A useful fact worth remembering: among all choices of the central value $a$, the median minimises the sum of absolute deviations, so M.D. about the median is never larger than M.D. about the mean.
Deeper Insight — why absolute values, and why median is special: The whole point of a deviation is to record distance, and distance has no sign — a value $4$ below the mean is just as "far away" as a value $4$ above it. If we summed the raw signed deviations $\sum(x_i-\bar{x})$ we would always get exactly zero, because the mean is the precise balance point where positive and negative deviations cancel. That zero tells us nothing about spread, so we strip the signs with $|\cdot|$ before averaging. The absolute value also explains a subtler property: the function $\sum|x_i-a|$ is smallest when $a$ is the median, not the mean. Intuitively, the median is the point with as many observations on its left as on its right, so nudging $a$ in either direction increases as many distances as it decreases. This is exactly why mean deviation about the median is the natural "minimum total distance" measure, and it is the reason the NCERT singles it out.
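Both claims in this paragraph can be checked numerically. The sketch below is only an illustration under stated assumptions: the sample list, the helper name `total_abs_distance`, and the half-integer grid of candidate centres are all choices made for this example, and a grid search is a sanity check rather than a proof.

```python
from fractions import Fraction
from statistics import median

data = [3, 3, 4, 5, 7, 9, 10, 12, 18, 19, 21]  # illustrative sample

def total_abs_distance(values, a):
    """Sum of |x - a| over all values."""
    return sum(abs(x - a) for x in values)

# Exact arithmetic, so the cancellation is not blurred by rounding.
exact_mean = Fraction(sum(data), len(data))
signed_sum = sum(x - exact_mean for x in data)

# Try every half-integer centre between the smallest and largest value.
candidates = [k / 2 for k in range(2 * min(data), 2 * max(data) + 1)]
best_centre = min(candidates, key=lambda a: total_abs_distance(data, a))

print("sum of signed deviations:", signed_sum)
print("best centre on the grid :", best_centre, "| median:", median(data))
```

On this sample the signed deviations cancel exactly, and the grid search lands on the median rather than the mean.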
- Identify the largest value: maximum $= 54$.
- Identify the smallest value: minimum $= 23$.
- Range $= 54 - 23 = 31$.
Answer: Range $= 31$.
- There are $n=8$ values. Mean $\bar{x}=\dfrac{6+7+10+12+13+4+8+12}{8}=\dfrac{72}{8}=9$.
- Absolute deviations $|x_i-9|$: $3, 2, 1, 3, 4, 5, 1, 3$.
- Sum of absolute deviations $=3+2+1+3+4+5+1+3=22$.
- $\text{M.D.}(\bar{x})=\dfrac{1}{8}\times 22=2.75$.
Answer: M.D. about the mean $= 2.75$.
- Arrange in order: $3, 3, 4, 5, 7, 9, 10, 12, 18, 19, 21$. Here $n=11$ (odd).
- Median $=$ the $\left(\dfrac{11+1}{2}\right)$th $=6$th value $=9$.
- Absolute deviations $|x_i-9|$: $6, 6, 5, 4, 2, 0, 1, 3, 9, 10, 12$.
- Sum $=6+6+5+4+2+0+1+3+9+10+12=58$.
- $\text{M.D.}(M)=\dfrac{1}{11}\times 58=\dfrac{58}{11}\approx 5.27$.
Answer: M.D. about the median $=\dfrac{58}{11}\approx 5.27$.
- $N=\sum f_i=7+4+6+3+5=25$.
- $\sum f_i x_i = 5(7)+10(4)+15(6)+20(3)+25(5)=35+40+90+60+125=350$.
- Mean $\bar{x}=\dfrac{350}{25}=14$.
- Absolute deviations $|x_i-14|$: $9, 4, 1, 6, 11$.
- $\sum f_i|x_i-14| = 9(7)+4(4)+1(6)+6(3)+11(5)=63+16+6+18+55=158$.
- $\text{M.D.}(\bar{x})=\dfrac{158}{25}=6.32$.
Answer: M.D. about the mean $= 6.32$.
- Class marks $x_i$: $5, 15, 25, 35, 45$. Total $N=5+8+15+16+6=50$.
- $\sum f_i x_i = 5(5)+15(8)+25(15)+35(16)+45(6)=25+120+375+560+270=1350$.
- Mean $\bar{x}=\dfrac{1350}{50}=27$.
- Absolute deviations $|x_i-27|$: $22, 12, 2, 8, 18$.
- $\sum f_i|x_i-27| = 22(5)+12(8)+2(15)+8(16)+18(6)=110+96+30+128+108=472$.
- $\text{M.D.}(\bar{x})=\dfrac{472}{50}=9.44$.
Answer: M.D. about the mean $= 9.44$.
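The two frequency-table calculations above follow the same recipe, which can be sketched in Python. The function name `grouped_mean_deviation` is a hypothetical helper. The values and frequencies are the ones used in the working above, and for the continuous case the class marks are passed in directly.

```python
def grouped_mean_deviation(xs, fs):
    """M.D. about the mean for values (or class marks) xs with frequencies fs."""
    N = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / N

# Discrete frequency distribution.
discrete = grouped_mean_deviation([5, 10, 15, 20, 25], [7, 4, 6, 3, 5])

# Continuous distribution, represented by its class marks.
continuous = grouped_mean_deviation([5, 15, 25, 35, 45], [5, 8, 15, 16, 6])

print(round(discrete, 2), round(continuous, 2))
```

Replacing each class by its class mark is itself an approximation, so the continuous result is only as good as the assumption that values sit near the middle of their class.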
- The data is already ordered and $n=8$ (even).
- Median $=$ mean of the $4$th and $5$th values $=\dfrac{9+10}{2}=9.5$.
- Absolute deviations $|x_i-9.5|$: $5.5, 2.5, 1.5, 0.5, 0.5, 2.5, 3.5, 7.5$.
- Sum $=5.5+2.5+1.5+0.5+0.5+2.5+3.5+7.5=24$.
- $\text{M.D.}(M)=\dfrac{1}{8}\times 24=3$.
Answer: M.D. about the median $= 3$.
- Range $=$ maximum $-$ minimum; quick but uses only the two extreme values.
- Mean deviation averages the absolute distances of all values from a centre: $\text{M.D.}(a)=\dfrac{1}{n}\sum|x_i-a|$.
- For grouped data, weight by frequency: $\text{M.D.}(a)=\dfrac{1}{N}\sum f_i|x_i-a|$, using class marks for continuous data.
- Signed deviations from the mean always sum to zero, which is why we take absolute values.
- M.D. about the median is the smallest possible, since the median minimises total absolute distance.
Variance and Standard Deviation
Mean deviation uses absolute values, which are awkward to handle algebraically. Statisticians prefer to square the deviations instead — squaring also removes signs, but produces a smooth, differentiable measure that behaves beautifully under further analysis. This leads to the two most important measures of spread.
Variance $\sigma^2$ is the mean of the squared deviations from the mean. For raw (ungrouped) data of $n$ values:
$$\sigma^2=\dfrac{1}{n}\sum(x_i-\bar{x})^2$$
Standard deviation $\sigma$ is the positive square root of the variance. The square root brings the measure back to the same units as the original data, which is why it is the quantity actually quoted:
$$\sigma=\sqrt{\dfrac{1}{n}\sum(x_i-\bar{x})^2}$$
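For a frequency distribution (discrete or continuous, with class marks $x_i$, frequencies $f_i$, and $N=\sum f_i$):
$$\sigma^2=\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2,\qquad \sigma=\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$$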
Expanding the square gives an equivalent computational form that avoids subtracting $\bar{x}$ from every value (the mean is still needed, but only once, as $\bar{x}^2$):
$$\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2$$
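For readers who want the intermediate steps, the expansion can be written out as follows. It is a sketch that uses only $\sum f_i x_i=N\bar{x}$ and $\sum f_i=N$:

$$
\begin{aligned}
\sigma^2 &= \dfrac{1}{N}\sum f_i(x_i-\bar{x})^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-\dfrac{2\bar{x}}{N}\sum f_i x_i+\dfrac{\bar{x}^2}{N}\sum f_i \\
&= \dfrac{1}{N}\sum f_i x_i^2-2\bar{x}^2+\bar{x}^2 \\
&= \dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2 .
\end{aligned}
$$

Setting every $f_i=1$ (so that $N=n$) recovers the same identity for raw data.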
When the numbers are large, the step-deviation (short-cut) method rescales the data first. Choose an assumed mean $A$ and class width $h$, and set $y_i=\dfrac{x_i-A}{h}$. Then:
$$\sigma=h\sqrt{\dfrac{\sum f_iy_i^2}{N}-\left(\dfrac{\sum f_iy_i}{N}\right)^2}$$
The table below contrasts the data types and the form you would use:
| Data type | $x_i$ means | Standard deviation |
|---|---|---|
| Ungrouped | each observation | $\sqrt{\dfrac{1}{n}\sum(x_i-\bar{x})^2}$ |
| Discrete frequency | distinct values | $\sqrt{\dfrac{1}{N}\sum f_i(x_i-\bar{x})^2}$ |
| Continuous frequency | class marks | $\sqrt{\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2}$ |
Deeper Insight — why we square instead of taking absolute values: Squaring deviations does the same sign-removing job as the absolute value, but it earns three decisive advantages that make variance the cornerstone of all statistics. First, the function $\sum(x_i-a)^2$ is smooth everywhere — it has a derivative at every point — whereas $\sum|x_i-a|$ has sharp corners; this smoothness lets us minimise it cleanly and prove that the minimising value is exactly the mean $\bar{x}$. Second, squaring deliberately gives more weight to large deviations: an observation twice as far from the mean contributes four times as much, so the measure is sensitive to outliers in a controlled, predictable way. Third, variances add for independent quantities, a property absolute deviations simply do not have, and that additivity is what makes the standard deviation the natural scale for the normal distribution, error analysis and the entire edifice of inferential statistics you will meet later. Taking the square root at the end is not cosmetic — it restores the original units (rupees, kilograms, marks), so a standard deviation is something you can actually interpret on the same axis as the data itself.
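The claim that the mean minimises the sum of squared deviations takes a single differentiation. A short sketch, writing $S(a)$ for that sum (a symbol introduced here only for convenience):

$$
\begin{aligned}
S(a) &= \sum_{i=1}^{n}(x_i-a)^2, \\
S'(a) &= -2\sum_{i=1}^{n}(x_i-a) = -2\,(n\bar{x}-na), \\
S'(a)=0 &\iff a=\bar{x}, \qquad S''(a)=2n>0 .
\end{aligned}
$$

Because the second derivative is positive, $a=\bar{x}$ is a minimum, and the minimum value of $S(a)/n$ is exactly the variance $\sigma^2$.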
- $n=5$. Mean $\bar{x}=\dfrac{2+4+6+8+10}{5}=\dfrac{30}{5}=6$.
- Deviations $(x_i-6)$: $-4, -2, 0, 2, 4$. Squares: $16, 4, 0, 4, 16$.
- $\sum(x_i-\bar{x})^2 = 16+4+0+4+16=40$.
- Variance $\sigma^2=\dfrac{40}{5}=8$.
- Standard deviation $\sigma=\sqrt{8}=2\sqrt{2}\approx 2.83$.
Answer: $\sigma^2=8$, $\sigma=2\sqrt{2}\approx 2.83$.
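The working above can be reproduced with a short Python sketch. The function name `variance` is a hypothetical helper written for this example. The cross-check uses `pvariance` and `pstdev` from Python's standard `statistics` module, which divide by $n$ as this chapter does (the module's `variance` and `stdev` divide by $n-1$ instead and would give different numbers).

```python
from math import sqrt, isclose
from statistics import pvariance, pstdev

def variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

data = [2, 4, 6, 8, 10]
var = variance(data)
sd = sqrt(var)

print("variance:", var, "| standard deviation:", round(sd, 2))

# Cross-check against the standard library's population versions.
assert isclose(var, pvariance(data))
assert isclose(sd, pstdev(data))
```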
- Using $\sum x_i^2=\dfrac{n(n+1)(2n+1)}{6}$ and $\bar{x}=\dfrac{n+1}{2}$.
- $\sigma^2=\dfrac{1}{n}\sum x_i^2-\bar{x}^2=\dfrac{(n+1)(2n+1)}{6}-\dfrac{(n+1)^2}{4}=\dfrac{n^2-1}{12}$.
- So $\sigma=\sqrt{\dfrac{n^2-1}{12}}$.
- For $n=10$: $\sigma^2=\dfrac{100-1}{12}=\dfrac{99}{12}=8.25$, so $\sigma=\sqrt{8.25}\approx 2.87$.
Answer: $\sigma=\sqrt{\dfrac{n^2-1}{12}}$; for $n=10$, $\sigma\approx 2.87$.
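The closed form can be sanity-checked against the definition for a few values of $n$. The sketch below uses exact fractions so that no rounding is involved. The function name `variance_first_n` and the particular values of $n$ tested are assumptions of this example.

```python
from fractions import Fraction

def variance_first_n(n):
    """Population variance of 1, 2, ..., n, computed from the definition."""
    xs = range(1, n + 1)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n

# Compare the definition with the closed form (n^2 - 1) / 12.
for n in (1, 2, 5, 10, 100):
    assert variance_first_n(n) == Fraction(n * n - 1, 12)

print(float(variance_first_n(10)))  # 8.25
```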
- $N=\sum f_i=3+5+9+5+3=25$.
- $\sum f_i x_i = 4(3)+8(5)+11(9)+17(5)+20(3)=12+40+99+85+60=296$, so $\bar{x}=\dfrac{296}{25}=11.84$.
- $\sum f_i x_i^2 = 16(3)+64(5)+121(9)+289(5)+400(3)=48+320+1089+1445+1200=4102$.
- $\sigma^2=\dfrac{\sum f_i x_i^2}{N}-\bar{x}^2=\dfrac{4102}{25}-(11.84)^2=164.08-140.1856=23.8944$.
Answer: Variance $\sigma^2\approx 23.89$ (and $\sigma\approx 4.89$).
- Class marks $x_i$: $5, 15, 25, 35, 45$; $N=50$.
- $\sum f_i x_i = 25+120+375+560+270=1350$, so $\bar{x}=\dfrac{1350}{50}=27$.
- $\sum f_i x_i^2 = 25(5)+225(8)+625(15)+1225(16)+2025(6)=125+1800+9375+19600+12150=43050$.
- $\sigma^2=\dfrac{43050}{50}-27^2=861-729=132$.
- $\sigma=\sqrt{132}\approx 11.49$.
Answer: $\sigma^2=132$, $\sigma=\sqrt{132}\approx 11.49$.
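Both frequency-distribution calculations above can be checked with one small Python function that evaluates the definitional form and the computational form side by side. This is a sketch: `grouped_variance` is a hypothetical helper, and the data are the values (or class marks) and frequencies from the working above.

```python
from math import sqrt, isclose

def grouped_variance(xs, fs):
    """Return the variance from the definition and from the computational form."""
    N = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / N
    definitional = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N
    computational = sum(f * x * x for x, f in zip(xs, fs)) / N - mean ** 2
    return definitional, computational

# Discrete distribution.
d_def, d_comp = grouped_variance([4, 8, 11, 17, 20], [3, 5, 9, 5, 3])

# Continuous distribution via class marks.
c_def, c_comp = grouped_variance([5, 15, 25, 35, 45], [5, 8, 15, 16, 6])

assert isclose(d_def, d_comp) and isclose(c_def, c_comp)
print(round(d_def, 4), round(sqrt(d_def), 2))
print(round(c_def, 4), round(sqrt(c_def), 2))
```

In floating-point arithmetic the computational form subtracts two nearly equal numbers when the spread is small relative to the mean, so the definitional form is the safer choice in code even though the computational form is quicker by hand.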
- Class marks $x_i$: $5, 15, 25, 35, 45$. With $A=25$, $h=10$, $y_i=\dfrac{x_i-25}{10}$ gives $-2, -1, 0, 1, 2$; $N=50$.
- $\sum f_i y_i = (-2)(5)+(-1)(8)+0(15)+1(16)+2(6)=-10-8+0+16+12=10$.
- $\sum f_i y_i^2 = 4(5)+1(8)+0(15)+1(16)+4(6)=20+8+0+16+24=68$.
- $\sigma=h\sqrt{\dfrac{\sum f_i y_i^2}{N}-\left(\dfrac{\sum f_i y_i}{N}\right)^2}=10\sqrt{\dfrac{68}{50}-\left(\dfrac{10}{50}\right)^2}$.
- $=10\sqrt{1.36-0.04}=10\sqrt{1.32}\approx 10(1.1489)\approx 11.49$.
Answer: $\sigma\approx 11.49$, matching the direct method applied to the same class marks and frequencies above.
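The short-cut calculation above can be sketched in Python as well. The function name `step_deviation_sd` is a hypothetical helper. The second call uses a different assumed mean purely to illustrate that the choice of $A$ does not affect the result.

```python
from math import sqrt, isclose

def step_deviation_sd(xs, fs, A, h):
    """Standard deviation via y = (x - A) / h, then scaled back by h."""
    N = sum(fs)
    ys = [(x - A) / h for x in xs]
    sum_fy = sum(f * y for y, f in zip(ys, fs))
    sum_fy2 = sum(f * y * y for y, f in zip(ys, fs))
    return h * sqrt(sum_fy2 / N - (sum_fy / N) ** 2)

class_marks = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]

sd = step_deviation_sd(class_marks, freqs, A=25, h=10)
sd_other_A = step_deviation_sd(class_marks, freqs, A=35, h=10)

assert isclose(sd, sd_other_A)  # shifting the origin leaves the spread unchanged
print(round(sd, 2))
```

Any convenient $A$ works because subtracting a constant shifts every value equally and so leaves the spread untouched. Only the scale factor $h$ has to be undone at the end.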
- Let the unknowns be $a$ and $b$. Mean: $\dfrac{1+2+6+a+b}{5}=4.4\Rightarrow 9+a+b=22\Rightarrow a+b=13$.
- Variance: $\dfrac{1}{5}\sum x_i^2-\bar{x}^2=8.24\Rightarrow \dfrac{1}{5}\sum x_i^2=8.24+19.36=27.6$, so $\sum x_i^2=138$.
- $1^2+2^2+6^2+a^2+b^2=138\Rightarrow 41+a^2+b^2=138\Rightarrow a^2+b^2=97$.
- From $a+b=13$: $a^2+b^2=(a+b)^2-2ab=169-2ab=97\Rightarrow ab=36$.
- Solving $a+b=13$, $ab=36$ gives $a=4$, $b=9$ (roots of $t^2-13t+36=0$).
Answer: The other two observations are $4$ and $9$.
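The same chain of steps (sum, sum of squares, product, quadratic) can be sketched as a small Python function. The name `missing_pair` and its signature are hypothetical. The sketch assumes exactly two observations are missing and that the given mean and variance are consistent with real values.

```python
from math import sqrt

def missing_pair(known, mean, variance):
    """Recover two missing observations from the mean and variance of all n values."""
    n = len(known) + 2
    s = n * mean - sum(known)                                   # a + b
    q = n * (variance + mean ** 2) - sum(x * x for x in known)  # a^2 + b^2
    p = (s * s - q) / 2                                         # ab
    disc = s * s - 4 * p                 # discriminant of t^2 - s t + p = 0
    if disc < 0:
        raise ValueError("no real pair matches the given mean and variance")
    root = sqrt(disc)
    return (s - root) / 2, (s + root) / 2

a, b = missing_pair([1, 2, 6], mean=4.4, variance=8.24)
print(round(a, 6), round(b, 6))
```

The results are rounded before printing because the inputs $4.4$ and $8.24$ are not exactly representable as binary floating-point numbers.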
- Variance is the mean squared deviation from the mean: $\sigma^2=\dfrac{1}{n}\sum(x_i-\bar{x})^2$.
- Standard deviation $\sigma=\sqrt{\sigma^2}$ restores the original units, so it is the spread we actually quote.
- For frequency data, weight by $f_i$ and use $\sigma^2=\dfrac{1}{N}\sum f_i x_i^2-\bar{x}^2$ as a fast computational form.
- The step-deviation method $y_i=\dfrac{x_i-A}{h}$ gives $\sigma=h\sqrt{\dfrac{\sum f_iy_i^2}{N}-\left(\dfrac{\sum f_iy_i}{N}\right)^2}$.
- We square deviations (rather than take absolute values) because variance is smooth, additive for independent quantities, and weights large deviations more heavily.