Statistics is what separates a data analyst from a spreadsheet user. It lets you say not just “sales went up” but “sales went up, and the rise is too large to be plausibly explained by luck.” This module gives you the core statistical toolkit to make claims you can defend.
1. Probability & key distributions
A distribution describes how likely each value of a variable is. Three show up constantly in analytics.
| Distribution | Models… | Example |
|---|---|---|
| Normal (bell) | natural measurements that cluster around a mean | heights, exam scores, measurement error |
| Binomial | number of successes in N yes/no trials | conversions out of 1,000 visitors |
| Poisson | count of events in a fixed window | support tickets per hour |
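To make the Binomial and Poisson rows concrete, here is a minimal sketch of their probability formulas using only Python's standard library. The numbers (1,000 visitors converting at 5%, support tickets averaging 4 per hour) are hypothetical, and `binom_pmf` / `poisson_pmf` are illustrative helpers; in practice `stats.binom.pmf(50, 1000, 0.05)` and `stats.poisson.sf(8, 4)` should give the same values.

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent yes/no trials, each with success prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in a window whose average count is lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical inputs: 1,000 visitors at a 5% conversion rate; 4 tickets per hour on average.
p_exactly_50 = binom_pmf(50, 1000, 0.05)
p_9_plus_tickets = 1 - sum(poisson_pmf(k, 4.0) for k in range(9))  # P(X >= 9) = 1 - P(X <= 8)

print('P(exactly 50 conversions) =', round(p_exactly_50, 4))
print('P(9+ tickets in an hour)  =', round(p_9_plus_tickets, 4))
```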
The Normal distribution follows the 68–95–99.7 rule: about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
```python
from scipy import stats
# In a standard Normal: P(value < 1.5)
print('P(Z < 1.5) =', round(stats.norm.cdf(1.5), 4))
# The 95th percentile (one-sided z critical value)
print('95th percentile z =', round(stats.norm.ppf(0.95), 3))
```

```text
P(Z < 1.5) = 0.9332
95th percentile z = 1.645
```
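The 68–95–99.7 rule can be checked directly from the Normal CDF: the share of values within ±kσ is P(Z < k) − P(Z < −k). This sketch uses the standard library's `statistics.NormalDist` so it runs without SciPy (an assumption for portability; `stats.norm.cdf` gives the same numbers).

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}  # P(-k < Z < k)
for k, share in shares.items():
    print(f'within ±{k}σ of the mean: {share:.4f}')
```

It should print about 0.6827, 0.9545 and 0.9973, which is where the rule's rounded figures come from; the exact two-sided 95% cut-off is ±1.96σ rather than ±2σ.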
- Normal models clustered measurements; Binomial counts successes in trials; Poisson counts events per window.
- The 68–95–99.7 rule: ±1σ, ±2σ, ±3σ around the mean cover those shares of a normal distribution.
- `scipy.stats` gives probabilities (`cdf`) and percentiles (`ppf`) for dozens of built-in distributions.
2. Sampling & the Central Limit Theorem
You almost never measure a whole population — you study a sample and infer. The risk is sampling error: a different sample would give a slightly different answer.
Standard error: how much a sample mean wobbles
```python
import numpy as np
sample = np.array([1200, 980, 1450, 760, 2100, 1180, 1320, 990])
mean = sample.mean()
std = sample.std(ddof=1)          # sample standard deviation (n - 1 denominator)
se = std / np.sqrt(len(sample))   # standard error of the mean
print('Mean:', round(mean, 1), ' Standard error:', round(se, 1))
```

```text
Mean: 1247.5  Standard error: 143.6
```
- We infer about a population from a sample; sampling error is the unavoidable wobble.
- Standard error = sample std ÷ √n — it shrinks as the sample grows.
- The Central Limit Theorem (CLT): with a reasonably large sample, the sample mean is approximately Normal even when the data are not, which underpins most statistical tests.
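The CLT is easiest to believe by simulation. The sketch below assumes a hypothetical, heavily right-skewed population (exponential, mean 1,000, standard deviation 1,000), draws 5,000 samples of size 25, and compares the spread of the sample means with the standard-error formula σ/√n = 1000/√25 = 200. The seed, sizes and population are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical skewed population: exponential, mean 1000 (its std is also 1000).
POP_MEAN = 1000.0
N = 25            # observations per sample
N_SAMPLES = 5000  # number of samples drawn

sample_means = [
    statistics.fmean(random.expovariate(1 / POP_MEAN) for _ in range(N))
    for _ in range(N_SAMPLES)
]

predicted_se = POP_MEAN / N ** 0.5  # sigma / sqrt(n) = 1000 / 5 = 200
# Share of sample means within ±1.96 standard errors of the true mean.
within = sum(abs(m - POP_MEAN) <= 1.96 * predicted_se for m in sample_means) / N_SAMPLES

print('mean of sample means :', round(statistics.fmean(sample_means), 1))
print('std of sample means  :', round(statistics.stdev(sample_means), 1))
print('predicted SE         :', round(predicted_se, 1))
print('share within ±1.96 SE:', round(within, 3))
```

The sample means should centre near 1,000, spread by roughly 200, and about 95% of them should land within ±1.96 standard errors, as a Normal distribution predicts, even though individual values are far from Normal.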
3. Hypothesis testing: the framework
A hypothesis test answers: “Could this result be just chance?” The logic is always the same.
- State a null hypothesis H₀ (no effect / no difference) and an alternative H₁.
- Pick a significance level α (usually 0.05).
- Compute a test statistic and its p-value.
- If p < α, reject H₀: the effect is “statistically significant”. Otherwise, you fail to reject H₀, which is not the same as proving there is no effect.
```python
from scipy import stats
# Did region A and region B have different average order values?
region_a = [1200, 1450, 1320, 1180, 1390, 1275]
region_b = [980, 1020, 940, 1100, 995, 1010]
# Student's two-sample t-test (equal variances assumed by default)
t_stat, p_value = stats.ttest_ind(region_a, region_b)
print('t =', round(t_stat, 2), ' p =', round(p_value, 4))
```

```text
t = 6.1  p = 0.0001
```
p ≈ 0.0001 is far below 0.05, so we reject H₀: a gap this large in average order value is very unlikely to arise from chance alone.
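To see what `ttest_ind` computes under its default (equal variances assumed), the sketch below rebuilds the pooled two-sample t statistic with the standard library. It should reproduce t ≈ 6.10 with 10 degrees of freedom; turning that into the two-sided p-value needs the t distribution's tail, which in SciPy is `2 * stats.t.sf(abs(t_stat), df)`.

```python
import math
import statistics

region_a = [1200, 1450, 1320, 1180, 1390, 1275]
region_b = [980, 1020, 940, 1100, 995, 1010]

n_a, n_b = len(region_a), len(region_b)
var_a = statistics.variance(region_a)  # sample variance (n - 1 denominator)
var_b = statistics.variance(region_b)

# Pooled variance: a weighted average of the two sample variances (equal-variance assumption).
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference in means

t_stat = (statistics.fmean(region_a) - statistics.fmean(region_b)) / se_diff
df = n_a + n_b - 2
print('t =', round(t_stat, 2), ' df =', df)
```

One caveat about the pooled formula: region A's sample variance is roughly four times region B's here. `stats.ttest_ind(region_a, region_b, equal_var=False)` runs Welch's t-test, which drops the equal-variance assumption and is often the safer choice when spreads differ.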
Which test do I use?
| Question | Test |
|---|---|
| Means of 2 groups differ? | stats.ttest_ind (t-test) |
| Means of 3+ groups differ? | stats.f_oneway (ANOVA) |
| Two categorical variables related? | stats.chi2_contingency (chi-square) |
| Same group, before vs after? | stats.ttest_rel (paired t-test) |
- Test logic: state H₀/H₁ → choose α → compute p-value → reject H₀ if p < α.
- t-test compares 2 means; ANOVA compares 3+; chi-square tests categorical association.
- “Statistically significant” means unlikely-by-chance, not necessarily large or important.
4. Confidence intervals & p-values
A single number hides uncertainty. A confidence interval (CI) reports a range of plausible values for the true quantity alongside the point estimate.
```python
import numpy as np
from scipy import stats
data = np.array([1200, 980, 1450, 760, 2100, 1180, 1320, 990])
mean = data.mean()
se = stats.sem(data)  # standard error of the mean (ddof=1 by default)
ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=se)
print('Mean:', round(mean, 1))
# float() keeps NumPy 2 from printing np.float64(...) inside the tuple
print('95% CI:', tuple(round(float(x), 1) for x in ci))
```

```text
Mean: 1247.5
95% CI: (907.9, 1587.1)
```
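Read it as: “our best estimate is 1247.5, and the 95% CI runs from about 908 to 1587.” Strictly, the 95% describes the method: intervals built this way capture the true average in about 95% of repeated samples. A wider CI means more uncertainty (a small sample or high variability).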
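The interval above is mean ± t* × SE. A minimal sketch of that arithmetic follows, with the critical value t* ≈ 2.3646 for 7 degrees of freedom hard-coded from a t table (it is what `stats.t.ppf(0.975, 7)` returns) so the example needs only the standard library.

```python
import math
import statistics

data = [1200, 980, 1450, 760, 2100, 1180, 1320, 990]

mean = statistics.fmean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # same quantity as stats.sem(data)

# Two-sided 95% critical value of the t distribution with n - 1 = 7 degrees of
# freedom, taken from a t table (stats.t.ppf(0.975, 7) is about 2.3646).
T_CRIT_DF7 = 2.3646

margin = T_CRIT_DF7 * se
print('95% CI:', (round(mean - margin, 1), round(mean + margin, 1)))
```

Because SE = s/√n, quadrupling the sample size roughly halves the margin, which is why bigger samples give narrower intervals.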
- A confidence interval reports a plausible range for the true value, not just a point estimate.
- Wider CI = more uncertainty; bigger samples narrow it.
- A p-value is P(data at least this extreme | H₀ true), not the probability that H₀ (or H₁) is true.
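The definition P(data at least this extreme | H₀ true) can be computed literally with a permutation test, a different technique from the t-test used in section 3. If region made no difference (H₀), the 12 order values from that example could have been labelled A or B in any way; this sketch tries all C(12, 6) = 924 labellings and counts how many produce a gap in means at least as large as the one observed.

```python
from itertools import combinations

region_a = [1200, 1450, 1320, 1180, 1390, 1275]
region_b = [980, 1020, 940, 1100, 995, 1010]

pooled = region_a + region_b
total = sum(pooled)
n_a = len(region_a)

# Groups have equal size, so comparing group totals is equivalent to comparing means.
observed = abs(sum(region_a) - sum(region_b))

# Under H0 the labels are arbitrary: try every choice of 6 values to call "A"
# and count relabellings whose gap is at least as extreme as the observed one.
extreme = 0
n_splits = 0
for idx in combinations(range(len(pooled)), n_a):
    s_a = sum(pooled[i] for i in idx)
    gap = abs(s_a - (total - s_a))
    n_splits += 1
    if gap >= observed:
        extreme += 1

p_value = extreme / n_splits
print(extreme, 'of', n_splits, 'relabellings are at least as extreme; p =', round(p_value, 4))
```

Only the observed split and its mirror image qualify, so p = 2/924 ≈ 0.0022. That is larger than the t-test's 0.0001 because, with six values per group, this exact two-sided permutation test cannot go below 2/924; the two methods answer the same question under different assumptions.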
5. A/B testing for business decisions
A/B testing is the analyst's superpower: a controlled experiment that tests whether a change causes an improvement.
The recipe
- Change one thing (the variant B) vs the control (A).
- Randomly assign users to A or B to remove bias.
- Decide the sample size in advance (a power calculation).
- Run to the end, then test the difference.
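A power calculation turns “how small a lift do we care about?” into a sample size. The sketch below uses the common normal-approximation formula for comparing two proportions. The 5% baseline, 6% target, α = 0.05 and 80% power are hypothetical planning inputs, and `n_per_group` is an illustrative helper, not a library function. statsmodels offers a packaged route (`NormalIndPower().solve_power(...)` with an effect size from `proportion_effectsize`); it uses a slightly different effect-size scale, so its answer may differ a little.

```python
import math
from statistics import NormalDist

def n_per_group(p_control: float, p_variant: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH arm to detect p_control -> p_variant with a
    two-sided two-proportion z-test (normal approximation, no continuity correction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_control - p_variant) ** 2)

# Hypothetical plan: 5% baseline conversion; we want to detect a lift to 6%.
print('visitors per group:', n_per_group(0.05, 0.06))
```

For these inputs it asks for a little over 8,000 visitors per group. Because n grows with 1/(p_variant − p_control)², halving the smallest lift you want to detect roughly quadruples the sample you need.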
```python
from statsmodels.stats.proportion import proportions_ztest
# A: 120 conversions / 2400 visitors   B: 168 / 2450
conversions = [120, 168]
visitors = [2400, 2450]
# Two-sided two-proportion z-test (pooled proportion under H0)
stat, p = proportions_ztest(conversions, visitors)
rate_a = conversions[0] / visitors[0]
rate_b = conversions[1] / visitors[1]
print('A:', round(rate_a*100, 2), '% B:', round(rate_b*100, 2), '%')
print('p-value:', round(p, 4))
```

```text
A: 5.0 % B: 6.86 %
p-value: 0.0062
```
B's conversion rate (6.86%) beats A's (5.0%), and p = 0.006 < 0.05, so the lift is statistically significant. Ship B if a gain of roughly 1.9 percentage points is worth the cost of the change.
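A p-value alone does not say whether the lift is big enough to matter. This standard-library sketch reproduces the pooled z-test behind `proportions_ztest` (which reports z with the opposite sign because it subtracts B from A) and adds a 95% Wald confidence interval for the difference in conversion rates using the unpooled standard error; the Wald interval is one common choice among several.

```python
import math
from statistics import NormalDist

conv_a, n_a = 120, 2400
conv_b, n_b = 168, 2450
rate_a, rate_b = conv_a / n_a, conv_b / n_b
diff = rate_b - rate_a  # absolute lift of B over A

# Pooled z-test: under H0 both arms share one conversion rate.
p_pool = (conv_a + conv_b) / (n_a + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z_stat = diff / se_pooled
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-sided

# 95% Wald CI for the difference uses the unpooled standard error.
se_diff = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se_diff, diff + z_crit * se_diff

print(f'z = {z_stat:.2f}, p = {p_value:.4f}')
print(f'absolute lift: {diff * 100:.2f} pp (95% CI {ci_low * 100:.2f} to {ci_high * 100:.2f} pp)')
print(f'relative lift: {diff / rate_a * 100:.1f}%')
```

The lift is about 1.9 percentage points (37% relative), but the interval runs from roughly 0.5 to 3.2 points: the data rule out no effect, yet leave real uncertainty about how large the gain is.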
- A/B testing supports causal claims because users are randomly assigned to a control (A) or a variant (B).
- Fix the sample size before you start; run to completion; do not peek-and-stop.
- Compare conversion rates with a two-proportion z-test; check both statistical and practical significance.
6. Non-parametric tests, errors & ethics
When data is heavily skewed or has outliers, especially in small samples, tests that assume normality can mislead. Non-parametric tests, such as the rank-based ones below, make fewer assumptions.
| Instead of… | Use (non-parametric) |
|---|---|
| t-test (2 groups) | stats.mannwhitneyu (Mann-Whitney U) |
| ANOVA (3+ groups) | stats.kruskal (Kruskal-Wallis) |
| Pearson correlation | stats.spearmanr (rank correlation) |
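Why ranks help: the Mann-Whitney U statistic only counts, over all pairs, how often a value from one group beats a value from the other (ties count half), so one extreme value cannot dominate it the way it dominates a mean. This sketch uses hypothetical order values with a single “whale” order; `mann_whitney_u` is an illustrative helper, and recent SciPy versions report the same U for the first sample from `stats.mannwhitneyu(group_a, group_b)`, alongside a p-value.

```python
def mann_whitney_u(x, y):
    """U for sample x: number of (xi, yj) pairs with xi > yj, ties counting 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

# Hypothetical order values; group_a contains one extreme "whale" order (900).
group_a = [20, 22, 25, 27, 30, 900]
group_b = [18, 21, 24, 26, 28, 29]

u_with_whale = mann_whitney_u(group_a, group_b)
u_without = mann_whitney_u([20, 22, 25, 27, 30, 31], group_b)  # whale replaced by 31

print('means:', round(sum(group_a) / len(group_a), 1), 'vs', round(sum(group_b) / len(group_b), 1))
print('U with whale:', u_with_whale, ' U if whale spent 31:', u_without)
```

The whale drags group A's mean to about 171 versus 24, while U stays at 22 (out of 36 pairs) whether that order was 900 or 31.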
Two ways to be wrong
| Error | Meaning |
|---|---|
| Type I (false positive) | You claim an effect that is not real (controlled by α). |
| Type II (false negative) | You miss a real effect (usually from too small a sample). |
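A quick simulation makes the two error types tangible. This sketch assumes normally distributed data with known σ = 1 (so a simple z-test can stand in for a t-test), 30 observations per group, and 4,000 simulated experiments; all values are hypothetical. When H₀ is true the rejection rate should sit near α = 5% (Type I); with a small real effect of 0.2σ, well over 80% of experiments miss it (Type II), because 30 per group gives little power.

```python
import random
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed for reproducibility
Z = NormalDist()
ALPHA, N, TRIALS = 0.05, 30, 4000

def z_test_p(a, b):
    """Two-sided p-value for a difference in means, treating sigma = 1 as known."""
    z = (fmean(a) - fmean(b)) / (2 / N) ** 0.5
    return 2 * (1 - Z.cdf(abs(z)))

def rejection_rate(true_effect):
    """Share of simulated experiments in which H0 is rejected at level ALPHA."""
    hits = 0
    for _ in range(TRIALS):
        a = [random.gauss(true_effect, 1) for _ in range(N)]
        b = [random.gauss(0, 1) for _ in range(N)]
        hits += z_test_p(a, b) < ALPHA
    return hits / TRIALS

type_1 = rejection_rate(0.0)      # H0 true: every rejection is a false positive
type_2 = 1 - rejection_rate(0.2)  # small real effect: every non-rejection is a miss
print('Type I rate :', round(type_1, 3))
print('Type II rate:', round(type_2, 3))
```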
- Use non-parametric tests (Mann-Whitney, Kruskal-Wallis, Spearman) for skewed/non-normal data.
- Type I = false positive (α); Type II = false negative (low power / small sample).
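- Resist p-hacking (re-running or slicing an analysis until something crosses p < 0.05) and report honestly; watch for Simpson's paradox, where a trend that holds in every subgroup reverses when the groups are combined.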
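Simpson's paradox is easiest to spot with numbers. The counts below are hypothetical A/B results split by device: B converts better on desktop (11% vs 10%) and on mobile (3% vs 2%), yet worse overall (4.3% vs 8.7%), because most of B's traffic came from the low-converting mobile segment. Proper randomization should keep the device mix similar across arms; when it does not, compare within segments.

```python
# Hypothetical A/B results split by device: (conversions, visitors).
results = {
    'A': {'desktop': (100, 1000), 'mobile': (4, 200)},
    'B': {'desktop': (22, 200), 'mobile': (30, 1000)},
}

overall = {}
for variant, segments in results.items():
    for device, (conv, visits) in segments.items():
        print(f'{variant} {device:<8} {conv / visits:.1%}')
    # Pool the segments to get the headline (overall) conversion rate.
    total_conv = sum(conv for conv, _ in segments.values())
    total_visits = sum(visits for _, visits in segments.values())
    overall[variant] = total_conv / total_visits
    print(f'{variant} overall  {overall[variant]:.1%}')
```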
★ Hands-on Project — A/B Test Analysis
Design and analyse a complete A/B test for a product change (e.g. a new checkout button), from sample-size planning to a written recommendation.
- Define the experiment: the one change (variant B), the control (A), and the single success metric (e.g. conversion rate).
- State H₀ (no difference) and H₁ (B differs from A) and choose α = 0.05.
- Estimate the sample size you need for a meaningful lift (a power calculation) before collecting data.
- Load or simulate the results: conversions and visitors for A and B.
- Run a two-proportion z-test with `proportions_ztest` and report the p-value.
- Compute the lift and a confidence interval for the difference in conversion rates.
- Decide: is the result statistically significant AND practically large enough to ship?
- Write a one-page recommendation (decision, evidence, risks) and push the notebook to GitHub.
Ready to test yourself?
Take the module quiz. Score 70% or more to mark this module complete.