📐 Module 6

Statistics for Data Analytics

⏱ 12 hours · Intermediate · 6 topics
🎯 By the end: reason with probability distributions, quantify uncertainty with confidence intervals, run and interpret hypothesis tests (t-test, chi-square, ANOVA), design and analyse an A/B test, and avoid the statistical traps that produce wrong conclusions.

Statistics is what separates a data analyst from a spreadsheet user. It lets you say not just “sales went up” but “sales went up, and a rise this large would be unlikely if nothing had really changed.” This module gives you the core statistical toolkit to make claims you can defend.

1. Probability & key distributions

A distribution describes how likely each value of a variable is. Three show up constantly in analytics.

| Distribution | Models… | Example |
| --- | --- | --- |
| Normal (bell) | natural measurements that cluster around a mean | heights, exam scores, measurement error |
| Binomial | number of successes in N yes/no trials | conversions out of 1,000 visitors |
| Poisson | count of events in a fixed window | support tickets per hour |

The Normal distribution follows the 68–95–99.7 rule: about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
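
The rule can be checked directly from the Normal CDF. The snippet below is a minimal sketch that uses a standard Normal (mean 0, standard deviation 1); the variable name `coverage` is only illustrative.

```python
from scipy import stats

# Share of a Normal distribution within k standard deviations of the mean:
# P(-k < Z < k) = cdf(k) - cdf(-k) for a standard Normal Z.
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f'within ±{k}σ: {share:.4f}')
# within ±1σ: 0.6827
# within ±2σ: 0.9545
# within ±3σ: 0.9973
```

The exact shares are 68.27%, 95.45% and 99.73%, which is why the rule is quoted as "about" 68, 95 and 99.7.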

[Figure: the normal curve and the 68–95–99.7 rule. The horizontal axis marks the mean (μ) and ±1σ and ±2σ; the central band within ±1σ is labelled 68%.]
from scipy import stats

# In a standard Normal: P(value < 1.5)
print('P(Z < 1.5) =', round(stats.norm.cdf(1.5), 4))

# The 95th percentile (one-sided z critical value)
print('95th percentile z =', round(stats.norm.ppf(0.95), 3))
▶ Output
P(Z < 1.5) = 0.9332
95th percentile z = 1.645
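
The same cdf-style calls work for the Binomial and Poisson distributions from the table. The sketch below assumes a 5% conversion rate and an average of 6 tickets per hour; both figures are made up for illustration.

```python
from scipy import stats

# Binomial: 1,000 visitors, each converting with an assumed probability of 5%.
n_visitors, conv_rate = 1000, 0.05
expected_conversions = stats.binom.mean(n_visitors, conv_rate)
# sf(k) = P(X > k), so sf(59) is the chance of 60 or more conversions.
p_60_or_more = stats.binom.sf(59, n_visitors, conv_rate)

# Poisson: support tickets arrive at an assumed average of 6 per hour.
tickets_per_hour = 6
p_more_than_10 = stats.poisson.sf(10, tickets_per_hour)   # P(X > 10)
p_exactly_6 = stats.poisson.pmf(6, tickets_per_hour)      # P(X = 6)

print('Expected conversions:', expected_conversions)
print('P(60+ conversions)  =', round(p_60_or_more, 4))
print('P(>10 tickets/hour) =', round(p_more_than_10, 4))
print('P(exactly 6 tickets)=', round(p_exactly_6, 4))
```

For discrete distributions, `pmf` gives the probability of one exact count, while `cdf` and `sf` give the probability of "at most" and "more than" a count.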
Key points
  • Normal models clustered measurements; Binomial counts successes in trials; Poisson counts events per window.
  • The 68–95–99.7 rule: ±1σ, ±2σ, ±3σ around the mean cover those shares of a normal distribution.
  • scipy.stats gives probabilities (cdf) and percentiles (ppf) for each of its built-in distributions.

2. Sampling & the Central Limit Theorem

You almost never measure a whole population — you study a sample and infer. The risk is sampling error: a different sample would give a slightly different answer.

Standard error: how much a sample mean wobbles

import numpy as np

sample = np.array([1200, 980, 1450, 760, 2100, 1180, 1320, 990])

mean = sample.mean()
std  = sample.std(ddof=1)               # sample standard deviation
se   = std / np.sqrt(len(sample))       # standard error of the mean

print('Mean:', round(mean, 1), ' Standard error:', round(se, 1))
▶ Output
Mean: 1247.5  Standard error: 143.6
The Central Limit Theorem (CLT): if you take many samples of a reasonably large size, the distribution of their means is approximately Normal, even if the underlying data is not (provided the data has finite variance). This is why so much of statistics relies on the Normal curve, and why bigger samples give more reliable estimates: the standard error shrinks in proportion to 1/√n, so quadrupling the sample size halves it.
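
A small simulation makes the CLT concrete. This is a sketch under stated assumptions: the population is an exponential distribution with mean 100 (strongly right-skewed), and the seed, sample sizes and number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# A strongly right-skewed population: exponential with mean 100 (its std is also 100).
population_mean = population_std = 100.0

spreads = {}
for n in (5, 50, 500):
    # Draw 5,000 independent samples of size n and keep each sample's mean.
    sample_means = rng.exponential(scale=population_mean, size=(5_000, n)).mean(axis=1)
    spreads[n] = sample_means.std(ddof=1)
    print(f'n={n:>3}  observed spread of means={spreads[n]:6.2f}  '
          f'theory σ/√n={population_std / np.sqrt(n):6.2f}')
```

The observed spread of the sample means should track σ/√n closely, and a histogram of `sample_means` looks more bell-shaped as n grows.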
Key points
  • We infer about a population from a sample; sampling error is the unavoidable wobble.
  • Standard error = sample std ÷ √n — it shrinks as the sample grows.
  • The CLT makes sample means approximately Normal, underpinning most statistical tests.

3. Hypothesis testing: the framework

A hypothesis test answers: “Could this result be just chance?” The logic is always the same.

  1. State a null hypothesis H₀ (no effect / no difference) and an alternative H₁.
  2. Pick a significance level α (usually 0.05).
  3. Compute a test statistic and its p-value.
  4. If p < α, reject H₀ — the effect is “statistically significant”.
from scipy import stats

# Did region A and region B have different average order values?
region_a = [1200, 1450, 1320, 1180, 1390, 1275]
region_b = [ 980, 1020,  940, 1100,  995, 1010]

t_stat, p_value = stats.ttest_ind(region_a, region_b)
print('t =', round(t_stat, 2), ' p =', round(p_value, 4))
▶ Output
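t = 6.1  p = 0.0001

p = 0.0001 is far below 0.05, so we reject H₀: the data give strong evidence that average order value differs between the two regions. Note that stats.ttest_ind assumes equal variances by default; pass equal_var=False to run Welch's t-test when the groups' spreads differ.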
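
To see what "compute a test statistic and its p-value" involves, the sketch below rebuilds the equal-variance t-test by hand for the same two regions and compares it with scipy. The names `pooled_var`, `se_diff`, `t_manual` and `p_manual` are illustrative.

```python
import numpy as np
from scipy import stats

region_a = np.array([1200, 1450, 1320, 1180, 1390, 1275])
region_b = np.array([ 980, 1020,  940, 1100,  995, 1010])
n_a, n_b = len(region_a), len(region_b)
df = n_a + n_b - 2                       # degrees of freedom

# Pooled variance: a weighted average of the two sample variances.
pooled_var = ((n_a - 1) * region_a.var(ddof=1) + (n_b - 1) * region_b.var(ddof=1)) / df

# Standard error of the difference between the two sample means.
se_diff = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Test statistic: the observed difference measured in standard errors.
t_manual = (region_a.mean() - region_b.mean()) / se_diff
p_manual = 2 * stats.t.sf(abs(t_manual), df)    # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(region_a, region_b)
print('manual:', round(t_manual, 4), round(p_manual, 6))
print('scipy: ', round(t_scipy, 4), round(p_scipy, 6))
```

Both lines should print the same numbers: the t statistic is just the difference in means divided by its standard error, and the p-value is the tail area of a t distribution beyond it.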

Which test do I use?

| Question | Test |
| --- | --- |
| Means of 2 groups differ? | stats.ttest_ind (t-test) |
| Means of 3+ groups differ? | stats.f_oneway (ANOVA) |
| Two categories related? | stats.chi2_contingency (chi-square) |
| Same group, before vs after? | stats.ttest_rel (paired t-test) |
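
The other three tests in the table follow the same call-and-read-the-p-value pattern. The sketch below uses small made-up datasets (store formats, regions and plans, before/after spend) purely for illustration.

```python
from scipy import stats

# ANOVA: do three (hypothetical) store formats have the same mean basket size?
small  = [52, 48, 55, 50, 47]
medium = [58, 61, 57, 63, 60]
large  = [66, 70, 64, 69, 71]
f_stat, p_anova = stats.f_oneway(small, medium, large)

# Chi-square: is plan choice related to region? Rows = regions, columns = plans.
# For a 2x2 table scipy applies Yates' continuity correction by default.
observed = [[30, 70],
            [55, 45]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

# Paired t-test: the same five customers' spend before and after a redesign.
before = [100, 120,  90, 110, 105]
after  = [108, 125,  97, 112, 111]
t_paired, p_paired = stats.ttest_rel(before, after)

print('ANOVA       p =', round(p_anova, 4))
print('Chi-square  p =', round(p_chi2, 4), ' dof =', dof)
print('Paired t    p =', round(p_paired, 4))
```

A significant ANOVA only says that at least one group mean differs; it does not say which one, so a follow-up comparison is usually needed.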
Key points
  • Test logic: state H₀/H₁ → choose α → compute p-value → reject H₀ if p < α.
  • t-test compares 2 means; ANOVA compares 3+; chi-square tests categorical association.
  • “Statistically significant” means unlikely-by-chance, not necessarily large or important.

4. Confidence intervals & p-values

A single number hides uncertainty. A confidence interval (CI) reports a range of plausible values for the true figure, at a stated confidence level.

import numpy as np
from scipy import stats

data = np.array([1200, 980, 1450, 760, 2100, 1180, 1320, 990])
mean = data.mean()
se   = stats.sem(data)                       # standard error

ci = stats.t.interval(0.95, len(data)-1, loc=mean, scale=se)
print('Mean:', round(mean, 1))
print('95% CI:', tuple(round(float(x), 1) for x in ci))
▶ Output
Mean: 1247.5
95% CI: (907.9, 1587.1)

Read it as: “our best estimate is 1247.5, and we are 95% confident the true average lies between 908 and 1587.” Strictly, the 95% describes the method: intervals built this way capture the true mean in about 95% of repeated samples. A wider CI means more uncertainty (small sample or high variability).
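
Under the hood the interval is just mean ± (critical t value × standard error). The sketch below rebuilds it by hand for the same data; `t_crit`, `margin` and `ci_manual` are illustrative names.

```python
import numpy as np
from scipy import stats

data = np.array([1200, 980, 1450, 760, 2100, 1180, 1320, 990])
n = len(data)
mean = data.mean()
se = data.std(ddof=1) / np.sqrt(n)        # standard error of the mean

# 2.5% in each tail leaves 95% in the middle; n - 1 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * se                      # half-width of the interval
ci_manual = (mean - margin, mean + margin)

ci_scipy = stats.t.interval(0.95, n - 1, loc=mean, scale=se)
print('t critical value:', round(t_crit, 3))
print('manual CI:', tuple(round(float(x), 1) for x in ci_manual))
print('scipy CI: ', tuple(round(float(x), 1) for x in ci_scipy))
```

Because the margin is `t_crit * se`, anything that shrinks the standard error (more data, less variability) narrows the interval.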

What a p-value is NOT. A p-value of 0.03 does not mean “a 3% chance H₀ is true” or “a 97% chance the effect is real.” It means: if H₀ were true, you would see a result this extreme only 3% of the time. Significance is also not the same as importance — a tiny, useless effect can be “significant” with a huge sample.
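
The definition can be checked by simulation. In the sketch below H₀ is true by construction (both groups are drawn from the same assumed Normal(100, 15) population), so any p < 0.05 is a false alarm; the seed and sizes are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 5_000

# Both groups come from the SAME distribution, so H0 is true in every experiment.
a = rng.normal(loc=100, scale=15, size=(n_experiments, 30))
b = rng.normal(loc=100, scale=15, size=(n_experiments, 30))

# One t-test per row (axis=1) gives one p-value per simulated experiment.
p_values = stats.ttest_ind(a, b, axis=1).pvalue
false_positive_rate = (p_values < 0.05).mean()

print('Share of experiments with p < 0.05:', round(float(false_positive_rate), 3))
```

The share should come out close to 0.05: when nothing is going on, "significant" results still appear at roughly the rate α.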
Key points
  • A confidence interval reports a plausible range for the true value, not just a point estimate.
  • Wider CI = more uncertainty; bigger samples narrow it.
  • A p-value is P(data this extreme | H₀ true) — not the probability the hypothesis is true.

5. A/B testing for business decisions

A/B testing is the analyst's superpower: run a controlled experiment to show that a change causes an improvement, rather than merely coinciding with one.

The recipe

  1. Change one thing (the variant B) vs the control (A).
  2. Randomly assign users to A or B to remove bias.
  3. Decide the sample size in advance (a power calculation).
  4. Run to the end, then test the difference.
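
Step 3 of the recipe can be sketched with the standard Normal-approximation formula for comparing two proportions. The helper `sample_size_per_group` is hypothetical (not a library function), and the 5% baseline, 6% target, α = 0.05 and 80% power are assumptions chosen for illustration.

```python
from math import ceil
from scipy import stats

def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.80):
    """Hypothetical helper: approximate visitors needed per group
    for a two-sided two-proportion z-test (Normal approximation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_power = stats.norm.ppf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control            # smallest lift worth detecting
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

n_needed = sample_size_per_group(0.05, 0.06)          # detect 5% -> 6%
n_needed_big_lift = sample_size_per_group(0.05, 0.07) # detect 5% -> 7%

print('Visitors per group for 5% -> 6%:', n_needed)
print('Visitors per group for 5% -> 7%:', n_needed_big_lift)
```

Smaller lifts need far more traffic: the required sample grows with the inverse square of the lift, so halving the detectable lift roughly quadruples the visitors needed.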
from statsmodels.stats.proportion import proportions_ztest

# A: 120 conversions / 2400 visitors   B: 168 / 2450
conversions = [120, 168]
visitors    = [2400, 2450]

stat, p = proportions_ztest(conversions, visitors)
rate_a = conversions[0] / visitors[0]
rate_b = conversions[1] / visitors[1]
print('A:', round(rate_a*100, 2), '%   B:', round(rate_b*100, 2), '%')
print('p-value:', round(p, 4))
▶ Output
A: 5.0 %   B: 6.86 %
p-value: 0.0062

B's conversion rate (6.86%) beats A's (5.0%), and p = 0.006 < 0.05, so the lift is statistically significant. If a lift of this size is also worth it commercially, ship B.
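
A p-value says whether there is a difference; a confidence interval says how big it plausibly is. The sketch below computes the lift and a 95% interval for the difference in rates using the unpooled (Wald) standard error, which is one common choice among several.

```python
import numpy as np
from scipy import stats

conversions = np.array([120, 168])
visitors    = np.array([2400, 2450])
rate_a, rate_b = conversions / visitors

diff = rate_b - rate_a                 # absolute lift, as a proportion
relative_lift = diff / rate_a          # lift relative to the control rate

# Unpooled (Wald) standard error of the difference between two proportions.
se_diff = np.sqrt(rate_a * (1 - rate_a) / visitors[0]
                  + rate_b * (1 - rate_b) / visitors[1])
z_crit = stats.norm.ppf(0.975)         # about 1.96 for a 95% interval
ci_low, ci_high = diff - z_crit * se_diff, diff + z_crit * se_diff

print(f'Absolute lift: {diff * 100:.2f} percentage points')
print(f'Relative lift: {relative_lift * 100:.1f} %')
print(f'95% CI for the difference: {ci_low * 100:.2f} to {ci_high * 100:.2f} points')
```

An interval that excludes zero agrees with the significant z-test, and its lower end is the number to weigh against the cost of shipping the change.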

The classic A/B traps: peeking and stopping the moment it looks significant (inflates false positives); too-small samples; running many variants without correcting for multiple comparisons; and ignoring practical significance (is the lift big enough to matter?).
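
The cost of peeking can be shown with a simulation. In this sketch A and B share the same assumed 5% conversion rate, so every "win" is a false positive; the helper `z_test_p`, the peek schedule and the seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_per_group, true_rate = 1_000, 2_000, 0.05
checkpoints = np.arange(200, n_per_group + 1, 200)   # peek every 200 visitors per group

def z_test_p(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, equal group sizes n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 2 * stats.norm.sf(np.abs(z))

# Each row is one experiment; each column is one visitor (True = converted).
a = rng.random((n_experiments, n_per_group)) < true_rate
b = rng.random((n_experiments, n_per_group)) < true_rate
cum_a, cum_b = a.cumsum(axis=1), b.cumsum(axis=1)    # running conversion counts

p_at_peeks = np.column_stack(
    [z_test_p(cum_a[:, n - 1], cum_b[:, n - 1], n) for n in checkpoints]
)

fp_final_only = (p_at_peeks[:, -1] < 0.05).mean()          # test once, at the planned end
fp_with_peeking = (p_at_peeks < 0.05).any(axis=1).mean()   # stop at the first p < 0.05

print('False positives, single final test:', round(float(fp_final_only), 3))
print('False positives, stop at first hit:', round(float(fp_with_peeking), 3))
```

Testing once at the end keeps the false positive rate near 5%, while stopping at the first significant peek pushes it well above that, even though no real difference exists.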
Key points
  • A/B testing supports causal claims by randomly assigning users to a control (A) and a variant (B).
  • Fix the sample size before you start; run to completion; do not peek-and-stop.
  • Compare conversion rates with a two-proportion z-test; check both statistical and practical significance.

6. Non-parametric tests, errors & ethics

When data is skewed or not normal, the usual tests can mislead. Non-parametric tests make fewer assumptions.

| Instead of… | Use (non-parametric) |
| --- | --- |
| t-test (2 groups) | stats.mannwhitneyu (Mann-Whitney U) |
| ANOVA (3+ groups) | stats.kruskal (Kruskal-Wallis) |
| Pearson correlation | stats.spearmanr (rank correlation) |
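
Each of these is called much like its parametric counterpart. The sketch below uses simulated right-skewed (log-normal) order values; the group parameters and seed are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical, right-skewed order values for three groups (log-normal draws).
group_a = rng.lognormal(mean=4.0, sigma=0.8, size=40)
group_b = rng.lognormal(mean=4.5, sigma=0.8, size=40)
group_c = rng.lognormal(mean=4.2, sigma=0.8, size=40)

u_stat, p_mw = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)

# Spearman works on ranks, so a monotonic but curved relationship still scores 1.
x = np.arange(1, 21)
y = x ** 3
rho, p_rho = stats.spearmanr(x, y)
r_pearson, _ = stats.pearsonr(x, y)

print('Mann-Whitney U  p =', round(float(p_mw), 4))
print('Kruskal-Wallis  p =', round(float(p_kw), 4))
print('Spearman rho =', round(float(rho), 3), ' Pearson r =', round(float(r_pearson), 3))
```

These tests compare ranks rather than raw values, which makes them robust to outliers and skew at the cost of some power when the data really is Normal.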

Two ways to be wrong

| Error | Meaning |
| --- | --- |
| Type I (false positive) | You claim an effect that is not real (controlled by α). |
| Type II (false negative) | You miss a real effect (usually from too small a sample). |
Avoid p-hacking. Testing many things until something hits p < 0.05, or stopping early on good news, manufactures false discoveries. Decide your hypothesis and sample size before looking.
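
The arithmetic behind that warning is short. The sketch below assumes 20 independent tests where H₀ is true every time, and shows the Bonferroni correction as one simple (conservative) remedy; the simulated metrics are made up.

```python
import numpy as np
from scipy import stats

alpha, n_tests = 0.05, 20

# Chance of at least one false positive across 20 independent tests when H0 is always true.
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: compare each p-value to alpha / number of tests.
bonferroni_alpha = alpha / n_tests

rng = np.random.default_rng(3)
# 20 metrics, none of which truly differs between the two groups.
a = rng.normal(size=(n_tests, 200))
b = rng.normal(size=(n_tests, 200))
p_values = stats.ttest_ind(a, b, axis=1).pvalue

print('P(at least one false positive):', round(family_wise_error, 3))
print('Hits at p < 0.05:     ', int((p_values < alpha).sum()))
print('Hits after Bonferroni:', int((p_values < bonferroni_alpha).sum()))
```

With 20 looks at pure noise there is roughly a 64% chance that something crosses p < 0.05, which is why a single "hit" among many tests is weak evidence.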
Data ethics: report honestly even when results disappoint; never cherry-pick the chart that flatters; respect privacy and consent; and remember Simpson's paradox — a trend can reverse when you split by a hidden subgroup, so always sanity-check segments.
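
Simpson's paradox is easiest to believe once you see the numbers. The figures below are hypothetical, constructed so that variant B wins inside every segment yet loses overall because the two variants saw very different mixes of users.

```python
# Hypothetical results: segment -> variant -> (conversions, visitors)
results = {
    'new users':       {'A': (10, 100),  'B': (60, 500)},
    'returning users': {'A': (150, 500), 'B': (32, 100)},
}

# Conversion rate per segment and variant.
segment_rates = {
    segment: {variant: conv / vis for variant, (conv, vis) in variants.items()}
    for segment, variants in results.items()
}

# Overall rate per variant: add up conversions and visitors across segments.
overall_rates = {}
for variant in ('A', 'B'):
    total_conv = sum(results[s][variant][0] for s in results)
    total_vis = sum(results[s][variant][1] for s in results)
    overall_rates[variant] = total_conv / total_vis

for segment, rates in segment_rates.items():
    print(f"{segment:<16} A: {rates['A']:.1%}   B: {rates['B']:.1%}")
print(f"{'overall':<16} A: {overall_rates['A']:.1%}   B: {overall_rates['B']:.1%}")
```

B is ahead among new users and among returning users, but A looks far better overall because most of A's traffic came from the high-converting returning segment. Proper random assignment in an A/B test is what prevents this kind of imbalance.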
Key points
  • Use non-parametric tests (Mann-Whitney, Kruskal-Wallis, Spearman) for skewed/non-normal data.
  • Type I = false positive (α); Type II = false negative (low power / small sample).
  • Resist p-hacking and report honestly; watch for Simpson's paradox across subgroups.

★ Hands-on Project — A/B Test Analysis

Design and analyse a complete A/B test for a product change (e.g. a new checkout button), from sample-size planning to a written recommendation.

  1. Define the experiment: the one change (variant B), the control (A), and the single success metric (e.g. conversion rate).
  2. State H₀ (no difference) and H₁ (B differs from A) and choose α = 0.05.
  3. Estimate the sample size you need for a meaningful lift (a power calculation) before collecting data.
  4. Load or simulate the results: conversions and visitors for A and B.
  5. Run a two-proportion z-test with proportions_ztest and report the p-value.
  6. Compute the lift and a confidence interval for the difference in conversion rates.
  7. Decide: is the result statistically significant AND practically large enough to ship?
  8. Write a one-page recommendation (decision, evidence, risks) and push the notebook to GitHub.

Ready to test yourself?

Take the module quiz. Score 70% or more to mark this module complete.
