Statistics
Collecting and Presenting Data
What is Statistics?
Statistics is the branch of mathematics that deals with the collection, organization, analysis, and interpretation of data. It helps us make sense of information and draw meaningful conclusions.
What is Data?
Data is a collection of facts, numbers, or observations. Examples include:
- Test scores of students in a class
- Heights of players on a sports team
- Number of cars passing a toll booth each hour
Types of Data:
| Type | Definition | Example |
|---|---|---|
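| **Primary Data** | Collected first-hand by the investigator | Results of a survey you conduct yourself |
| **Secondary Data** | Obtained from published or previously collected sources | Data from a government report |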
Steps in Data Collection and Presentation:
- Collect the data through observation, surveys, or experiments
- Organize the data in a systematic way
- Present the data in tables or graphs for easy understanding
Frequency Distribution Table:
A frequency distribution table shows how often each value (or range of values) occurs in the data.
Raw Data: 5, 3, 8, 5, 7, 5, 9, 8, 5, 7, 3, 8, 5, 6, 7
Frequency Table:
| Value | Tally Marks | Frequency |
|---|---|---|
| 3 | II | 2 |
| 5 | ~~IIII~~ | 5 |
| 6 | I | 1 |
| 7 | III | 3 |
| 8 | III | 3 |
| 9 | I | 1 |
| **Total** | | **15** |
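The counting behind a frequency table can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `collections.Counter`; the names `raw_data`, `frequency`, and `total` are illustrative choices, not part of the notes.

```python
from collections import Counter

# Raw data from the example above
raw_data = [5, 3, 8, 5, 7, 5, 9, 8, 5, 7, 3, 8, 5, 6, 7]

# Counter maps each distinct value to the number of times it occurs
frequency = Counter(raw_data)

# Print the table in ascending order of value
print("Value | Frequency")
for value in sorted(frequency):
    print(f"{value:>5} | {frequency[value]}")

# The frequencies must add up to the number of observations
total = sum(frequency.values())
print(f"Total | {total}")
```

Checking that the total equals the number of observations (15 here) is a quick way to catch a missed or double-counted value.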
- Grouped data is organized into classes (intervals), each with its own frequency.
- Class mark = (lower limit + upper limit) ÷ 2; class size = upper limit − lower limit.
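As a rough sketch of grouping, the same raw data can be sorted into classes and the class mark and class size computed for each. The class intervals below (2–4, 4–6, 6–8, 8–10, with the lower limit included and the upper limit excluded) are an assumption chosen for illustration, as are the helper names `class_mark` and `class_size`.

```python
raw_data = [5, 3, 8, 5, 7, 5, 9, 8, 5, 7, 3, 8, 5, 6, 7]

# Hypothetical class intervals: lower limit included, upper limit excluded
classes = [(2, 4), (4, 6), (6, 8), (8, 10)]

def class_mark(lower, upper):
    """Midpoint of a class: (lower + upper) / 2."""
    return (lower + upper) / 2

def class_size(lower, upper):
    """Width of a class: upper - lower."""
    return upper - lower

grouped = []
for lower, upper in classes:
    # Count the observations that fall inside this class
    freq = sum(1 for x in raw_data if lower <= x < upper)
    grouped.append((lower, upper, class_mark(lower, upper),
                    class_size(lower, upper), freq))

for lower, upper, mark, size, freq in grouped:
    print(f"{lower}-{upper}: class mark = {mark}, size = {size}, frequency = {freq}")
```

Excluding the upper limit ensures that a value sitting on a boundary, such as 8, is counted in exactly one class.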
Graphs of Data
What are Graphical Representations?
Graphical representations are visual ways to display data, making it easier to understand patterns, trends, and comparisons at a glance.
Types of Graphs:
| Graph Type | Best Used For | Key Feature |
|---|---|---|
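| **Bar Graph** | Comparing discrete categories | Bars of equal width with gaps between them |
| **Histogram** | Showing distribution of continuous data | Bars touch each other (no gaps) |
| **Frequency Polygon** | Comparing multiple distributions | Line joining the midpoints of the bar tops |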
Bar Graph:
- Used for discrete data (separate categories)
- Bars have equal width with gaps between them
- Height of bar represents frequency
- Can be vertical or horizontal
Histogram:
- Used for continuous data grouped into class intervals
- Bars touch each other (no gaps)
- Width of bar represents class interval size
- Area of bar represents frequency
Frequency Polygon:
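- Created by joining the midpoints of the tops of the histogram bars (the points above each class mark)
- Starts and ends on the x-axis (frequency 0), at the class marks of an imagined empty class before the first interval and after the last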
- Useful for comparing two or more data sets on same graph
- Bar graph: gaps; histogram: adjacent bars, area ∝ frequency.
- Frequency polygon joins bar-top midpoints.
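The points of a frequency polygon can be worked out directly from a grouped table. The sketch below assumes equal-width classes; the function name `polygon_points` and the sample classes and frequencies are hypothetical, made up only to show the calculation.

```python
def polygon_points(classes, frequencies):
    """Return (class mark, frequency) points for a frequency polygon.

    Assumes all classes have the same width. An empty class is imagined
    on each side so the polygon starts and ends on the x-axis.
    """
    width = classes[0][1] - classes[0][0]
    marks = [(lower + upper) / 2 for lower, upper in classes]

    points = [(marks[0] - width, 0)]            # closing point on the left
    points += list(zip(marks, frequencies))     # one point per bar top
    points.append((marks[-1] + width, 0))       # closing point on the right
    return points

# Hypothetical grouped data
classes = [(0, 10), (10, 20), (20, 30)]
frequencies = [4, 7, 3]

for x, y in polygon_points(classes, frequencies):
    print(x, y)
```

Joining these points in order with straight lines gives the polygon; the same list could be handed to any plotting tool.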
Mean, Median and Mode
What are Measures of Central Tendency?
Measures of central tendency are single values that describe the "center" or "typical value" of a data set. The three main measures are mean, median, and mode.
Mean (Average):
- Sum of all values divided by the number of values
- Formula: \(\bar{x} = \frac{\text{Sum of all observations}}{\text{Number of observations}}\)
- Most commonly used measure
Median (Middle Value):
- The middle value when data is arranged in order
- For odd number of observations: middle value
- For even number of observations: average of two middle values
- Not affected by extreme values (outliers)
Mode (Most Frequent):
- The value that occurs most frequently
- A data set can have one mode (unimodal), two modes (bimodal), more than two (multimodal), or no mode when no value repeats
- Useful for categorical data
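The three measures can be checked with Python's standard `statistics` module. This sketch reuses the data set 5, 3, 8, ... from the frequency-table example; adding the extreme value 100 is an assumption made here to show how an outlier affects each measure.

```python
import statistics

data = [5, 3, 8, 5, 7, 5, 9, 8, 5, 7, 3, 8, 5, 6, 7]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
modes = statistics.multimode(data)      # list of all most frequent values

print(f"Mean   = {mean_value:.2f}")     # 91 / 15, about 6.07
print(f"Median = {median_value}")       # 8th of 15 sorted values: 6
print(f"Mode   = {modes}")              # [5]

# Add one extreme value (illustrative) and recompute
with_outlier = data + [100]
mean_out = statistics.mean(with_outlier)      # 191 / 16, about 11.94
median_out = statistics.median(with_outlier)  # average of 6 and 7: 6.5

print(f"With outlier: mean = {mean_out:.2f}, median = {median_out}")
```

A single extreme value nearly doubles the mean while the median moves only from 6 to 6.5, which is why the median is preferred for skewed data.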
Comparison of Measures:
| Measure | Best Used When | Affected by Outliers? |
|---|---|---|
| Mean | Data is roughly symmetric with no extreme values | Yes |
| Median | Data has outliers or is skewed | No |
| Mode | Data is categorical or has repeated values | No |
- Mean = sum ÷ count; median = middle value (ordered).
- Mode = most frequent value; empirical relation for moderately skewed data: Mode ≈ 3·Median − 2·Mean.
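The empirical relation is only an approximation, which a quick calculation makes clear. In the sketch below, the helper name `empirical_mode` is hypothetical and the data set is the one used in the frequency-table example.

```python
import statistics

def empirical_mode(mean, median):
    """Estimate the mode from the relation Mode ≈ 3·Median − 2·Mean."""
    return 3 * median - 2 * mean

data = [5, 3, 8, 5, 7, 5, 9, 8, 5, 7, 3, 8, 5, 6, 7]

estimate = empirical_mode(statistics.mean(data), statistics.median(data))
actual = statistics.mode(data)

print(f"Estimated mode = {estimate:.2f}")  # 3*6 - 2*(91/15), about 5.87
print(f"Actual mode    = {actual}")        # 5
```

For a small data set like this one the estimate (about 5.87) and the actual mode (5) differ noticeably, so the relation is best treated as a rough check rather than a way to find the mode exactly.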