Data Reshaping with tidyr

Intermediate 8 hrs 3 Concepts

Your Learning Map

📌 You already know

You can wrangle a data frame with dplyr verbs.

🎯 You'll learn here

Reshaping data between wide and long form with pivot_longer / pivot_wider, and handling NA.

🌍 Where it's used

Real data arrives in the wrong shape — survey columns per year, sensor readings per device; tidying it is half the job.

🔗 Unlocks next

Tidy (long) data is exactly what ggplot2 expects.

Wide and Long Format

Concept 1

pivot_longer — Wide to Long

Tidy data: one observation per row, one variable per column, one value per cell. ggplot2 and many modelling functions work most naturally with data in long format.

pivot_longer() converts multiple columns into two: a 'names' column and a 'values' column.

library(tidyr)
wide <- tibble(student=c('A','B'), Math=c(92,88), Science=c(85,91))
long <- wide |> pivot_longer(cols=-student, names_to='subject', values_to='score')
# Result: 4 rows (2 students x 2 subjects)

Solved Examples

Example 1 Apply the concept of pivot_longer — Wide to Long to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
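As a minimal sketch of two approaches, the snippet below reuses the student scores from the concept example (the object names long_a and long_b are illustrative, not from the course). The two calls differ only in how the columns to pivot are selected: by excluding the id column, or by naming the value columns explicitly.

```r
library(tidyr)

# Sample data with the same shape as the concept example
wide <- tibble(student = c("A", "B"), Math = c(92, 88), Science = c(85, 91))

# Approach 1: pivot everything except the id column
long_a <- wide |>
  pivot_longer(cols = -student, names_to = "subject", values_to = "score")

# Approach 2: name the columns to pivot explicitly
long_b <- wide |>
  pivot_longer(cols = c(Math, Science), names_to = "subject", values_to = "score")

# Compare the two results and inspect the structure
all.equal(long_a, long_b)
str(long_a)
head(long_a)
```

Excluding the id column is convenient when new subject columns may appear later; naming columns explicitly is safer when the data frame also holds columns that must not be pivoted.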

Self-Assessment (2 questions)

Q1. pivot_longer() transforms data from:

pivot_longer() gathers many columns into key-value pairs, making wide data long (tidy).

Q2. In tidy data:

Tidy data has one column per variable and one row per observation.

Concept 2

pivot_wider — Long to Wide

pivot_wider() is the reverse: it converts key-value long data back to wide format.

long |> pivot_wider(names_from=subject, values_from=score)
# Back to original wide format

R: A wide summary table

table(cyl = mtcars$cyl, gear = mtcars$gear)

Output (verified)

   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2
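The same wide summary can plausibly be built with pivot_wider(). The sketch below assumes dplyr is installed alongside tidyr and uses count() to produce the long form first; the name counts_wide is illustrative.

```r
library(dplyr)
library(tidyr)

counts_wide <- mtcars |>
  count(cyl, gear) |>                 # long form: one row per observed cyl/gear pair
  pivot_wider(
    names_from  = gear,               # each gear value becomes its own column
    values_from = n,                  # cell contents are the counts
    values_fill = 0                   # pairs that never occur get 0 instead of NA
  )

counts_wide
```

Unlike table(), count() only returns combinations that actually occur, so values_fill is what supplies the zero for 8-cylinder cars with 4 gears.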

Solved Examples

Example 1 Apply the concept of pivot_wider — Long to Wide to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
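One possible worked answer, using a hypothetical long table in which student B has no Science score. The first call is a plain pivot; the second adds names_prefix so the new columns are clearly labelled. The object names are illustrative.

```r
library(tidyr)

# Hypothetical long data: student B has no Science row
long <- tibble(
  student = c("A", "A", "B"),
  subject = c("Math", "Science", "Math"),
  score   = c(92, 85, 88)
)

# Approach 1: plain pivot; the missing student/subject pair becomes NA
wide_na <- long |>
  pivot_wider(names_from = subject, values_from = score)

# Approach 2: same pivot, with a prefix on the generated column names
wide_prefixed <- long |>
  pivot_wider(names_from = subject, values_from = score, names_prefix = "score_")

str(wide_na)
head(wide_prefixed)
```

A combination that is absent in long form shows up as an explicit NA in wide form, which is one reason to check for missing values after reshaping.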

Self-Assessment (2 questions)

Q1. pivot_wider() is the inverse of:

pivot_wider() spreads key-value pairs back into multiple columns - the inverse of pivot_longer().

Q2. A good use of pivot_wider() is to:

pivot_wider() turns the values of a column into separate columns - handy for summary tables.

Missing Data

Concept 1

Handling NA Values

R uses NA for missing values. Always check for NAs before analysis.

Key functions: is.na() and na.omit() from base R; replace_na(), fill(), and drop_na() from tidyr.
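Before choosing a strategy, it helps to see where the NAs are. The following is a small sketch using base R only, on a made-up data frame; complete.cases() and anyNA() are base/stats helpers not listed above.

```r
# Hypothetical data frame with gaps in both columns
df <- data.frame(x = c(1, NA, 3, NA), y = c("a", "b", NA, "d"))

anyNA(df)              # is there at least one NA anywhere?
colSums(is.na(df))     # number of NAs in each column
complete.cases(df)     # TRUE for rows that contain no NA
nrow(na.omit(df))      # rows remaining after dropping incomplete ones
```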

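library(dplyr)   # mutate() and if_else() come from dplyr
library(tidyr)
df <- tibble(x = c(1, NA, 3))  # small illustrative data frame
df |> drop_na()               # remove rows with any NA
df |> replace_na(list(x=0))   # replace NAs in column x with 0
df |> fill(x, .direction='down')  # forward fill: carry the last non-NA value down
df |> mutate(x=if_else(is.na(x), mean(x,na.rm=TRUE), x))  # mean impute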

Solved Examples

Example 1 Apply the concept of Handling NA Values to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
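A sketch of how the approaches compare on one made-up column (the data and object names are hypothetical). Dropping shrinks the data, while mean imputation and forward fill keep every row but invent values in different ways.

```r
library(dplyr)
library(tidyr)

# Hypothetical readings with two gaps
df <- tibble(id = 1:5, x = c(10, NA, 30, NA, 50))

# Approach 1: drop rows where x is missing
dropped <- df |> drop_na(x)

# Approach 2: replace missing x with the mean of the observed values
imputed <- df |> mutate(x = if_else(is.na(x), mean(x, na.rm = TRUE), x))

# Variant: carry the last observed value forward instead
filled <- df |> fill(x, .direction = "down")

str(dropped)
head(imputed)
head(filled)
```

Which approach is appropriate depends on why the values are missing; forward fill, for example, mainly makes sense for ordered data such as time series.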

Self-Assessment (2 questions)

Q1. In R, a missing value is represented by:

NA marks a missing value; NULL is the absence of an object, and 0 or "" are real values.

Q2. To drop rows that contain any missing value, you can use:

tidyr::drop_na() removes rows containing NA; fill()/replace_na() instead fill them in.
