Data Reshaping with tidyr


Level: Intermediate | 8 hrs | 3 Concepts
M1

Wide and Long Format

Concept 1

pivot_longer — Wide to Long

Tidy data has one observation per row and one variable per column. ggplot2 and many modelling functions expect data in this long format, so reshaping is often the first step before plotting or fitting.

pivot_longer() collapses several columns into two: a 'names' column holding the former column names and a 'values' column holding the cell values.

R
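library(tidyr)  # also re-exports tibble(); the native pipe |> needs R 4.1 or later
wide <- tibble(student = c('A', 'B'), Math = c(92, 88), Science = c(85, 91))
long <- wide |>
  pivot_longer(cols = -student, names_to = 'subject', values_to = 'score')
# Result: 4 rows (2 students x 2 subjects), columns student, subject, score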
Solved Examples
Example 1. Apply pivot_longer() to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
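One possible worked answer is sketched below. It differs only in how the columns to pivot are selected: by naming them, or by excluding the identifier column. The data and the object names long_a and long_b are made up for illustration.

```r
library(tidyr)

# Hypothetical sample data: one row per student, one column per subject
wide <- tibble(student = c('A', 'B'), Math = c(92, 88), Science = c(85, 91))

# Approach 1: name the columns to pivot explicitly
long_a <- wide |>
  pivot_longer(cols = c(Math, Science), names_to = 'subject', values_to = 'score')

# Approach 2: pivot everything except the identifier column
long_b <- wide |>
  pivot_longer(cols = !student, names_to = 'subject', values_to = 'score')

# Inspect the structure and the first rows of the result
str(long_a)
head(long_b)
```

Explicit naming is easier to read when there are only a few columns. Excluding the identifier tends to scale better when new subject columns may be added later.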

Self-Assessment (2 questions)
Q1. What is the primary purpose of pivot_longer()?
Q2. Which R package is most relevant for this topic?
Concept 2

pivot_wider — Long to Wide

pivot_wider() is the inverse of pivot_longer(): it spreads a names column and a values column back out into one column per name.

R
long |> pivot_wider(names_from=subject, values_from=score)
# Back to original wide format
Solved Examples
Example 1. Apply pivot_wider() to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
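A minimal sketch of one possible answer follows. It uses made-up long data in which one student-subject combination is absent, and compares the default result with one that uses the values_fill argument. The object names wide_na and wide_filled are illustrative only.

```r
library(tidyr)

# Hypothetical long data; student B has no Science score recorded
long <- tibble(
  student = c('A', 'A', 'B'),
  subject = c('Math', 'Science', 'Math'),
  score   = c(92, 85, 88)
)

# Approach 1: default behaviour, missing combinations become NA
wide_na <- long |>
  pivot_wider(names_from = subject, values_from = score)

# Approach 2: supply an explicit fill value for missing combinations
wide_filled <- long |>
  pivot_wider(names_from = subject, values_from = score, values_fill = 0)

# Inspect both results
str(wide_na)
head(wide_filled)
```

Filling with 0 is shown only to demonstrate the argument. A score of 0 means something different from a missing score, so leaving the NA in place is often the safer choice.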

Self-Assessment (2 questions)
Q1. What is the primary purpose of pivot_wider()?
Q2. Which R package is most relevant for this topic?
M2

Missing Data

Concept 1

Handling NA Values

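R uses NA to represent missing values. Check for NAs before analysis, because most summary functions such as mean() and sum() return NA when their input contains one unless na.rm = TRUE is set.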

Key functions: is.na() and na.omit() from base R; replace_na(), fill() and drop_na() from tidyr.
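As a rough sketch of what checking for NAs can look like, the snippet below uses only base R on a made-up data frame. colSums() and the na.rm argument are not mentioned in the text above and are included here as common companions to is.na() and na.omit().

```r
# Hypothetical sample data with gaps in both columns
df <- data.frame(x = c(1, NA, 3, NA), y = c('a', 'b', NA, 'd'))

any(is.na(df))             # is there at least one NA anywhere?
colSums(is.na(df))         # number of NAs per column
mean(df$x)                 # missing values propagate through the summary
mean(df$x, na.rm = TRUE)   # ignore NAs for this calculation only

complete <- na.omit(df)    # keep only rows with no NA in any column
nrow(complete)
```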

R
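library(tidyr)
library(dplyr)                      # mutate() and if_else() come from dplyr
df <- tibble(x = c(1, NA, 3))       # small sample data
df |> drop_na()                     # remove rows with any NA
df |> replace_na(list(x = 0))       # replace NAs in column x with 0
df |> fill(x, .direction = 'down')  # forward fill: carry the last non-NA value down
df |> mutate(x = if_else(is.na(x), mean(x, na.rm = TRUE), x))  # mean impute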
Solved Examples
Example 1. Apply NA handling to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always check your output with str() and head().
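One way to answer this, sketched on made-up data, is to compare dropping incomplete rows with filling the gaps. The readings table and the names dropped and filled are hypothetical.

```r
library(tidyr)

# Hypothetical daily readings with two gaps
readings <- tibble(day = 1:5, temp = c(20, NA, 22, NA, 25))

# Approach 1: drop rows where temp is missing (loses days 2 and 4)
dropped <- readings |> drop_na(temp)

# Approach 2: keep every row and carry the last observed value forward
filled <- readings |> fill(temp, .direction = 'down')

# Inspect both results
str(dropped)
head(filled)
```

Which approach is appropriate depends on why the values are missing. Forward filling assumes the quantity changes slowly between observations, while dropping rows reduces the sample size.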

Self-Assessment (2 questions)
Q1. What is the primary purpose of handling NA values?
Q2. Which R package is most relevant for this topic?