Data Reshaping with tidyr
Wide and Long Format
pivot_longer — Wide to Long
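Tidy data has one observation per row and one variable per column. ggplot2 and many modeling functions work most naturally with data in this long format.
pivot_longer() collapses several columns into two: a 'names' column holding the former column names and a 'values' column holding the cell values.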
library(tidyr)
wide <- tibble(student=c('A','B'), Math=c(92,88), Science=c(85,91))
long <- wide |> pivot_longer(cols=-student, names_to='subject', values_to='score')
# Result: 4 rows (2 students x 2 subjects)
# Adapt the column names to your data, then check the output with str(long) and head(long).
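When the column names encode more than one variable, pivot_longer() can split them while reshaping. The following is a minimal sketch, assuming hypothetical columns named like Math_T1 (subject, underscore, term); the data and the 'term' column are invented for illustration and are not part of the original example.

```r
library(tidyr)

# Hypothetical data: one column per subject/term combination
wide_terms <- tibble(
  student    = c("A", "B"),
  Math_T1    = c(92, 88),
  Math_T2    = c(95, 84),
  Science_T1 = c(85, 91),
  Science_T2 = c(80, 93)
)

# Split each column name at "_" into two new columns: subject and term
long_terms <- wide_terms |>
  pivot_longer(
    cols      = -student,
    names_to  = c("subject", "term"),
    names_sep = "_",
    values_to = "score"
  )

# Inspect structure and first rows before using the result
str(long_terms)
head(long_terms)
```

Each student contributes one row per subject/term pair, so two students and four score columns give eight rows.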
pivot_wider — Long to Wide
pivot_wider() does the reverse: it spreads a names/values pair of columns in long format back into one column per name.
long |> pivot_wider(names_from=subject, values_from=score)
# Back to original wide format
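Going from long to wide can introduce NAs when some name/value combinations are absent. The sketch below assumes a hypothetical long table in which student B has no Science score, and shows the values_fill argument of pivot_wider() as one way to supply a default. Filling with 0 is only an illustration; whether a default is appropriate depends on the data.

```r
library(tidyr)

# Hypothetical long data: student B has no Science row
long_gap <- tibble(
  student = c("A", "A", "B"),
  subject = c("Math", "Science", "Math"),
  score   = c(92, 85, 88)
)

# Default behaviour: the missing combination becomes NA
wide_na <- long_gap |>
  pivot_wider(names_from = subject, values_from = score)

# values_fill supplies a value for combinations that have no row
wide_filled <- long_gap |>
  pivot_wider(names_from = subject, values_from = score, values_fill = 0)

str(wide_na)
str(wide_filled)
```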
Missing Data
Handling NA Values
R uses NA for missing values. Always check for NAs before analysis.
Key functions: is.na() and na.omit() from base R; replace_na(), fill(), and drop_na() from tidyr.
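library(dplyr) # mutate() and if_else() come from dplyr, not tidyr
df |> drop_na() # remove rows with an NA in any column
df |> replace_na(list(x=0)) # replace NAs in column x with 0
df |> fill(x, .direction='down') # forward fill: carry the last non-NA value down
df |> mutate(x=if_else(is.na(x), mean(x,na.rm=TRUE), x)) # impute NAs in x with the column mean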
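To make the snippets above concrete, here is a small self-contained sketch on a hypothetical data frame df with one numeric column x containing two NAs. It first counts the NAs per column, then applies each strategy so the results can be compared side by side. The data is invented for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical data with two missing values in x
df <- tibble(id = 1:5, x = c(1, NA, 3, NA, 5))

# Check first: number of NAs in each column
na_counts <- colSums(is.na(df))

dropped <- df |> drop_na()                         # keeps only complete rows
zeroed  <- df |> replace_na(list(x = 0))           # NA becomes 0
filled  <- df |> fill(x, .direction = "down")      # NA takes the previous value
imputed <- df |>
  mutate(x = if_else(is.na(x), mean(x, na.rm = TRUE), x))  # NA becomes mean of 1, 3, 5

na_counts
filled$x
imputed$x
```

Note that forward fill leaves a leading NA untouched, since there is no earlier value to carry down, and that each strategy changes the data in a different way, so the choice should follow from why the values are missing.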