Data Visualisation with ggplot2
Intermediate
15 hrs
3 Concepts
M1
The Grammar of Graphics
Concept 1
The ggplot2 Template
Every ggplot2 chart follows the same template:
R
ggplot(data, aes(x=col1, y=col2, colour=col3)) +
geom_*() +
scale_*() +
facet_*() +
theme_*()
Key concepts:
ggplot(data, aes(...)): define data and aesthetic mappings
aes() inside ggplot() is inherited by all geoms
+ adds layers (NOT |>)
colour inside aes() = map a variable → creates legend
colour = 'red' outside aes() = fixed value → no legend
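The difference between a mapped and a fixed colour can be checked directly. The following is a minimal sketch using the built-in mtcars data; the object names (p_mapped, p_fixed, n_mapped, n_fixed) are illustrative and not part of the course material.

```r
library(ggplot2)

# Mapped: colour is inside aes(), so ggplot2 builds a colour scale and a legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Fixed: colour is set outside aes(), so every point is red and no legend is drawn
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = 'red')

# layer_data() returns the values actually used to draw a layer
n_mapped <- length(unique(layer_data(p_mapped)$colour))  # one colour per cyl level
n_fixed  <- length(unique(layer_data(p_fixed)$colour))   # a single colour
```

Printing p_mapped should show a legend titled factor(cyl), while p_fixed shows none.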
R
library(ggplot2)
# Scatter plot
ggplot(mtcars, aes(x=wt, y=mpg, colour=factor(cyl))) +
geom_point(size=3, alpha=0.8) +
geom_smooth(method='lm', se=TRUE) +
scale_colour_brewer(palette='Set1') +
labs(
title = 'Fuel Efficiency vs Car Weight',
subtitle = 'By number of cylinders',
x = 'Weight (1000 lbs)', y = 'MPG', colour = 'Cylinders'
) +
theme_minimal(base_size=12)
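The template lists facet_*(), which the scatter plot above does not use. As a rough sketch of where it fits, the same mtcars plot can be split into one panel per cylinder count with facet_wrap(). The name p_facet is illustrative.

```r
library(ggplot2)

# Same scatter plot, split into one panel per cylinder count
p_facet <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = 'lm', se = TRUE) +
  facet_wrap(~ cyl) +
  theme_minimal(base_size = 12)

p_facet
```

Faceting is an alternative to colour for separating groups: each cylinder count gets its own panel and its own fitted line.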
Solved Examples
Example 1
Plot iris: Sepal.Length on x, Sepal.Width on y, coloured by Species. Add a smoothed trend line per species.
R
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=Species)) +
geom_point(alpha=0.7, size=2) +
geom_smooth(method='lm', se=FALSE) +
labs(title='Iris: Sepal Dimensions by Species') +
theme_bw()
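The example draws one trend line per species because colour is mapped in ggplot() and is therefore inherited by geom_smooth(). A short sketch of the contrast, assuming you wanted a single overall line instead: move the colour mapping into geom_point(). The object names here are illustrative.

```r
library(ggplot2)

# colour in ggplot(): inherited by both layers, so one line per species
p_by_species <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_smooth(method = 'lm', se = FALSE)

# colour only in geom_point(): the smooth layer is not grouped, so one overall line
p_overall <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species), alpha = 0.7, size = 2) +
  geom_smooth(method = 'lm', se = FALSE)

# Count the fitted lines in the smooth layer (layer 2) of each plot
lines_by_species <- length(unique(layer_data(p_by_species, 2)$group))
lines_overall    <- length(unique(layer_data(p_overall, 2)$group))
```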
Self-Assessment (3 questions)
Q1. In ggplot2, what does placing colour INSIDE aes() do?
Q2. How do you add layers in ggplot2?
Q3. Which geom creates a smoothed trend line?
Concept 2
Essential Geoms
Each geom creates a different chart type. The most common:
R
# Bar chart: geom_col() plots the y values as given (same as geom_bar(stat='identity'))
ggplot(summary_df, aes(x=subject, y=mean_score, fill=subject)) +
geom_col() +
geom_text(aes(label=round(mean_score,1)), vjust=-0.3, size=3.5)
# Histogram + density overlay
ggplot(df, aes(x=score)) +
geom_histogram(binwidth=5, fill='#2563eb', colour='white', alpha=0.8) +
geom_density(aes(y=after_stat(count)*5), colour='red', linewidth=1)
# Box plot with raw data overlay
ggplot(df, aes(x=subject, y=score, fill=subject)) +
geom_boxplot(outlier.shape=NA, alpha=0.6) +
geom_jitter(width=0.2, size=1.5, alpha=0.7) +
theme_bw()
# Line chart for time series
ggplot(economics, aes(x=date, y=unemploy/pop)) +
geom_line(colour='#2563eb', linewidth=0.8) +
geom_area(alpha=0.15, fill='#2563eb') +
scale_y_continuous(labels=scales::percent)
# Heat map
ggplot(df, aes(x=month, y=subject, fill=avg_score)) +
geom_tile(colour='white', linewidth=0.5) +
scale_fill_gradient2(low='#ef4444', mid='#fbbf24', high='#16a34a', midpoint=75)
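The snippets above refer to df and summary_df without defining them. To try them out, one option is to build toy data first. The sketch below invents its own values: the subject names, group sizes and score distributions are assumptions, not course data. It covers the bar chart, histogram and box plot; the heat map would need a separate data frame with month, subject and avg_score columns.

```r
library(ggplot2)

set.seed(42)  # reproducible toy data

# Hypothetical raw scores: 50 students in each of three subjects
df <- data.frame(
  subject = rep(c('Maths', 'Science', 'English'), each = 50),
  score   = round(c(rnorm(50, 70, 10), rnorm(50, 75, 8), rnorm(50, 65, 12)))
)

# One row per subject, as the geom_col() snippet expects
summary_df <- aggregate(score ~ subject, data = df, FUN = mean)
names(summary_df)[2] <- 'mean_score'

ggplot(summary_df, aes(x = subject, y = mean_score, fill = subject)) +
  geom_col() +
  geom_text(aes(label = round(mean_score, 1)), vjust = -0.3, size = 3.5)
```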
R
# Scatter plot: engine displacement vs fuel efficiency
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(size = 2.5, alpha = 0.7) +
labs(title = "Engine Size vs Highway MPG",
x = "Displacement (litres)", y = "Highway MPG",
color = "Drive Type") +
theme_minimal()
Solved Examples
Example 1
Plot score distributions for three groups A, B, C side by side using boxplots.
R
ggplot(df, aes(x=group, y=score, fill=group)) +
geom_boxplot(outlier.colour='red', outlier.shape=8) +
scale_fill_manual(values=c('#dbeafe','#dcfce7','#fef3c7')) +
stat_summary(fun=mean, geom='point', shape=23, size=3, fill='white') +
labs(title='Score by Group — Box Plot') +
theme_minimal() + theme(legend.position='none')
Self-Assessment (2 questions)
Q1. What is the difference between geom_bar() and geom_col()?
Q2. Which geom adds random scatter to reduce overplotting?
M2
Themes and Customisation
Concept 3
Scales and Themes
Scales control how data values map to visual properties. Themes control non-data elements (text, background, grid).
R
# Colour scales
scale_colour_brewer(palette='Set1') # qualitative
scale_colour_viridis_c() # continuous, colourblind-safe
scale_colour_manual(values=c('#2563eb','#16a34a','#dc2626'))
# Axis scales
scale_y_continuous(labels=scales::comma) # 1,000 not 1000
scale_x_continuous(limits=c(0,100))
scale_y_log10() # log scale
# Built-in themes
theme_minimal() # clean white
theme_bw() # black & white
theme_classic() # axis lines, no gridlines
theme_dark() # dark background
# Custom theme elements (add to a plot with +)
theme(
plot.title = element_text(face='bold', size=16, colour='#1e1b4b'),
axis.text = element_text(size=10),
panel.grid.minor = element_blank(),
legend.position = 'bottom',
plot.margin = margin(20, 20, 20, 20)
)
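# Save as reusable function
# Note: %+replace% replaces the whole plot.title element, so properties not
# listed here (e.g. size) fall back to the parent text element; use + to keep them
vidaara_theme <- function(base_size=12){
theme_minimal(base_size=base_size) %+replace%
theme(plot.title=element_text(face='bold',colour='#1a2744'))
}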
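To see the split between scales and themes in one plot, here is a small sketch on the built-in mpg data. The function name course_theme is hypothetical, and it combines themes with + rather than %+replace%; that is a choice made for this sketch so that theme_minimal()'s other title settings are kept.

```r
library(ggplot2)

# Reusable theme (course_theme is a hypothetical name for this sketch)
course_theme <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = 'bold', colour = '#1a2744'),
      panel.grid.minor = element_blank(),
      legend.position = 'bottom'
    )
}

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) +
  geom_point() +
  scale_colour_viridis_c() +        # scale: how cty values become colours
  labs(title = 'Engine Size vs Highway MPG') +
  course_theme(base_size = 14)      # theme: non-data appearance only
```

Swapping scale_colour_viridis_c() for another scale changes how the data are encoded. Swapping course_theme() for another theme changes only the text, background and grid.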
R
# Bar chart: car class distribution
ggplot(mpg, aes(x = class, fill = class)) +
geom_bar() +
labs(title = "Car Count by Class", x = "Class", y = "Count") +
theme_minimal() +
theme(legend.position = "none")
Solved Examples
Example 1
Show y-axis in Indian Rupees format: Rs 1,00,000.
R
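library(ggplot2)
# Indian digit grouping: last three digits, then groups of two (1,00,000)
# inr_label is a helper defined here, not a ggplot2 or scales function
inr_label <- function(x) {
  s <- format(round(x), scientific = FALSE, trim = TRUE)
  s <- gsub('(\\d)(?=(\\d{2})*\\d{3}$)', '\\1,', s, perl = TRUE)
  paste0('Rs ', s)
}
ggplot(df, aes(x=month, y=revenue)) +
  geom_col(fill='#2563eb') +
  scale_y_continuous(labels = inr_label) +
  labs(y='Revenue (INR)')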
Self-Assessment (2 questions)
Q1. Which scale is best for colourblind-safe continuous data?
Q2. What does element_blank() do in a theme() call?