Data Visualisation with ggplot2

Intermediate | 15 hrs | 3 Concepts
M1

The Grammar of Graphics

Concept 1

The ggplot2 Template

Every ggplot2 chart follows the same template:

R
ggplot(data, aes(x=col1, y=col2, colour=col3)) +
  geom_*() +
  scale_*() +
  facet_*() +
  theme_*()

Key concepts:

  • ggplot(data, aes(...)) — define data and aesthetic mappings
  • aes() inside ggplot() is inherited by all geoms
  • + adds layers; the pipe (|>) cannot be used between ggplot2 layers
  • colour inside aes() = map a variable → creates legend
  • colour = 'red' outside aes() = fixed value → no legend
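The last two bullets are easiest to see side by side. The following is a minimal sketch using the built-in mtcars data; the object names p_mapped and p_fixed are illustrative and do not come from the course text.

```r
library(ggplot2)

# Mapped: colour is inside aes(), so it varies with a variable and gets a legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

# Fixed: colour is outside aes(), so every point is red and no legend is drawn
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = 'red')

p_mapped
p_fixed
```

A common slip is writing aes(colour = 'red'). That maps the constant string 'red' as if it were data, which produces a one-entry legend and a default palette colour rather than red points.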
R
library(ggplot2)

# Scatter plot
ggplot(mtcars, aes(x=wt, y=mpg, colour=factor(cyl))) +
  geom_point(size=3, alpha=0.8) +
  geom_smooth(method='lm', se=TRUE) +
  scale_colour_brewer(palette='Set1') +
  labs(
    title    = 'Fuel Efficiency vs Car Weight',
    subtitle = 'By number of cylinders',
    x = 'Weight (1000 lbs)', y = 'MPG', colour = 'Cylinders'
  ) +
  theme_minimal(base_size=12)
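The template lists a facet_*() step that the scatter plot above leaves out. As a hedged sketch of where it fits, the same mtcars data can be split into one panel per cylinder count with facet_wrap(). The choice of cyl as the faceting variable and the name p_facet are assumptions made for illustration.

```r
library(ggplot2)

# One panel per cylinder count; colour is dropped because the panels
# already separate the groups
p_facet <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = 'lm', formula = y ~ x, se = TRUE) +
  facet_wrap(~ cyl) +
  labs(x = 'Weight (1000 lbs)', y = 'MPG') +
  theme_minimal(base_size = 12)

p_facet
```

Faceting and colour mapping are alternatives for showing groups: colour keeps everything on one set of axes, while facets give each group its own panel.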
Solved Examples
Example 1 Plot iris: Sepal.Length on x, Sepal.Width on y, coloured by Species. Add a linear trend line per species.
R
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, colour=Species)) +
  geom_point(alpha=0.7, size=2) +
  geom_smooth(method='lm', se=FALSE) +
  labs(title='Iris: Sepal Dimensions by Species') +
  theme_bw()
Self-Assessment (3 questions)
Q1. In ggplot2, what does placing colour INSIDE aes() do?
Q2. How do you add layers in ggplot2?
Q3. Which geom creates a smoothed trend line?
Concept 2

Essential Geoms

Each geom creates a different chart type. The most common:

R
# Bar chart: geom_col() uses stat='identity', so bar heights come directly from y
ggplot(summary_df, aes(x=subject, y=mean_score, fill=subject)) +
  geom_col() +
  geom_text(aes(label=round(mean_score,1)), vjust=-0.3, size=3.5)

# Histogram + density overlay
# (after_stat(count) * binwidth rescales the density curve to the histogram's count axis)
ggplot(df, aes(x=score)) +
  geom_histogram(binwidth=5, fill='#2563eb', colour='white', alpha=0.8) +
  geom_density(aes(y=after_stat(count)*5), colour='red', linewidth=1)

# Box plot with raw data overlay
ggplot(df, aes(x=subject, y=score, fill=subject)) +
  geom_boxplot(outlier.shape=NA, alpha=0.6) +
  geom_jitter(width=0.2, size=1.5, alpha=0.7) +
  theme_bw()

# Line chart for time series
ggplot(economics, aes(x=date, y=unemploy/pop)) +
  geom_line(colour='#2563eb', linewidth=0.8) +
  geom_area(alpha=0.15, fill='#2563eb') +
  scale_y_continuous(labels=scales::percent)

# Heat map
ggplot(df, aes(x=month, y=subject, fill=avg_score)) +
  geom_tile(colour='white', linewidth=0.5) +
  scale_fill_gradient2(low='#ef4444', mid='#fbbf24', high='#16a34a', midpoint=75)
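The bar, histogram and box plot snippets above refer to df and summary_df, which the text does not define. A minimal sketch of compatible data follows, assuming a subject column and a numeric score column. The subject names and simulated scores are invented for illustration.

```r
library(ggplot2)

set.seed(42)  # reproducible simulated scores

# Hypothetical raw data: 40 simulated scores for each of three subjects
df <- data.frame(
  subject = rep(c('Maths', 'Science', 'English'), each = 40),
  score   = round(rnorm(120, mean = 70, sd = 10))
)

# One row per subject with its mean score, the shape geom_col() expects
summary_df <- aggregate(score ~ subject, data = df, FUN = mean)
names(summary_df)[names(summary_df) == 'score'] <- 'mean_score'

ggplot(summary_df, aes(x = subject, y = mean_score, fill = subject)) +
  geom_col()
```

The heat map snippet needs a different shape (one row per month and subject, with an avg_score column), which is not sketched here.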
R
# Scatter plot: engine displacement vs fuel efficiency
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 2.5, alpha = 0.7) +
  labs(title = "Engine Size vs Highway MPG",
       x = "Displacement (litres)", y = "Highway MPG",
       color = "Drive Type") +
  theme_minimal()
Solved Examples
Example 1 Plot score distributions for three groups A, B, C side by side using boxplots.
R
ggplot(df, aes(x=group, y=score, fill=group)) +
  geom_boxplot(outlier.colour='red', outlier.shape=8) +
  scale_fill_manual(values=c('#dbeafe','#dcfce7','#fef3c7')) +
  stat_summary(fun=mean, geom='point', shape=23, size=3, fill='white') +
  labs(title='Score by Group — Box Plot') +
  theme_minimal() + theme(legend.position='none')
Self-Assessment (2 questions)
Q1. What is the difference between geom_bar() and geom_col()?
Q2. Which geom adds random scatter to reduce overplotting?
M2

Themes and Customisation

Concept 3

Scales and Themes

Scales control how data values map to visual properties. Themes control non-data elements (text, background, grid).

R
# Colour scales
scale_colour_brewer(palette='Set1')          # qualitative
scale_colour_viridis_c()                      # continuous, colourblind-safe
scale_colour_manual(values=c('#2563eb','#16a34a','#dc2626'))

# Axis scales
scale_y_continuous(labels=scales::comma)      # 1,000 not 1000
scale_x_continuous(limits=c(0,100))
scale_y_log10()                               # log scale

# Built-in themes
theme_minimal()   # no background or border, light gridlines
theme_bw()        # white background, dark panel border
theme_classic()   # axis lines, no gridlines
theme_dark()      # dark background

# Custom theme elements (add the result to a plot with +)
theme(
    plot.title   = element_text(face='bold', size=16, colour='#1e1b4b'),
    axis.text    = element_text(size=10),
    panel.grid.minor = element_blank(),
    legend.position  = 'bottom',
    plot.margin  = margin(20, 20, 20, 20)
  )

# Save as a reusable function
# (+ merges with theme_minimal(); %+replace% would reset the title's size and alignment)
vidaara_theme <- function(base_size=12){
  theme_minimal(base_size=base_size) +
    theme(plot.title=element_text(face='bold', colour='#1a2744'))
}
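A reusable theme function is applied like any built-in theme, and a later theme() call can still override individual settings. A minimal sketch, in which course_theme is a hypothetical stand-in for the function above so that the block runs on its own:

```r
library(ggplot2)

# Hypothetical stand-in for a reusable theme function
course_theme <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(plot.title = element_text(face = 'bold', colour = '#1a2744'))
}

p_themed <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point(alpha = 0.7) +
  labs(title = 'Engine Size vs Highway MPG') +
  course_theme(base_size = 14) +      # apply the shared look
  theme(legend.position = 'bottom')   # per-plot override, added afterwards

p_themed
```

Order matters here: a complete theme such as course_theme() resets earlier theme() tweaks, so per-plot overrides belong after it.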
R
# Bar chart: car class distribution
ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar() +
  labs(title = "Car Count by Class", x = "Class", y = "Count") +
  theme_minimal() +
  theme(legend.position = "none")
Solved Examples
Example 1 Show y-axis in Indian Rupees format: Rs 1,00,000.
R
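library(ggplot2)

# Indian digit grouping: last three digits, then pairs (1,00,000)
inr <- function(x) {
  s <- format(round(x), scientific=FALSE, trim=TRUE)
  paste0('Rs ', gsub('(\\d)(?=(\\d\\d)+\\d$)', '\\1,', s, perl=TRUE))
}

ggplot(df, aes(x=month, y=revenue)) +
  geom_col(fill='#2563eb') +
  scale_y_continuous(labels=inr) +
  labs(y='Revenue (INR)')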
Self-Assessment (2 questions)
Q1. Which scale is best for colourblind-safe continuous data?
Q2. What does element_blank() do in a theme() call?