Decision Trees & Random Forests

Advanced 10 hrs 2 Concepts
M1

Decision Trees

Concept 1

rpart — Decision Tree

rpart() builds classification and regression trees. Control complexity with cp (the complexity parameter): a split is attempted only if it improves the overall fit by at least a factor of cp, so a higher cp gives a simpler tree.

R
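library(rpart)
library(rpart.plot)
set.seed(42)   # xerror comes from random cross-validation folds, so fix the seed
tree <- rpart(Species ~ ., data = iris, method = "class",
              control = rpart.control(cp = 0.01))   # 0.01 is also the default cp
rpart.plot(tree, type = 4, extra = 101)   # visualise the fitted tree
printcp(tree)                             # cp table with cross-validated error (xerror)
best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned <- prune(tree, cp = best_cp)       # prune back to the lowest-xerror size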
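
To make the effect of cp concrete, the sketch below refits the same iris model at three cp values and counts the terminal nodes of each tree. The cp values and the helper names (cps, n_leaves) are illustrative choices, not part of the original example.

```r
library(rpart)

# Illustrative cp values, from permissive to strict (assumed for this sketch)
cps <- c(0.001, 0.05, 0.6)

# Fit one tree per cp value; only cp changes between fits
n_leaves <- sapply(cps, function(cp) {
  fit <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(cp = cp))
  sum(fit$frame$var == "<leaf>")   # terminal nodes are marked "<leaf>" in the frame
})

# Leaf count should never increase as cp increases
data.frame(cp = cps, leaves = n_leaves)
```

Counting leaves is only one way to measure tree size; printcp() on a single fit reports the same information as the number of splits at each cp threshold.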
Solved Examples
Example 1. Apply rpart() to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always inspect your data and results with str() and head().
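
One possible way to work this exercise is sketched here, using iris as the sample dataset. The 70/30 split, the seed, and the choice of Sepal.Length as a numeric target are assumptions made for illustration. The first approach fits a classification tree and scores it on held-out rows; the second fits a regression tree with method = "anova".

```r
library(rpart)

set.seed(42)                           # arbitrary seed, for a reproducible split
train_idx <- sample(nrow(iris), 105)   # 70% of the 150 rows for training
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]
str(train)                             # confirm column types before modelling

# Approach 1: classification tree, evaluated on the held-out rows
cls_tree <- rpart(Species ~ ., data = train, method = "class")
cls_pred <- predict(cls_tree, newdata = test, type = "class")
accuracy <- mean(cls_pred == test$Species)
table(predicted = cls_pred, actual = test$Species)   # confusion matrix

# Approach 2: regression tree for a numeric target
reg_tree <- rpart(Sepal.Length ~ ., data = train, method = "anova")
reg_pred <- predict(reg_tree, newdata = test)
rmse <- sqrt(mean((reg_pred - test$Sepal.Length)^2))

head(data.frame(actual = test$Sepal.Length, predicted = reg_pred))
```

The two approaches differ only in the method argument and the type of the response, which is why rpart() can serve both classification and regression tasks.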

Self-Assessment (2 questions)
Q1. What is the primary purpose of rpart() when building a decision tree?
Q2. Which R package is most relevant for this topic?
Concept 2

Random Forest

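randomForest() builds an ensemble of trees, each grown on a bootstrap sample of the data. Key hyperparameter: mtry (the number of features sampled as split candidates at each node). Defaults: floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression, where p is the number of predictors.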

R
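library(randomForest)
set.seed(42)   # bootstrap samples and feature draws are random, so fix the seed
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
print(rf)        # OOB error estimate and confusion matrix
varImpPlot(rf)   # which features matter most?
# Partial dependence on Petal.Length; for classification, which.class
# defaults to the first class level (here "setosa")
partialPlot(rf, iris, "Petal.Length")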
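
As a quick check of the classification default for mtry, the sketch below computes floor(sqrt(p)) by hand for iris and compares it with the mtry value stored in a fitted model. The variable names and the smaller ntree are illustrative choices.

```r
library(randomForest)

p <- ncol(iris) - 1              # predictors: every column except Species
default_mtry <- floor(sqrt(p))   # classification default computed by hand

set.seed(1)                      # arbitrary seed
rf_default <- randomForest(Species ~ ., data = iris, ntree = 200)

# The fitted object records the mtry it actually used
c(by_hand = default_mtry, from_model = rf_default$mtry)
```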
Solved Examples
Example 1. Apply randomForest() to a sample dataset. Show at least two approaches.

# See the code example above and adapt it to your data.
# Always inspect your data and results with str() and head().
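
One possible way to work this exercise is sketched here, again on iris. The 70/30 split and the seed are assumptions made for illustration. The first approach scores a default forest on held-out rows; the second skips the hold-out set and compares the out-of-bag (OOB) error across every possible mtry value.

```r
library(randomForest)

set.seed(42)                           # arbitrary seed, for reproducibility
train_idx <- sample(nrow(iris), 105)   # 70% of the 150 rows for training
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Approach 1: default settings, scored on the held-out rows
rf1 <- randomForest(Species ~ ., data = train, ntree = 500)
test_acc <- mean(predict(rf1, newdata = test) == test$Species)

# Approach 2: compare OOB error for each mtry from 1 to 4 predictors
oob_by_mtry <- sapply(1:4, function(m) {
  fit <- randomForest(Species ~ ., data = train, ntree = 500, mtry = m)
  fit$err.rate[fit$ntree, "OOB"]       # OOB error after the final tree
})
names(oob_by_mtry) <- paste0("mtry=", 1:4)

test_acc
oob_by_mtry
```

On a dataset this small the differences between mtry values may be within noise, so the comparison is best read as a template rather than as evidence for a particular setting.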

Self-Assessment (2 questions)
Q1. What is the primary purpose of a random forest?
Q2. Which R package is most relevant for this topic?