Decision Trees & Random Forests
Decision Trees
rpart — Decision Tree
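rpart() builds classification and regression trees. Control complexity with cp (complexity parameter): a split is attempted only if it decreases the overall lack of fit by at least a factor of cp, so a higher cp gives a simpler tree. The default is cp = 0.01.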
library(rpart); library(rpart.plot)
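set.seed(42) # the xerror column comes from random cross-validation folds; fix the seed for reproducible pruning
tree <- rpart(Species~., data=iris, method='class', control=rpart.control(cp=0.01))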
rpart.plot(tree, type=4, extra=101) # visualise
best_cp <- tree$cptable[which.min(tree$cptable[,'xerror']),'CP'] # cp with the lowest cross-validated error
pruned <- prune(tree, cp=best_cp) # prune to optimal size
# Adapt the code example above to your data.
# Always check your output with str() and head().
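To see the effect of cp directly, the sketch below refits the iris tree at a few cp values and counts the leaves. The cp grid and the helper name count_leaves are illustrative choices, not part of rpart. Leaves are identified through the `<leaf>` marker in the fitted object's frame.

```r
library(rpart)

# Hypothetical helper: number of terminal nodes (leaves) in a fitted rpart tree.
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Illustrative cp grid, from strict (few splits) to permissive (more splits).
cps <- c(0.6, 0.1, 0.01)

leaves <- sapply(cps, function(cp) {
  fit <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(cp = cp))
  count_leaves(fit)
})

sizes <- data.frame(cp = cps, leaves = leaves)
print(sizes)  # leaf count should not shrink as cp decreases
```

On a larger or noisier data set the differences between cp values are usually more pronounced. printcp(tree) shows the same trade-off for a single fitted tree, one row per candidate cp.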
Random Forest
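randomForest() builds an ensemble of trees, each grown on a bootstrap sample of the data. Key hyperparameter: mtry (number of features randomly sampled as split candidates at each node). Default: floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression, where p is the number of predictors.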
library(randomForest)
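set.seed(42) # bootstrap samples and feature subsets are random; fix the seed for reproducible results
rf <- randomForest(Species~., data=iris, ntree=500, importance=TRUE)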
print(rf) # OOB error estimate
varImpPlot(rf) # which features matter most?
partialPlot(rf, iris, 'Petal.Length') # partial dependence plot (first class by default; set which.class to change)
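One rough way to judge mtry is to compare the out-of-bag (OOB) error across its possible values. The sketch below does this for iris, assuming the final row of the fitted object's err.rate matrix is an adequate summary. The seed and the variable names p and oob are illustrative.

```r
library(randomForest)

set.seed(42)          # forests are random; fix the seed for reproducibility
p <- ncol(iris) - 1   # number of predictors (all columns except Species)

# Fit one forest per mtry value and record the OOB error after the last tree.
oob <- sapply(seq_len(p), function(m) {
  fit <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = m)
  fit$err.rate[fit$ntree, "OOB"]
})

results <- data.frame(mtry = seq_len(p), oob_error = oob)
print(results)

# The classification default is floor(sqrt(p)).
default_fit <- randomForest(Species ~ ., data = iris, ntree = 50)
default_fit$mtry
```

With only four predictors the differences are likely to be small and within run-to-run noise, so treat the table as a sanity check rather than a tuning result. The package also provides tuneRF() for a stepwise search over mtry.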