Advanced & Unsupervised Learning — Quiz | Data Science using Python

Question 1

What does a decision tree do at each node?

A Average all values

B Ask a yes/no question that makes the groups purer

C Compute a correlation

D Scale the data
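
Worked example for Question 1. The idea of a split "making the groups purer" can be sketched with Gini impurity, one common purity measure. This is a minimal illustration, not how any particular library implements it; the toy data (hours studied, pass/fail) and the function names are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for a mixed group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, feature, threshold):
    """Weighted impurity after asking: is rows[i][feature] <= threshold?"""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy data: one feature (hours studied) and a pass/fail label.
rows = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

before = gini(labels)                         # 0.5: half pass, half fail
after = split_impurity(rows, labels, 0, 4.0)  # 0.0: both sides are pure
```

A tree-growing algorithm would try many candidate questions at each node and keep the one with the lowest weighted impurity.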

Question 2

Why does a single deep decision tree often overfit?

A It is too slow

B It can keep splitting until each leaf memorises one sample

C It ignores features

D It needs scaling

Question 3

How does a random forest differ from a single tree?

A It uses no trees

B It averages many decorrelated trees, each built on a random subset of rows and features

C It is always one deep tree

D It only works on text

Question 4

How does gradient boosting build its trees?

A All at once independently

B Sequentially, each correcting the previous trees' errors

C Randomly with no order

D By clustering first
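
Worked example for Question 4. The sequential idea can be sketched in plain Python for squared-error regression on a single feature: each new stump (a one-split tree) is fitted to the residuals left by the trees before it. This is a simplified sketch under those assumptions, not the XGBoost or LightGBM algorithm; all names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals by minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build stumps one after another, each correcting the current errors."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history

# Hypothetical toy data: y = x squared on eight points.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]
base, stumps, history = boost(xs, ys)  # history holds training MSE per round
```

The training error in `history` shrinks round by round, which is also why boosting needs a limit on rounds or a validation check to avoid overfitting.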

Question 5

For typical tabular/structured data, which family is usually the strongest default?

A Deep neural networks

B Gradient boosting (XGBoost/LightGBM)

C K-means

D Linear regression

Question 6

Which of these is the most reliable way to rank feature importance?

A Alphabetical order

B Permutation importance (drop in score when a feature is shuffled)

C The longest column name

D Random guess
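
Worked example for Question 6. Permutation importance can be sketched without any library: shuffle one column, re-score the model, and record how far the score falls. The "model" below is a hypothetical stand-in for a fitted classifier that uses only feature 0, and the data is synthetic; in practice the score would normally be measured on held-out data.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the label
        shuffled = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / n_repeats

# Hypothetical fitted model: it looks at feature 0 and ignores feature 1.
def model(row):
    return 1 if row[0] > 0.5 else 0

data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

imp0 = permutation_importance(model, rows, labels, 0)  # large drop
imp1 = permutation_importance(model, rows, labels, 1)  # no drop: feature unused
```

Shuffling the unused feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the feature the model relies on costs a large share of the accuracy.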

Question 7

K-means is an example of what kind of learning?

A Supervised

B Unsupervised

C Reinforcement

D Semi-supervised regression

Question 8

Why must you scale features before K-means?

A To save memory

B Because it uses distance, so large-scale features would dominate

C K-means cannot read text

D To remove duplicates
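
Worked example for Question 8. The effect of scale on distance can be shown directly. In this sketch the customer data (age in years, income in dollars) is invented for illustration, and standardisation to mean 0 and standard deviation 1 is assumed as the scaling method.

```python
import math
import statistics

def euclidean(a, b):
    """Straight-line distance, the measure K-means relies on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical customers: [age in years, income in dollars].
rows = [
    [25, 50_000],  # customer 0
    [60, 50_500],  # customer 1: similar income, very different age
    [26, 52_000],  # customer 2: nearly the same age, slightly higher income
    [40, 90_000],
    [45, 20_000],
]

# Raw units: income differences swamp age, so customer 1 looks closest to 0.
raw_01 = euclidean(rows[0], rows[1])
raw_02 = euclidean(rows[0], rows[2])

# Standardised units: both features count, so customer 2 is closest to 0.
scaled = standardize(rows)
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])
```

Before scaling, a 35-year age gap counts for less than a 2,000-dollar income gap; after scaling, the nearest neighbour of customer 0 changes, and cluster assignments would change with it.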

Question 9

What is the 'elbow' in the elbow method?

A The largest cluster

B The k where inertia stops dropping sharply

C The first data point

D The model's bias

Question 10

What does PCA do?

A Adds features

B Projects data onto new orthogonal axes ordered by how much variance they capture

C Labels the data

D Removes duplicates

Question 11

What is a hyper-parameter?

A A value the model learns from data

B A setting you choose before training (e.g. tree depth)

C A type of feature

D A test metric

Question 12

Where should hyper-parameter tuning be performed?

A On the test set

B With cross-validation on the training data only

C On all data at once

D On a single row
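
Worked example for Question 12. The workflow can be sketched in plain Python: set the test set aside first, pick the hyper-parameter by cross-validation inside the training data, and score on the test set once at the end. The 1-D nearest-neighbour classifier, the synthetic data, and the candidate values of k are all assumptions chosen to keep the sketch short.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest 1-D training points (k is odd)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, held_out, k):
    """Accuracy on held_out for a model 'fitted' on train."""
    return sum(knn_predict(train, x, k) == y for x, y in held_out) / len(held_out)

def cv_score(train, k, n_folds=5):
    """Mean validation accuracy over folds carved from the training data only."""
    scores = []
    for fold in range(n_folds):
        val = train[fold::n_folds]
        fit = [p for i, p in enumerate(train) if i % n_folds != fold]
        scores.append(accuracy(fit, val, k))
    return sum(scores) / n_folds

# Hypothetical synthetic data: label is 1 when x > 0.5, with 10% label noise.
rng = random.Random(0)
data = []
for _ in range(150):
    x = rng.random()
    y = 1 if x > 0.5 else 0
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train, test = data[:100], data[100:]  # the test set is set aside first

candidates = [1, 5, 15]               # hyper-parameter values to compare
best_k = max(candidates, key=lambda k: cv_score(train, k))
test_accuracy = accuracy(train, test, best_k)  # test set used once, at the end
```

Because the test rows never influence the choice of `best_k`, `test_accuracy` remains an honest estimate of performance on unseen data.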