Machine Learning Foundations — Quiz | Data Science using Python

Question 1

Which problem is a regression task?

A Predicting spam vs not spam

B Predicting tomorrow's house price

C Grouping customers into segments

D Detecting the language of a tweet

Question 2

Why must you keep a separate test set?

A To train faster

B To measure performance on data the model never saw

C To save memory

D It is optional

Question 3

Which three methods form the core of scikit-learn's estimator API for supervised models?

A open, read, close

B fit, predict, score

C load, clean, save

D add, commit, push

Question 4

What does logistic regression output before thresholding?

A A category directly

B A probability between 0 and 1

C An RMSE

D A cluster id
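
Sketch for Question 4: a minimal illustration of how a logistic model turns a linear score into a probability and only then into a class label. The weights, bias and 0.5 threshold are hypothetical values chosen for the example, not fitted parameters.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, invented purely for illustration.
weights = [0.8, -0.4]
bias = -0.2


def predict_proba(features):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_label(features, threshold=0.5):
    """Apply a threshold to the probability to get a 0/1 label."""
    return int(predict_proba(features) >= threshold)


sample = [3.0, 0.0]
print(predict_proba(sample))   # a value strictly between 0 and 1
print(predict_label(sample))   # the thresholded class
```

Changing the threshold changes the label but not the probability, which is why the two steps are kept separate here.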

Question 5

In a confusion matrix, a false negative (FN) is…

A a positive predicted as positive

B a real positive predicted as negative (a miss)

C a real negative predicted as negative

D a real negative predicted as positive

Question 6

Why can accuracy be misleading on imbalanced data?

A It is always wrong

B A model can score high by always predicting the majority class while catching no positives

C It is too slow

D It needs scaling
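
Sketch for Questions 5 and 6: a small pure-Python tally of confusion-matrix cells on an invented dataset with 5 positives and 95 negatives. The "model" here is assumed to predict the majority class for every sample.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0 and 1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1  # real positive predicted as negative (a miss)
    return tp, fp, tn, fn


# Invented, heavily imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A model that always predicts the majority (negative) class.
y_pred = [0] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The accuracy looks strong even though every positive case is missed, which is the situation Question 6 describes.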

Question 7

Recall answers which question?

A Of flagged positives, how many were correct?

B Of the real positives, how many did we catch?

C What is the average error?

D How fast is the model?
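
Sketch for Question 7: precision and recall computed side by side from the same confusion-matrix counts, to show that they divide by different totals. The counts are hypothetical.

```python
# Hypothetical counts, invented for illustration.
tp = 8   # real positives that were flagged
fp = 2   # real negatives that were flagged by mistake
fn = 4   # real positives that were missed

# Precision: of everything flagged positive, how much was correct?
precision = tp / (tp + fp)

# Recall: of all real positives, how many were caught?
recall = tp / (tp + fn)

print(f"precision={precision:.3f} recall={recall:.3f}")
```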

Question 8

What is the benefit of k-fold cross-validation over a single split?

A It is faster

B It rotates the validation fold across the data, giving a more stable mean score and a measure of its spread

C It needs no test set

D It removes outliers
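
Sketch for Question 8: a hand-rolled k-fold loop using only the standard library. The helper `k_fold_indices`, the toy targets and the mean-predictor "model" are all assumptions made for the example; in practice a library splitter would normally be used.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Toy regression targets, invented for illustration.
y = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 7.0, 3.5, 5.0, 6.5]

fold_errors = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    # Stand-in "model": predict the mean of the training targets.
    prediction = statistics.mean(y[i] for i in train_idx)
    # Mean absolute error on the held-out fold.
    mae = statistics.mean(abs(y[i] - prediction) for i in val_idx)
    fold_errors.append(mae)

mean_error = statistics.mean(fold_errors)
spread = statistics.stdev(fold_errors)
print(f"MAE per fold: {fold_errors}")
print(f"mean={mean_error:.3f} spread={spread:.3f}")
```

Each sample lands in the validation fold exactly once, so the reported mean and spread summarise five different held-out estimates rather than one.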

Question 9

What is data leakage?

A Losing data files

B Information from the test set influencing training (e.g. fitting a scaler on data that includes the test set)

C Too many features

D A network error
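
Sketch for Question 9: standardising a feature with and without leakage, using only the standard library. The numbers and the helper names `fit_scaler` and `transform` are invented for the example.

```python
import statistics

# Invented feature values.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]


def fit_scaler(values):
    """Learn the mean and (population) standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardise values with previously learned statistics."""
    return [(v - mean) / std for v in values]


# Leak-free: statistics come from the training data only,
# then the same statistics are applied to the test data.
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

# Leaky: the test values shift the learned statistics,
# so training has indirectly "seen" the test set.
leaky_mean, leaky_std = fit_scaler(train + test)

print(mean, leaky_mean)
```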

Question 10

What does a scikit-learn Pipeline guarantee when it is cross-validated?

A Faster training only

B Preprocessing is fitted only on the training data within each fold (leak-free)

C Higher accuracy always

D No need to split data
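
Sketch for Questions 3 and 10: a scaler and a classifier wrapped in one scikit-learn pipeline and scored with 5-fold cross-validation. It assumes scikit-learn is installed; the synthetic dataset and the choice of `StandardScaler` with `LogisticRegression` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated only for this example.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The scaler and the model are fitted together as one estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# In each fold the whole pipeline is refitted on the training part,
# so the scaler never sees that fold's validation rows.
scores = cross_val_score(pipe, X, y, cv=5)

print(scores.mean(), scores.std())
```

Scaling `X` once before calling `cross_val_score` would instead let every fold's validation rows influence the scaler.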

Question 11

A model scores 0.99 on training but 0.62 on test. This is…

A underfitting

B overfitting

C perfect

D data leakage in your favour

Question 12

Which of these helps reduce overfitting?

A A more complex model

B Regularisation (Ridge/Lasso) and more data

C Removing the test set

D Training longer on the same data