Machine Learning Foundations — Quiz

Answer all 12 questions, then submit. You need 70% to pass.

Question 1
Which problem is a regression task?
A Predicting spam vs not spam
B Predicting tomorrow's house price
C Grouping customers into segments
D Detecting the language of a tweet
Question 2
Why must you keep a separate test set?
A To train faster
B To measure performance on data the model never saw
C To save memory
D It is optional
Question 3
Which three methods form the core of scikit-learn's supervised estimator API?
A open, read, close
B fit, predict, score
C load, clean, save
D add, commit, push
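For reference, the fit / predict / score pattern, together with the held-out test set from Question 2, can be sketched roughly as follows. The synthetic dataset, the choice of LogisticRegression, and the 25% test size are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not part of the quiz).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold back 25% of the rows so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)                   # learn from training data only
predictions = model.predict(X_test)           # predicted class labels
test_accuracy = model.score(X_test, y_test)   # mean accuracy for classifiers

print(f"test accuracy: {test_accuracy:.2f}")
```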
Question 4
What does logistic regression output before thresholding?
A A category directly
B A probability between 0 and 1
C An RMSE
D A cluster id
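As a minimal sketch of the idea behind this question: logistic regression passes a linear score through the sigmoid function to get a value between 0 and 1, and a threshold then turns that value into a class. The function names and the 0.5 default threshold below are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Map any real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Turn the probability into a class label by thresholding."""
    return 1 if sigmoid(z) >= threshold else 0


for score in (-3.0, 0.0, 3.0):
    print(score, round(sigmoid(score), 3), predict_label(score))
```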
Question 5
In a confusion matrix, a false negative (FN) is…
A a positive predicted as positive
B a real positive predicted as negative (a miss)
C a real negative predicted as negative
D a real negative predicted as positive
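The four cells of a binary confusion matrix can be counted by hand, as in this small sketch. The labels are made-up toy data, and 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1   # real positive, predicted positive
        elif actual == 0 and predicted == 1:
            fp += 1   # real negative, predicted positive (false alarm)
        elif actual == 0 and predicted == 0:
            tn += 1   # real negative, predicted negative
        else:
            fn += 1   # real positive, predicted negative (a miss)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Toy labels, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 3, 'FN': 2}
```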
Question 6
Why can accuracy be misleading on imbalanced data?
A It is always wrong
B A model can score high by always predicting the majority class while catching no positives
C It is too slow
D It needs scaling
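A quick numeric illustration of accuracy on imbalanced data, assuming a made-up dataset with 95 negatives and 5 positives and a "model" that always predicts the majority class:

```python
# Toy imbalanced labels: 95 negatives, 5 positives (invented for illustration).
y_true = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

positives_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
positives_total = sum(y_true)

print(f"accuracy = {accuracy:.2f}")                                   # 0.95
print(f"positives caught = {positives_caught} of {positives_total}")  # 0 of 5
```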
Question 7
Recall answers which question?
A Of flagged positives, how many were correct?
B Of the real positives, how many did we catch?
C What is the average error?
D How fast is the model?
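Precision and recall are easy to confuse, so here is a small sketch that computes both from confusion-matrix counts. The counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision(tp, fp):
    """Of everything flagged positive, the fraction that was really positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of all real positives, the fraction the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 8 hits, 2 false alarms, 12 misses.
tp, fp, fn = 8, 2, 12
print(f"precision = {precision(tp, fp):.2f}")  # 0.80
print(f"recall    = {recall(tp, fn):.2f}")     # 0.40
```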
Question 8
What is the benefit of k-fold cross-validation over a single split?
A It is faster
B It rotates the validation fold through the data, giving a more stable mean score and a measure of its spread
C It needs no test set
D It removes outliers
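The mechanics of k-fold cross-validation can be sketched without any library. In this sketch the helper name `k_fold_indices`, the toy targets, and the "model" (which simply predicts the training mean) are all assumptions made for illustration, and the folds are taken in order without shuffling.

```python
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx); each sample is in validation exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


# Toy targets, invented for illustration.
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.0, 4.0, 6.0, 9.0]

fold_errors = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    prediction = mean(y[i] for i in train_idx)            # "fit" on training folds
    mae = mean(abs(y[i] - prediction) for i in val_idx)   # score on held-out fold
    fold_errors.append(mae)

print("MAE per fold:", [round(e, 2) for e in fold_errors])
print(f"mean = {mean(fold_errors):.2f}, spread (std) = {stdev(fold_errors):.2f}")
```

Reporting the mean together with the spread shows how much the score depends on which rows happened to land in the validation fold, which a single split cannot reveal.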
Question 9
What is data leakage?
A Losing data files
B Information from the test set influencing training (e.g. fitting a scaler on data that includes the test set)
C Too many features
D A network error
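The scaling case can be made concrete with a few numbers. This is a sketch under stated assumptions: the values are invented, and standardisation with the population standard deviation stands in for whatever preprocessing is actually used.

```python
from statistics import mean, pstdev

# Toy feature values, invented for illustration.
train = [10.0, 12.0, 14.0, 16.0]
test = [30.0, 50.0]

# Leak-free: the statistics come from the training data only,
# and the same statistics are then applied to the test data.
mu, sigma = mean(train), pstdev(train)
train_scaled = [(x - mu) / sigma for x in train]
test_scaled = [(x - mu) / sigma for x in test]

# Leaky: the statistics are computed on train + test together,
# so the test values change the features the model is trained on.
leaky_mu, leaky_sigma = mean(train + test), pstdev(train + test)
leaky_train_scaled = [(x - leaky_mu) / leaky_sigma for x in train]

print(f"train-only mean = {mu}, leaky mean = {leaky_mu}")  # 13.0 vs 22.0
```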
Question 10
What does a scikit-learn Pipeline guarantee when it is cross-validated as a single estimator?
A Faster training only
B Preprocessing is fit only on training data within each fold (leak-free)
C Higher accuracy always
D No need to split data
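A minimal sketch of a Pipeline inside cross-validation, assuming a synthetic dataset and a StandardScaler plus LogisticRegression chosen purely for illustration. Because the scaler is a step of the pipeline, `cross_val_score` refits it on the training portion of every fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration, not part of the quiz).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Scaler and classifier travel together as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression())

# For each of the 5 folds, the whole pipeline (scaler included) is fit on
# the training part and scored on the held-out part.
scores = cross_val_score(model, X, y, cv=5)

print(f"accuracy: mean = {scores.mean():.2f}, std = {scores.std():.2f}")
```

Scaling X once up front and then cross-validating only the classifier would let every fold's validation rows influence the scaler, which is the leak described in Question 9.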
Question 11
A model scores 0.99 on training but 0.62 on test. This is…
A underfitting
B overfitting
C perfect
D data leakage in your favour
Question 12
Which is a cure for overfitting?
A A more complex model
B Regularisation (Ridge/Lasso) and more data
C Removing the test set
D Training longer on the same data