Capstone Project & Professional Assessment — Quiz

Answer all 18 questions, then submit. You need 70% to pass.

Question 1
A strong end-to-end data-science project starts and ends with…
A a dataset
B a clear question (with a success metric) and an actionable recommendation
C a neural network
D a chart
Question 2
In Module 1's terms, a linear model's prediction from its features and weights is computed with a…
A sort
B dot product
C histogram
D merge
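A minimal plain-Python sketch of the idea. It assumes a linear model with an optional bias term, because the quiz does not say exactly how Module 1 defines the model. The prediction is the dot product of the feature vector and the weight vector. The function name `predict` and the numbers are illustrative only.

```python
from typing import Sequence


def predict(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear prediction: dot product of features and weights, plus a bias."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    # Dot product: multiply matching entries, then sum them.
    return sum(x * w for x, w in zip(features, weights)) + bias


# Hypothetical example: three features and three learned weights.
score = predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(score)  # 1.0*0.5 + 2.0*(-1.0) + 3.0*2.0 = 4.5
```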
Question 3
Which preserves a cleaned DataFrame's dtypes for later reuse?
A CSV
B Parquet
C plain text
D a screenshot
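CSV is plain text, so it does not store column types. This standard-library sketch sends one hypothetical row through a CSV round trip and shows that the int, float and bool values all come back as strings. With pandas, the usual remedy is `DataFrame.to_parquet` and `pandas.read_parquet`, which need a Parquet engine such as pyarrow to be installed.

```python
import csv
import io

# A tiny hypothetical "cleaned" table with int, float and bool columns.
rows = [{"id": 1, "price": 9.99, "in_stock": True}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read it back: CSV holds only text, so every value returns as a string.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(dict(restored[0]))  # {'id': '1', 'price': '9.99', 'in_stock': 'True'}
```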
Question 4
A column has a skewness of +1.6. Its mean will tend to be…
A below the median
B above the median
C equal to the median
D undefined
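A sketch of how a long right tail affects the mean and the median. The course's exact skewness formula is not given, so this assumes the population (Fisher-Pearson) definition, and the column values are hypothetical. The single large value pulls the mean above the median and makes the skewness positive.

```python
import statistics

# Hypothetical right-skewed column: mostly small values plus one large outlier.
values = [1, 1, 2, 2, 3, 3, 4, 20]


def skewness(xs):
    """Population (Fisher-Pearson) skewness: third central moment / std**3."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


col_mean = statistics.mean(values)
col_median = statistics.median(values)
print(col_mean, col_median)  # 4.5 2.5
print(skewness(values) > 0)  # True
```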
Question 5
A 95% confidence interval means…
A a 95% chance that the true value lies in this particular interval
B ≈95% of such intervals would contain the true value if sampling were repeated
C the data is 95% clean
D the model is 95% accurate
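The repeated-sampling reading can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation, so a simple z-interval (mean ± 1.96·σ/√n) applies. All parameter values are hypothetical. Any single interval either contains the true mean or it does not; it is the long-run fraction of such intervals that comes out near 95%.

```python
import random
import statistics


def ci_coverage(true_mean=10.0, sigma=2.0, n=30, reps=2000, seed=0):
    """Fraction of repeated 95% z-intervals (known sigma) that contain true_mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # z-interval half-width
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.mean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(coverage)  # close to 0.95; the exact value depends on the seed
```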
Question 6
To get an honest estimate of model performance you must…
A evaluate on the training data
B evaluate on a held-out test set never seen in training
C use all data for training
D skip validation
Question 7
On a dataset where only 2% of examples are positive, the misleading metric is…
A recall
B accuracy
C precision
D F1
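A quick plain-Python check on a hypothetical test set of 100 examples with 2 positives. A "model" that always predicts the majority class scores 98% accuracy while catching none of the positives, so its recall is 0.

```python
# Hypothetical test set: 100 examples, 2 positives (2%).
y_true = [1] * 2 + [0] * 98
# A useless "model" that always predicts the negative (majority) class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(accuracy, recall)  # 0.98 0.0
```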
Question 8
What does a scikit-learn Pipeline prevent?
A Fast training
B Data leakage from preprocessing into validation
C Saving the model
D Using cross-validation
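Leakage happens when preprocessing statistics are computed on rows that later serve as validation or test data. The sketch below uses a hypothetical `Standardizer` class, a stand-in for scikit-learn's `StandardScaler` and not its real implementation. It shows how fitting on train + test lets a single test value shift the training mean. A scikit-learn `Pipeline` avoids this problem because cross-validation refits every step, scaler included, on each training fold only.

```python
import statistics


class Standardizer:
    """Hypothetical minimal scaler with a fit/transform interface."""

    def fit(self, xs):
        self.mean_ = statistics.mean(xs)
        self.std_ = statistics.pstdev(xs) or 1.0  # guard against zero spread
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]  # hypothetical held-out value

leaky = Standardizer().fit(train + test)  # wrong: statistics include the test row
clean = Standardizer().fit(train)         # right: training rows only

print(leaky.mean_, clean.mean_)  # 22.0 2.5
scaled_test = clean.transform(test)  # test data only transformed, never fitted
```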
Question 9
For typical tabular data, a strong default model family is…
A a plain dense neural network
B gradient-boosted trees (XGBoost/LightGBM)
C k-means
D PCA
Question 10
Before k-means clustering on features measured in different units, you should…
A shuffle the rows
B scale the features
C remove the labels' names
D one-hot encode the target
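k-means assigns each point to its nearest centroid by Euclidean distance, so a feature measured in large units can dominate. This sketch uses four hypothetical (income, age) points. It finds the nearest neighbour of point `a` before and after z-score scaling. It illustrates the distance effect rather than running k-means itself.

```python
import math
import statistics

# Hypothetical (income in dollars, age in years) records.
points = {
    "a": (50_000.0, 25.0),
    "b": (50_500.0, 60.0),
    "c": (52_000.0, 26.0),
    "d": (150_000.0, 40.0),
}


def zscore(pts):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*pts.values()))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for name, row in pts.items()
    }


def nearest(pts, query):
    """Name of the point closest to `query` by Euclidean distance."""
    others = [name for name in pts if name != query]
    return min(others, key=lambda name: math.dist(pts[query], pts[name]))


print(nearest(points, "a"))          # b: income differences dominate
print(nearest(zscore(points), "a"))  # c: age now counts as much as income
```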
Question 11
What is the key idea of transfer learning?
A Train from scratch every time
B Reuse a model pretrained on large data and adapt it with little data
C Move files to the cloud
D Delete the pretrained weights
Question 12
In a transformer, attention lets each word…
A be ignored
B weigh every other word by relevance for context
C be sorted alphabetically
D become a number only
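A minimal sketch of scaled dot-product attention for a single query. Real transformers derive queries, keys and values from learned projections of the token embeddings and use several attention heads. Here the vectors are hypothetical, hand-picked 2-d values. Every token, including the query's own, receives a softmax weight, and the output is the weighted mix of the value vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all tokens."""
    d = len(query)
    # Relevance score of each token: dot(query, key) / sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, context


# Hypothetical 2-d vectors for a 3-token sentence.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]  # most similar to token 0's key

weights, context = attend(query, keys, values)
print([round(w, 3) for w in weights])
```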
Question 13
When validating a time-series forecast you must…
A shuffle the data randomly
B split by time — train on the past, test on the future
C use the test set for tuning
D ignore seasonality
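An expanding-window split keeps every test block strictly after its training data. This sketch assumes the rows are already sorted from oldest to newest. The function name `expanding_window_splits` is hypothetical. scikit-learn's `TimeSeriesSplit` follows a similar idea, although its handling of leftover rows may differ.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx); each test block comes after its training block."""
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


# Hypothetical series of 12 time-ordered observations (index 0 is the oldest).
splits = list(expanding_window_splits(12, 3))
for train_idx, test_idx in splits:
    print(train_idx[-1], "->", test_idx)  # last training index, then the future test block
```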
Question 14
Why save the whole Pipeline (not just the estimator) for deployment?
A It is smaller
B So serving uses the exact same preprocessing as training
C It trains faster
D It avoids Docker
Question 15
Why do deployed models need monitoring even if the code is unchanged?
A Python expires
B Data/concept drift erodes accuracy over time
C Disks fill up
D Models never change
Question 16
A disparate-impact ratio of 0.7 between groups should prompt you to…
A ignore it
B investigate the model for potential bias
C increase accuracy only
D delete the sensitive attribute and move on
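The disparate-impact ratio compares favorable-outcome rates between two groups. This sketch assumes binary outcomes and uses hypothetical approval data. The 0.8 threshold follows the widely cited "four-fifths rule". That rule is a screening heuristic, not proof of bias.

```python
def disparate_impact(unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged group / privileged group."""
    rate_unpriv = sum(unprivileged) / len(unprivileged)
    rate_priv = sum(privileged) / len(privileged)
    return rate_unpriv / rate_priv


# Hypothetical approvals (1 = favorable outcome) for two groups of 100 applicants.
group_a = [1] * 35 + [0] * 65  # 35% approved
group_b = [1] * 50 + [0] * 50  # 50% approved

ratio = disparate_impact(group_a, group_b)
print(round(ratio, 2))  # 0.7
if ratio < 0.8:  # common "four-fifths" rule of thumb
    print("Below 0.8: investigate for potential bias")
```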
Question 17
What does SHAP help you do?
A Clean data
B Explain how features drove a specific prediction
C Tune hyper-parameters
D Deploy a container
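SHAP attributes a prediction to features using Shapley values. The brute-force sketch below does not use the `shap` library. It assumes a single all-zero baseline, so a "missing" feature takes its baseline value, and a hypothetical three-feature model with one interaction term. It enumerates every feature subset, so it only suits toy examples. The attributions add up to the prediction minus the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a subset take their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


def model(z):  # hypothetical toy model with an interaction between features 0 and 2
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print([round(p, 6) for p in phis])  # [7.5, 1.0, 1.5]
```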
Question 18
The strongest evidence of your skills to an employer is…
A a long CV
B a public portfolio of documented, end-to-end projects
C a certificate alone
D years studied