Capstone Project & Professional Assessment — Quiz | Data Science using Python

Question 1

A strong end-to-end data-science project starts and ends with…

A a dataset

B a clear question (with a success metric) and an actionable recommendation

C a neural network

D a chart

Question 2

In Module 1's terms, a linear model's prediction from features and weights is computed with a…

A sort

B dot product

C histogram

D merge
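
For reference, a minimal sketch of how a linear prediction can be computed from a feature vector and a weight vector. The feature values, weights, and bias below are made up for illustration, and NumPy is assumed to be the array library in use.

```python
import numpy as np

# Hypothetical feature values and learned weights for a single example.
features = np.array([2.0, 3.0, 1.0])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.1  # intercept term, also an assumed value

# Multiply matching entries, sum the products, then add the bias.
prediction = float(np.dot(features, weights) + bias)

# The same computation written out element by element.
manual = float(sum(f * w for f, w in zip(features, weights)) + bias)

print(prediction, manual)
```

Both lines compute the same number; the vectorised form is simply shorter and faster on large arrays.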

Question 3

Which file format preserves a cleaned DataFrame's dtypes for later reuse?

A CSV

B Parquet

C plain text

D a screenshot
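
A small sketch of what happens to dtypes on a plain-text round trip, assuming pandas is installed. The column names and values are invented for illustration; the CSV is written to an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# A tiny "cleaned" frame with a datetime column and a categorical column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "grade": pd.Categorical(["a", "b"]),
})

# Round-trip through CSV text held in memory.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# CSV stores only text, so the datetime and categorical dtypes are not restored.
print(df.dtypes)
print(reloaded.dtypes)
```

A binary columnar format such as Parquet stores the schema alongside the data; `DataFrame.to_parquet` and `pd.read_parquet` need an optional engine such as pyarrow, so that round trip is left out of this sketch.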

Question 4

A column has a skewness of +1.6. Its mean will tend to be…

A below the median

B above the median

C equal to the median

D undefined
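
To make the relationship concrete, here is a minimal sketch on a made-up right-skewed sample (one large value in the tail). The skewness formula used is the population moment version, which is an assumption; other libraries may apply a bias correction.

```python
import statistics

# Made-up data with a long right tail.
values = [1, 2, 2, 3, 3, 3, 4, 50]

mean = statistics.mean(values)
median = statistics.median(values)

# Population skewness: third central moment divided by the cubed std deviation.
n = len(values)
std = statistics.pstdev(values)
skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)

print(f"mean={mean}, median={median}, skewness={skewness:.2f}")
```

The single large value pulls the mean toward the tail while the median barely moves.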

Question 5

A 95% confidence interval means…

A there is a 95% chance the true value lies in this particular interval

B ≈95% of such intervals would contain the true value if sampling were repeated

C the data is 95% clean

D the model is 95% accurate
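
The repeated-sampling reading of a confidence interval can be checked with a small simulation. This is a sketch under stated assumptions: the population is normal with a known mean chosen here, and the interval uses the normal approximation (z = 1.96), so the observed share will be close to, not exactly, 95%.

```python
import math
import random
import statistics

random.seed(0)

TRUE_MEAN, TRUE_SD = 10.0, 2.0      # assumed population parameters
SAMPLE_SIZE, N_REPEATS = 50, 2000   # assumed simulation settings
Z_95 = 1.96                         # normal-approximation critical value

hits = 0
for _ in range(N_REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    centre = statistics.mean(sample)
    half_width = Z_95 * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Count the interval if it contains the true mean.
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        hits += 1

coverage = hits / N_REPEATS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true mean or does not; the 95% figure describes the procedure across many repetitions.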

Question 6

To get an honest estimate of model performance you must…

A evaluate on the training data

B evaluate on a held-out test set never seen in training

C use all data for training

D skip validation

Question 7

On a dataset where only 2% of examples are positive, the most misleading metric is…

A recall

B accuracy

C precision

D F1
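
A quick sketch of why class imbalance matters, using made-up labels with 2% positives and a hypothetical "model" that always predicts the negative class.

```python
# 1000 made-up labels: 20 positives (2%) and 980 negatives.
y_true = [1] * 20 + [0] * 980

# A useless classifier that never predicts the positive class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

The classifier finds none of the positives, yet one of the two numbers still looks excellent.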

Question 8

What does a scikit-learn Pipeline help prevent?

A Fast training

B Data leakage from preprocessing into validation

C Saving the model

D Using cross-validation
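
As an illustration, a minimal sketch of wrapping preprocessing and a model in one Pipeline so that cross-validation refits the scaler inside each fold. The synthetic dataset, the choice of `StandardScaler` and `LogisticRegression`, and the fold count are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real project dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaler and classifier travel together as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# In each fold the scaler is fitted on the training part only,
# so statistics from the validation part never reach preprocessing.
scores = cross_val_score(model, X, y, cv=5)

print(scores.mean())
```

Scaling the full dataset before splitting would let validation rows influence the scaler's mean and variance, which is the situation this arrangement avoids.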

Question 9

For typical tabular data, a strong default model family is…

A a plain dense neural network

B gradient-boosted trees (XGBoost/LightGBM)

C k-means

D PCA

Question 10

Before K-means clustering you should almost always…

A shuffle the rows

B scale the features

C remove the labels' names

D one-hot encode the target
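
K-means groups points by Euclidean distance, so it helps to see how much each feature contributes to that distance. The sketch below uses made-up customer rows (age and income) and plain NumPy standardisation; it does not run K-means itself.

```python
import numpy as np

# Made-up customers: columns are age (years) and income (currency units).
X = np.array([
    [25.0, 40000.0],
    [65.0, 41000.0],
    [27.0, 45000.0],
    [63.0, 44000.0],
])

def age_share_of_distance(data):
    """Fraction of the squared distance between rows 0 and 1 that is due to age."""
    squared_diff = (data[0] - data[1]) ** 2
    return float(squared_diff[0] / squared_diff.sum())

# Standardise each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

raw_share = age_share_of_distance(X)
scaled_share = age_share_of_distance(X_scaled)

print(f"age share of distance: raw={raw_share:.4f}, scaled={scaled_share:.4f}")
```

On the raw values the 40-year age gap is almost invisible next to a modest income difference; after standardisation both features are measured on comparable scales.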

Question 11

What is the key idea of transfer learning?

A Train from scratch every time

B Reuse a model pretrained on large data and adapt it with little data

C Move files to the cloud

D Delete the pretrained weights

Question 12

In a transformer, attention lets each word…

A be ignored

B weigh every other word by relevance for context

C be sorted alphabetically

D become a number only
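
For readers who want to see the mechanics, here is a minimal sketch of scaled dot-product attention in NumPy. The query, key, and value matrices are random stand-ins (a real transformer learns projections that produce them), and the sizes are arbitrary assumptions.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8  # assumed sizes for illustration

# Random stand-ins for the per-token query, key, and value vectors.
queries = rng.normal(size=(n_tokens, d_model))
keys = rng.normal(size=(n_tokens, d_model))
values = rng.normal(size=(n_tokens, d_model))

# Similarity of every token's query with every token's key.
scores = queries @ keys.T / np.sqrt(d_model)

# Each row is one token's relevance weights over all tokens (sums to 1).
attn_weights = softmax(scores)

# Each token's new representation is a weighted mix of the value vectors.
context = attn_weights @ values

print(attn_weights.round(2))
```

Row i of `attn_weights` shows how strongly token i draws on each token in the sequence, itself included.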

Question 13

When validating a time-series forecast you must…

A shuffle the data randomly

B split by time — train on the past, test on the future

C use the test set for tuning

D ignore seasonality
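
A minimal sketch of a chronological split, using a made-up daily series and an assumed 80/20 cutoff. Real projects often use several expanding or rolling windows instead of a single split.

```python
from datetime import date, timedelta

# Made-up daily observations: (date, value) pairs.
start = date(2024, 1, 1)
series = [(start + timedelta(days=i), 100 + i) for i in range(10)]

# Keep rows in time order; never shuffle before splitting.
series.sort(key=lambda row: row[0])

# Earlier 80% for training, most recent 20% for testing.
cutoff = int(len(series) * 0.8)
train, test = series[:cutoff], series[cutoff:]

print("train ends:", train[-1][0], "| test starts:", test[0][0])
```

Every test date falls after every training date, which mirrors how the forecast would be used in practice.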

Question 14

Why save the whole Pipeline (not just the estimator) for deployment?

A It is smaller

B So serving uses the exact same preprocessing as training

C It trains faster

D It avoids Docker

Question 15

Why do deployed models need monitoring even if the code is unchanged?

A Python expires

B Data/concept drift erodes accuracy over time

C Disks fill up

D Models never change

Question 16

A disparate-impact ratio of 0.7 between groups should prompt you to…

A ignore it

B investigate the model for potential bias

C increase accuracy only

D delete the sensitive attribute and move on
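
A small sketch of how a disparate-impact ratio can be computed from selection rates. The group names and counts are invented, and the 0.8 cutoff is the commonly cited "four-fifths" rule of thumb, used here as a screening threshold rather than a definitive test.

```python
# Made-up outcome counts per group.
outcomes = {
    "group_a": {"selected": 50, "total": 100},
    "group_b": {"selected": 35, "total": 100},
}

# Selection rate: share of each group receiving the favourable outcome.
rates = {
    group: counts["selected"] / counts["total"]
    for group, counts in outcomes.items()
}

# Ratio of the lowest selection rate to the highest.
ratio = min(rates.values()) / max(rates.values())

FOUR_FIFTHS = 0.8  # rule-of-thumb threshold (an assumption of this sketch)
needs_review = ratio < FOUR_FIFTHS

print(f"disparate-impact ratio={ratio:.2f}, needs review={needs_review}")
```

A ratio below the threshold does not prove bias on its own, but it is a signal that the model and its training data deserve a closer look.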

Question 17

What does SHAP help you do?

A Clean data

B Explain how features drove a specific prediction

C Tune hyperparameters

D Deploy a container

Question 18

The strongest evidence of your skills to an employer is…

A a long CV

B a public portfolio of documented, end-to-end projects

C a certificate alone

D years studied