Natural Language Processing — Quiz | Data Science using Python

Question 1

What does lemmatisation do?

A Deletes all verbs

B Reduces words to a base form (running → run)

C Counts words

D Translates text
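To see lemmatisation in action, here is a minimal sketch. It assumes a tiny hand-written lookup table (`LEMMAS`, hypothetical) in place of a real lemmatiser such as NLTK's WordNet lemmatiser or spaCy, which use full dictionaries and part-of-speech information.

```python
# Toy dictionary-based lemmatiser. LEMMAS is a hypothetical, hand-written
# table; real lemmatisers use full dictionaries plus part-of-speech tags.
LEMMAS = {
    "running": "run",
    "ran": "run",
    "runs": "run",
    "mice": "mouse",
    "were": "be",
    "better": "good",
}

def lemmatise(tokens):
    """Map each token to its base form; unknown words are only lowercased."""
    return [LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]

print(lemmatise(["The", "mice", "were", "running"]))
# ['the', 'mouse', 'be', 'run']
```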

Question 2

What are stopwords?

A Misspelled words

B Very common words (the, is, of) that are often removed before training classic models

C Foreign words

D The last words in a document
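Stopword removal can be sketched in a few lines. The `STOPWORDS` set below is a small, hypothetical list chosen for illustration; libraries such as NLTK and scikit-learn ship much longer ones, and the whitespace split is a simplifying assumption.

```python
# A small, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "is", "of", "a", "an", "and", "to", "in"}

def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop any stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

print(remove_stopwords("The price of the house is high"))
# ['price', 'house', 'high']
```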

Question 3

What does TF-IDF do that plain word counts don't?

A Keeps word order

B Down-weights words common across all documents and up-weights distinctive ones

C Translates the text

D Removes punctuation
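A hand-rolled TF-IDF makes the down-weighting visible. This sketch assumes the plain formula tf × log(N / df); libraries often use smoothed variants (scikit-learn, for example, adds 1 inside and outside the log), so exact numbers will differ.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

def tfidf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenised = [d.split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for toks in tokenised for word in set(toks))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        scores.append({
            word: (count / len(toks)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores

for doc_scores in tfidf(docs):
    print({w: round(s, 3) for w, s in doc_scores.items()})
```

Because "the" occurs in every document, its idf is log(1) = 0 and it scores zero everywhere, while a word unique to one document, such as "mat", gets the highest weight.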

Question 4

What key information does Bag-of-Words / TF-IDF discard?

A The vocabulary

B Word order

C The number of documents

D The labels
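A two-sentence check shows what a bag of words throws away: sentences with opposite meanings can produce identical count vectors.

```python
from collections import Counter

a = "dog bites man"
b = "man bites dog"

bow_a = Counter(a.split())
bow_b = Counter(b.split())

print(bow_a == bow_b)  # True: same counts, different meanings
```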

Question 5

A text classifier built from TF-IDF features and logistic regression is valued because it is…

A the most accurate model ever

B fast, a strong baseline and interpretable (you can see which words drive it)

C able to keep word order

D free of any preprocessing
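The interpretability point can be illustrated with a minimal sketch. To stay dependency-free it swaps in raw word counts for TF-IDF features and a hand-written batch gradient-ascent loop for a library such as scikit-learn; the four training sentences and the learning rate are made up for illustration. The learned per-word weights show which words push a prediction towards "positive" or "negative".

```python
import math
from collections import Counter

# Made-up training data: 1 = positive, 0 = negative.
texts = [
    "great film loved it",
    "loved the acting great fun",
    "awful plot boring",
    "boring and awful acting",
]
labels = [1, 1, 0, 0]

vocab = sorted({w for t in texts for w in t.split()})
X = [Counter(t.split()) for t in texts]  # word-count features (not TF-IDF)

weights = {w: 0.0 for w in vocab}
bias = 0.0
lr = 0.1

def predict_proba(x):
    """Sigmoid of bias + sum(weight * count) over known words."""
    z = bias + sum(weights[w] * c for w, c in x.items() if w in weights)
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(500):  # batch gradient ascent on the log-likelihood
    grads = {w: 0.0 for w in vocab}
    grad_b = 0.0
    for x, y in zip(X, labels):
        err = y - predict_proba(x)
        grad_b += err
        for w, c in x.items():
            grads[w] += err * c
    for w in vocab:
        weights[w] += lr * grads[w]
    bias += lr * grad_b

# Interpretability: sorted weights show which words drive each class.
for word, wt in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{word:8s} {wt:+.2f}")
```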

Question 6

What do word embeddings represent?

A Word counts

B Words as dense vectors where similar meanings are close together

C File sizes

D Document labels
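Closeness between embeddings is usually measured with cosine similarity. The three-dimensional vectors below are hypothetical, invented so that "cat" and "dog" point in similar directions; real embeddings have hundreds of dimensions and are learned from text.

```python
import math

# Hypothetical 3-d embeddings, invented for illustration.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(emb["cat"], emb["dog"]), 3))  # close to 1
print(round(cosine(emb["cat"], emb["car"]), 3))  # much lower
```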

Question 7

What does the analogy king − man + woman ≈ queen demonstrate?

A A bug

B That relationships are encoded as directions in embedding space

C Random noise

D TF-IDF weighting
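The analogy arithmetic can be reproduced with toy vectors. This sketch assumes hypothetical two-dimensional embeddings in which dimension 0 loosely encodes "royalty" and dimension 1 "maleness"; it computes king − man + woman and returns the nearest remaining word by cosine similarity, excluding the three input words as analogy evaluations commonly do.

```python
import math

# Hypothetical 2-d embeddings: dim 0 ~ "royalty", dim 1 ~ "maleness".
emb = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.5, 0.5],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # queen
```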

Question 8

What is a limitation of classic (word2vec/GloVe) embeddings?

A They are too fast

B Each word gets one fixed vector regardless of context

C They need labels

D They cannot be downloaded
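The context problem is easy to demonstrate: a static embedding is a plain lookup table, so "bank" gets the same vector whether it sits next to "river" or "loan". The table below is hypothetical; contextual models such as BERT would instead produce different vectors for the two uses.

```python
# Hypothetical static embedding table (word2vec/GloVe style): exactly one
# vector per word, looked up without regard to the surrounding words.
STATIC_EMB = {
    "river": [0.1, 0.9],
    "bank":  [0.5, 0.5],
    "loan":  [0.9, 0.1],
}

def embed(sentence):
    """Look up each token's vector; the context plays no part."""
    return [STATIC_EMB[w] for w in sentence.split()]

bank_in_river_bank = embed("river bank")[1]
bank_in_bank_loan = embed("bank loan")[0]
print(bank_in_river_bank == bank_in_bank_loan)  # True
```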

Question 9

What is the key idea of attention in a transformer?

A Ignoring most words

B Each word weighs every other word by its relevance, building a context-aware representation

C Sorting words alphabetically

D Removing stopwords
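Attention can be sketched as scaled dot-product self-attention: each token scores every token (including itself) by a dot product, softmax turns the scores into weights that sum to 1, and the output is the weighted average of the vectors. This is a simplified sketch assuming Q = K = V = X with no learned projection matrices and a single head; the input vectors are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no projections)."""
    d = len(X[0])
    outputs, all_weights = [], []
    for q in X:
        # Relevance of every token to this one, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        all_weights.append(w)
        # Output = attention-weighted average of all token vectors.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, X)) for i in range(d)])
    return outputs, all_weights

# Hypothetical 2-d vectors for a three-token sentence.
tokens = ["the", "bank", "river"]
X = [[0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
out, weights = self_attention(X)
for tok, w in zip(tokens, weights):
    print(tok, [round(x, 2) for x in w])
```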

Question 10

Why were transformers a breakthrough over older sequence models?

A They use no data

B They are context-aware, process sequences in parallel, and transfer to new tasks via pretraining

C They are smaller

D They avoid maths

Question 11

What does the Hugging Face pipeline API handle for you?

A Only plotting

B Tokenising, running the pretrained model, and decoding the output

C Database access

D Web hosting

Question 12

What is the recommended modern NLP workflow?

A Always train from scratch

B Start with a pretrained model/pipeline, evaluate, fine-tune only if needed

C Avoid pretrained models

D Use only TF-IDF