Natural Language Processing — Quiz

Answer all 12 questions. You need 70% to pass.

Question 1
What does lemmatisation do?
A Deletes all verbs
B Reduces words to a base form (running → run)
C Counts words
D Translates text
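
To make option B concrete, here is a minimal lookup-based lemmatiser. The LEMMA_TABLE dictionary and the lemmatise function are hypothetical. Real lemmatisers, such as those in NLTK or spaCy, use far larger dictionaries plus part-of-speech information, because the same surface form can map to different lemmas.

```python
# Hypothetical toy lexicon mapping inflected forms to their lemma.
# Real lemmatisers use large dictionaries and part-of-speech tags.
LEMMA_TABLE = {
    "running": "run",
    "ran": "run",
    "runs": "run",
    "mice": "mouse",
    "better": "good",
    "is": "be",
    "was": "be",
    "were": "be",
}

def lemmatise(tokens):
    """Map each token to its lemma, falling back to the lowercased token."""
    return [LEMMA_TABLE.get(t.lower(), t.lower()) for t in tokens]

print(lemmatise(["The", "mice", "were", "running"]))  # ['the', 'mouse', 'be', 'run']
```

Unlike stemming, which chops suffixes by rule, a lookup can handle irregular forms such as "mice" → "mouse".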
Question 2
What are stopwords?
A Misspelled words
B Very common words (the, is, of), often removed before training classic models
C Foreign words
D The last words in a document
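
A minimal sketch of stopword removal, assuming simple whitespace tokenisation and a deliberately tiny, hypothetical STOPWORDS set. Libraries such as NLTK and scikit-learn ship much longer lists, and transformer models are usually given the full text instead.

```python
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "is", "of", "a", "an", "and", "to", "in"}

def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop any token in STOPWORDS."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

print(remove_stopwords("The cat is on the mat"))  # ['cat', 'on', 'mat']
```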
Question 3
What does TF-IDF do that plain word counts don't?
A Keeps word order
B Down-weights words common across all documents and up-weights distinctive ones
C Translates the text
D Removes punctuation
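
The weighting in option B can be sketched in plain Python. This version assumes whitespace tokenisation, tf = term count divided by document length, and the unsmoothed idf = log(N / df). Libraries such as scikit-learn's TfidfVectorizer use smoothed and normalised variants, so their numbers differ, but the effect is the same: a word found in every document is pushed towards zero.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of term in doc / number of tokens in doc
    idf = log(N / df), where df = number of docs containing the term
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
print(weights[0]["the"])                      # 0.0: "the" is in every document
print(weights[0]["mat"] > weights[0]["cat"])  # True: "mat" is unique to doc 0
```

Plain counts would rank "the" highest in the first document; TF-IDF ranks it lowest.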
Question 4
What key information does Bag-of-Words / TF-IDF discard?
A The vocabulary
B Word order
C The number of documents
D The labels
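
A quick check of option B: two sentences with opposite meanings produce the identical bag of words once token positions are thrown away. The bag_of_words helper is an illustrative sketch built on collections.Counter.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercased whitespace tokens, ignoring their positions."""
    return Counter(text.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a == b)  # True: same bag, opposite meaning
```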
Question 5
A TF-IDF + logistic-regression text classifier is valued because it is…
A the most accurate model ever
B fast, a strong baseline, and interpretable (you can see which words drive its predictions)
C able to keep word order
D free of any preprocessing
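
The interpretability in option B comes from the model learning exactly one weight per vocabulary word. The sketch below trains logistic regression from scratch with full-batch gradient ascent on a hypothetical six-review dataset. For brevity it swaps TF-IDF weights for binary word-presence features; in practice you would use scikit-learn's TfidfVectorizer and LogisticRegression rather than hand-rolled code. The function names, learning rate and epoch count are illustrative choices.

```python
import math

def featurise(text, vocab):
    """Binary bag-of-words vector: 1.0 if the word occurs, else 0.0."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]

def train_logreg(X, y, lr=0.5, epochs=500):
    """Full-batch gradient ascent on the log-likelihood, starting from zeros."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = yi - p
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b

# Hypothetical toy data: 1 = positive review, 0 = negative review.
docs = ["great movie", "great acting", "loved it",
        "terrible movie", "terrible acting", "hated it"]
labels = [1, 1, 1, 0, 0, 0]
vocab = sorted({w for d in docs for w in d.split()})
X = [featurise(d, vocab) for d in docs]
w, b = train_logreg(X, labels)

# Each weight belongs to one word, so sorting them explains the model.
ranked = sorted(zip(vocab, w), key=lambda pair: pair[1])
print("most negative:", ranked[0][0])   # terrible
print("most positive:", ranked[-1][0])  # great
```

Large positive weights push a prediction towards the positive class and large negative weights towards the negative class, while words that appear equally in both classes ("movie", "acting", "it") stay near zero.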
Question 6
What do word embeddings represent?
A Word counts
B Words as dense vectors where similar meanings are close together
C File sizes
D Document labels
Question 7
What does king - man + woman ≈ queen demonstrate?
A A bug
B That relationships are encoded as directions in embedding space
C Random noise
D TF-IDF weighting
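
Questions 6 and 7 can both be illustrated with a toy example. The 3-D vectors in EMB are hand-made and hypothetical; real word2vec or GloVe vectors are learned from large corpora and typically have 100 to 300 dimensions with no labelled axes. Cosine similarity measures closeness, and the analogy is solved by vector arithmetic followed by a nearest-neighbour search that excludes the three query words, a common convention.

```python
import math

# Hypothetical hand-made vectors; axes loosely mean
# [royalty, femaleness, is_person]. Real embeddings are learned.
EMB = {
    "king":  [0.9, 0.1, 1.0],
    "queen": [0.9, 0.9, 1.0],
    "man":   [0.1, 0.1, 1.0],
    "woman": [0.1, 0.9, 1.0],
    "apple": [0.0, 0.3, 0.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the query words."""
    target = [x - y + z for x, y, z in zip(EMB[a], EMB[b], EMB[c])]
    candidates = [word for word in EMB if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(EMB[word], target))

print(cosine(EMB["king"], EMB["queen"]) > cosine(EMB["king"], EMB["apple"]))  # True
print(analogy("king", "man", "woman"))  # queen
```

Subtracting "man" and adding "woman" moves along the femaleness direction, which is what "relationships encoded as directions" means.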
Question 8
What is a limitation of classic (word2vec/GloVe) embeddings?
A They are too fast
B Each word gets one fixed vector regardless of context
C They need labels
D They cannot be downloaded
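
The limitation in option B follows from how classic embeddings are used: as a lookup table keyed by word type. The STATIC table and its 2-D vectors below are hypothetical, but any word2vec- or GloVe-style lookup behaves the same way, so "bank" gets one vector whether the sentence is about rivers or money.

```python
# Hypothetical static embedding table: exactly one vector per word type.
STATIC = {
    "river": [0.9, 0.1],
    "money": [0.1, 0.9],
    "bank":  [0.5, 0.5],  # a single vector must cover both senses
}

def embed(sentence):
    """Look up each token; unknown words get a zero vector."""
    return [STATIC.get(w, [0.0, 0.0]) for w in sentence.lower().split()]

s1 = embed("river bank")
s2 = embed("money bank")
print(s1[1] == s2[1])  # True: context does not change "bank"
```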
Question 9
What is the key idea of attention in a transformer?
A Ignoring most words
B Each word weights every word in the sequence by its relevance, building a context-aware representation
C Sorting words alphabetically
D Removing stopwords
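
A minimal sketch of scaled dot-product self-attention in plain Python. To keep it short it assumes the queries, keys and values are the input vectors themselves (Q = K = V = X) and uses a single head; a real transformer first applies learned projection matrices and runs several heads in parallel. The hypothetical 2-D vectors show the effect option B describes: the same "bank" input produces a different output depending on its neighbour.

```python
import math

def softmax(xs):
    """Turn scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head scaled dot-product self-attention with Q = K = V = X."""
    d = len(X[0])
    outputs, all_weights = [], []
    for q in X:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Output is the relevance-weighted average of the value vectors.
        out = [sum(wt * v[j] for wt, v in zip(weights, X)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-D vectors; "bank" starts identical in both sentences.
river, money, bank = [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]
out1, w1 = self_attention([river, bank])
out2, w2 = self_attention([money, bank])
print(out1[1], out2[1])  # "bank" now leans towards its neighbour
```

Because every query is compared with every key independently, all positions can be computed at once, which is the parallelism referred to in Question 10.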
Question 10
Why were transformers a breakthrough over older sequence models?
A They use no data
B They are context-aware, process whole sequences in parallel, and transfer to new tasks via pretraining
C They are smaller
D They avoid maths
Question 11
What does the Hugging Face pipeline API handle for you?
A Only plotting
B Tokenising, running the pretrained model, and decoding the output
C Database access
D Web hosting
Question 12
What is the recommended modern NLP workflow?
A Always train from scratch
B Start with a pretrained model/pipeline, evaluate, fine-tune only if needed
C Avoid pretrained models
D Use only TF-IDF