Exam Tips & Common Traps
Knowing the content is half the battle — the other half is exam technique. Here is the playbook for the MCQ + coding paper, plus the misconceptions that quietly cost marks.
On this page
A.1 Before the exam
- Sleep beats cramming. A rested brain recalls and reasons far better than a tired one stuffed with last-minute facts.
- Final-hour routine: skim the Cheat Sheet once, then do one shuffle of Flashcards — only the cards you flagged "Review again".
- Warm up your hands: mentally write the universal sklearn pattern (
import → X, y → model → fit → predict) so it is automatic. - Arrive early with anything allowed (ID, stationery). Settle your breathing before the paper starts.
A.2 MCQ technique
- Eliminate first. Cross out the clearly-wrong options — even reducing 4 → 2 doubles your odds.
- Beware absolutes. Options with "always", "never", "only", "guarantees" are often wrong — real ML has trade-offs and exceptions.
- Watch "NOT / EXCEPT" questions. Re-read the stem; you are hunting the odd one out. Underline the word "NOT".
- Compute numerical MCQs — don't eyeball. For precision/recall, IQR, Gini, sigmoid, etc., do the arithmetic on paper. The distractors are the answers you get from the wrong formula.
- Trust your first instinct. Only change an answer if you find a concrete reason — second-guessing flips more right answers to wrong than the reverse.
- "All / none of the above" — if you can confirm two options are individually correct, "all of the above" is likely right.
- Never leave an MCQ blank (no negative marking assumed) — an eliminated-down guess still scores sometimes.
A.3 Coding-question technique
import lines and the right model class name score marks. Write the skeleton first, then fill the logic.
- Start with the skeleton. Almost every ML coding answer is the same 5 lines:
The universal pattern
from sklearn.<family> import <Model> X, y = features, labels model = <Model>() model.fit(X, y) model.predict(new_data)
- Pick the right class name — this is an easy mark to lose:
LinearRegressionvsLogisticRegression;DecisionTreeClassifiervsDecisionTreeRegressor. - Match the output layer to the task (neural nets): regression → linear, binary →
sigmoid, multi-class →softmax; loss →msevscrossentropy. - Use double brackets
df[['col']]when sklearn needs a 2-D feature input — a frequent silent error. - Comment your intent. A
# split into train/testcomment shows the examiner you knew the step, even if a parameter is slightly off. - Predicting the output of given code? Trace it line by line on paper — track each variable's value. Don't guess.
- Forgot exact syntax? Write the logically-correct version anyway (right function, right order). Correct logic with a tiny syntax slip beats a blank.
A.4 Time & nerves management
- Budget roughly 1 minute per MCQ; hand the saved time to the coding section.
- Bank easy marks first. Do one fast pass answering everything you know instantly; flag the hard ones and return.
- Don't sink 10 minutes into one MCQ. Make your best eliminated guess, flag it, move on.
- Tackle the longest coding question second-to-last, not last — leave the very end for a 5-minute review.
- Review pass: check you answered every question, your MCQ letters match your intended choices, and your code has its
importlines. - If your mind blanks, breathe out slowly, move to an easy question to rebuild momentum, then come back.
B Common Traps & Misconceptions
These are the misunderstandings that examiners love to test. For each: the Trap students fall for, and the Truth.
"Logistic Regression is a regression algorithm — it has 'regression' in the name."
It is a classification algorithm. It predicts the probability of a class and applies a threshold to decide the label.
"Inference happens during training."
Inference happens after training — it is the phase where the finished model generates output from a prompt.
"One token always equals one word."
A token is a sub-word chunk. A single word like "microtransactional" can be split into several tokens.
"A model with 99% accuracy is excellent."
On imbalanced data a lazy model that always predicts the majority class scores 99% yet is useless. Check precision & recall.
"Fine-tuning fixes hallucinations."
It does not reliably — and can worsen them if the data has errors. RAG is better for factuality because it grounds answers in retrieved text.
"Always remove stop words to clean text."
Removing "not" flips meaning ("not good" → "good"). Simple models remove stop words; RNNs/Transformers keep them.
"A higher R² always means a better model."
Plain R² always rises when you add features — even useless random ones. Use Adjusted R² to compare models fairly.
"Fill missing numeric values with the mean."
The mean is dragged by outliers. For skewed data or data with outliers, the median is the safe choice.
"Encode any text column as 0, 1, 2, 3…"
For nominal data (Red/Green/Blue) that invents a fake ranking. Use one-hot encoding; label-encode only ordinal data.
"A biased AI is biased because it is malicious or evil."
Bias ≠ malice. The model simply repeats patterns in its historical training data — bias is a data problem, not intent.
"ollama pull downloads a model and starts the chat."
ollama pull only downloads. ollama run downloads if needed and starts the chat session.
"A single perceptron can learn any pattern."
A single perceptron is a linear classifier — it cannot solve XOR (not linearly separable). Hidden layers are required.
"The LLM remembers the whole conversation by itself."
LLMs are stateless. The application re-sends the full conversation history with every request to simulate memory.
"100% training accuracy means the model is excellent."
If test accuracy is much lower, that gap is overfitting (high variance) — the model memorised noise instead of learning the pattern.
"To give a model your latest company data, fine-tune it."
Use RAG — it updates instantly by editing the database and can cite sources. Fine-tuning needs slow, costly retraining.
"Use sigmoid activation for the hidden layers."
Use ReLU for hidden layers. Sigmoid in deep hidden layers causes the vanishing gradient problem. Sigmoid belongs on a binary output.
"A Gini index of 1 means a perfectly pure node."
Gini = 0 is perfectly pure. For a binary node the maximum impurity is 0.5 (a 50/50 mix) — Gini never reaches 1.
"Backpropagation updates the network's weights."
Backpropagation computes the gradients (using the chain rule). Gradient descent then uses those gradients to update the weights.