Full Mock Exam
A full-length practice exam covering all 20 lectures โ 40 MCQs (auto-graded) and 8 coding questions. Simulate the real test, then check every solution.
One mark each. Choose the single best answer.
In machine learning, the computer's job during training is to produce the:
Traditional programming: human writes rules. ML: machine discovers the rules (the model) from data + answers.
Grouping customers into segments with no predefined labels is:
No labels + forming natural groups = clustering, an unsupervised task.
For a skewed income column containing outliers, the safest way to fill missing values is:
The median ignores extreme values, so it is robust for skewed data with outliers. The mean gets dragged by outliers.
The colour column (Red, Green, Blue) should be converted to numbers using:
Colour is nominal (no order). One-hot encoding avoids inventing a fake ranking, unlike label encoding.
With Q1 = 30 and Q3 = 70, the IQR upper bound for outlier detection is:
IQR = 70 โ 30 = 40. Upper bound = Q3 + 1.5รIQR = 70 + 60 = 130.
Ordinary Least Squares finds the line that minimises:
OLS squares errors so positive and negative residuals do not cancel, then minimises their sum.
"Error variance must stay constant across all values of x" is the assumption of:
Homoscedasticity = constant error variance. If errors fan out, it is violated (heteroscedasticity).
The sigmoid function ฯ(0) equals:
ฯ(0) = 1/(1+eโฐ) = 1/2 = 0.5 โ the point of maximum uncertainty.
Logistic Regression is fundamentally used for:
Despite its name, Logistic Regression is a classification algorithm โ it predicts the probability of a class.
A confusion matrix gives TP=30, FP=10, FN=10, TN=50. The precision is:
Precision = TP/(TP+FP) = 30/(30+10) = 30/40 = 0.75.
A model with 99% training accuracy and 62% test accuracy is suffering from:
A large train-test gap is the signature of overfitting / high variance โ the model memorised noise.
For cancer screening, where missing a sick patient is dangerous, the priority metric is:
Recall = TP/(TP+FN) minimises false negatives โ critical when a missed positive case is costly.
A decision-tree node with 4 "Yes" and 4 "No" samples has a Gini index of:
Gini = 1 โ (0.5ยฒ + 0.5ยฒ) = 1 โ 0.5 = 0.5 โ maximum impurity for a binary node.
A decision tree chooses its splits to maximise:
The tree greedily picks the split that most reduces impurity โ i.e. the highest Information Gain / lowest weighted Gini.
A single-layer perceptron cannot solve the XOR problem because XOR is:
A single perceptron is a linear classifier; no single line separates XOR's classes. Hidden layers are needed.
Backpropagation computes weight gradients using:
Backpropagation applies the chain rule layer-by-layer, from output to input, to find each weight's gradient.
For the output layer of a 10-class classifier, the correct activation is:
Softmax outputs probabilities across all classes that sum to 1 โ ideal for multi-class output.
What gives an RNN the ability to handle sequential data?
The hidden state is the RNN's memory โ it passes context from earlier steps forward.
Which technique reduces a word to its dictionary root, always producing a real word?
Lemmatization uses a dictionary ("studies" โ "study"). Stemming crudely chops endings and can produce non-words.
Compared with one-hot encoding, word embeddings are:
Embeddings are compact, dense vectors where similar words sit close together โ one-hot vectors are huge, sparse and "dumb".
In the self-attention formula, the Query (Q) vector represents:
Query = what I need; Key = what I offer; Value = the actual information passed along.
An LLM confidently producing made-up facts is called:
A hallucination is a fluent but factually false output โ the model predicts what sounds right, not what is right.
Why does a Transformer need Positional Encoding?
Parallel processing means no built-in order; positional encoding tags each word with its position.
Image generators like Stable Diffusion are based on:
Diffusion models start from random noise and iteratively denoise it into an image guided by the prompt.
Which is a generative model?
GPT generates new content (generative). Random Forest, SVM and Logistic Regression are discriminative classifiers.
The HTTP status code 429 from an AI API means:
429 = Too Many Requests. Handle it with exponential backoff. 401 = auth issue, 403 = access restricted.
An API key should be stored:
Keys belong in env vars / secrets managers, used server-side only โ never exposed in frontend or public code.
Adding the phrase "Let's think step by step" to a prompt is an example of:
That magic phrase triggers Chain-of-Thought โ step-by-step reasoning that boosts accuracy on logic/maths.
For deterministic, precise code generation, the temperature should be:
Low temperature โ precise, reproducible output (coding, maths). High temperature โ creative, varied output.
To give a chatbot access to constantly-changing company data, the best approach is:
RAG can be updated instantly by editing the database; fine-tuning needs slow retraining for new data.
LoRA makes fine-tuning efficient by:
LoRA (a PEFT method) trains only ~1% of parameters in an adapter layer โ fast, cheap, modular.
A chatbot appears to remember earlier turns because the application:
LLMs are stateless; "memory" is the app passing the whole message history each time.
Keeping only the last N conversation turns to bound token cost is called:
Window/buffer memory keeps a sliding window of recent turns โ cheap and bounded.
The correct order of the RAG pipeline is:
Retrieve relevant docs, augment the prompt with them, then generate the grounded answer.
Which of these is a vector database used in RAG?
FAISS (and Chroma, Pinecone, Weaviate) store embeddings and do fast similarity search.
Gradio's core philosophy is to:
Gradio is function-first โ give it a function plus input/output types and it builds the UI. Streamlit builds full apps.
In the "digital intern" model of an agent, Tools play the role of the:
Brain = LLM, Hands = Tools, Instructions = Prompts. Tools let the agent act on the world.
Which Ollama command only downloads a model without starting a chat?
ollama pull downloads only; ollama run downloads (if needed) AND starts the chat.
The ReAct framework follows the cycle:
ReAct = Reason + Act: Think, Act, Observe, then think again โ enabling self-correction.
In n8n, the node that starts a workflow at a scheduled time is the:
A Schedule Trigger fires the workflow at set times โ the n8n equivalent of a cron job.
Write the code on paper, then press Show Solution (or Submit Exam) to compare. These are not auto-graded.
Given a DataFrame df, write code to: fill missing values in Age with the median, and one-hot encode the nominal column City.
import pandas as pd # Median imputation for Age df['Age'] = df['Age'].fillna(df['Age'].median()) # One-hot encode the nominal City column df = pd.get_dummies(df, columns=['City'], drop_first=True) print(df.head())
drop_first=True avoids the dummy variable trap.
Train a Linear Regression model on X = [[1],[2],[3],[4]], y = [10,20,30,40] and predict for x = 7.
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3], [4]]
y = [10, 20, 30, 40] # rule: y = 10x
model = LinearRegression()
model.fit(X, y)
print("Prediction for x=7:", model.predict([[7]]))
Given y_true and y_pred, write code to print the confusion matrix, precision, recall and F1 score.
from sklearn.metrics import (confusion_matrix, precision_score,
recall_score, f1_score)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall :", recall_score(y_true, y_pred))
print("F1 Score :", round(f1_score(y_true, y_pred), 3))
Write a numpy sigmoid(z) function and use it to classify z = 1.5 (class 1 if output โฅ 0.5).
import numpy as np
def sigmoid(z):
return 1 / (1 + np.exp(-z))
p = sigmoid(1.5)
print("Probability:", round(p, 4))
print("Class:", 1 if p >= 0.5 else 0)
Build and compile a Keras MLP for a 3-class classification problem with input dimension 10 and one hidden layer of 16 ReLU neurons.
from keras import models, layers
from keras.layers import Input
model = models.Sequential([
Input(shape=(10,)),
layers.Dense(16, activation='relu'), # hidden layer
layers.Dense(3, activation='softmax') # 3-class output
])
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])3 classes โ 3 output neurons + softmax + categorical_crossentropy.
Write code to lowercase a sentence, remove punctuation, and split it into tokens.
import string
text = "Hello, NLP World! Let's tokenize THIS."
text = text.lower()
text = text.translate(str.maketrans('', '', string.punctuation))
tokens = text.split()
print(tokens)
Using Chroma, create a collection, add two documents, and query for the most relevant one.
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
collection.add(
documents=["Our return policy lasts 30 days.",
"The warehouse is in Mumbai."],
ids=["d1", "d2"]
)
result = collection.query(
query_texts=["How long do I have to return an item?"],
n_results=1
)
print(result["documents"][0])
The query matched the return-policy document by meaning, not keywords.
Using LangChain, define a tool that adds two numbers and bind it to an LLM. Then build a minimal LangGraph State.
from langchain_core.tools import tool
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
# A tool: a function with a docstring
@tool
def add(a: int, b: int) -> int:
"""Add two numbers together."""
return a + b
# Bind the tool to the LLM
llm_with_tools = llm.bind_tools([add])
# A minimal LangGraph state
class AgentState(TypedDict):
messages: Annotated[list, add_messages]The @tool decorator + docstring let the LLM know when to call add; the State stores the appended message history.