≡ EXAM TOOLKIT

Exam Cheat Sheet

The whole syllabus, condensed. Every formula, key fact and decision rule across every topic on one page — built for the final hour before the exam.

All 18 topics ⏱ ~10 min skim 🖨 Print-friendly

💡 How to use this sheet Skim it the night before and again 1 hour before the exam. If a line does not make sense, open that topic and re-read it. To print, use your browser's Print (Ctrl/Cmd + P) — the layout is print-optimised.

★ Golden Decision Rules

Encoding: ordinal → Label, nominal → One-Hot
Impute: skewed/outliers → median, symmetric → mean, categorical → mode
Metric: spam → Precision, cancer/fraud → Recall, imbalanced → F1
Activation: hidden → ReLU, binary out → Sigmoid, multi-class out → Softmax, regression out → linear
Knowledge: new/changing facts → RAG, behaviour/style → Fine-tune, most tasks → Prompting
Prompt: reasoning → Chain-of-Thought, factual → low temperature
Overfitting fix: more data · simpler model · regularisation · early stopping

∑ Formula Bank

Linear Reg: y = β₀ + β₁x + ε OLS slope: m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² Sigmoid: σ(z) = 1 / (1 + e⁻ᶻ) Z-score: Z = (x − μ) / σ IQR fences: Q1−1.5·IQR , Q3+1.5·IQR Min-Max: x' = (x−min)/(max−min) Precision = TP/(TP+FP) Recall = TP/(TP+FN) F1 = 2·P·R / (P+R) MAE=mean|y−ŷ| MSE=mean(y−ŷ)² RMSE=√MSE R² = 1 − RSS/TSS Entropy = −Σ pᵢ log₂ pᵢ Gini = 1 − Σ pᵢ² Neuron: z = Σ(wᵢxᵢ) + b Weight update: w = w − η·(∂L/∂w) Attention: softmax(Q·Kᵀ / √dₖ) · V

01 Introduction to ML

Traditional programming: rules + data → answers. ML: data + answers → rules (model).
Nesting: AI ⊃ ML ⊃ DL. Deep Learning auto-extracts features from raw data; classic ML needs hand-fed features.
Learning types: Supervised (labelled), Unsupervised (no labels), Semi-supervised (few labels + many unlabelled), Reinforcement (rewards/penalties).
Tasks: Classification (discrete), Regression (continuous), Clustering (unsupervised grouping).

02 Data Preprocessing

Missingness: MCAR (no pattern), MAR (depends on other data), MNAR (depends on missing value itself).
Imputation: mean (symmetric), median (skewed/outliers), mode (categorical).
Outliers: Z-score |Z| > 3; IQR = Q3−Q1, fences Q1−1.5·IQR / Q3+1.5·IQR.
Scaling: Min-Max → [0,1]; Standard (Z-score) → mean 0, std 1.
Encoding: ordinal → Label; nominal → One-Hot. drop_first=True avoids the dummy-variable trap.

03 Linear Regression

y = β₀ + β₁x + ε (OLS minimises Σ(y−ŷ)²)

OLS squares errors so + / − errors don't cancel and big errors are punished.
7 assumptions: Linearity, Independence of errors, Homoscedasticity (constant variance), Zero-mean errors, No multicollinearity, Exogeneity, Normality of errors.
Limits: linear-only, very outlier-sensitive, multicollinearity, omitted-variable bias.
sklearn: model.coef_ = slope(s), model.intercept_ = β₀.

04 Logistic Regression

It is classification, not regression — predicts a probability.
Sigmoid σ(z)=1/(1+e⁻ᶻ): range (0,1), σ(0)=0.5.
P = σ(mx+c). Default threshold 0.5; lower it when false negatives are dangerous (e.g. cancer).
Models the log-odds; draws a straight-line decision boundary only.

05 Model Evaluation

Confusion matrix: TP, TN, FP=Type I, FN=Type II.
Precision=TP/(TP+FP) → minimise FP. Recall=TP/(TP+FN) → minimise FN. F1=harmonic mean.
Regression: MAE (robust), MSE (punishes big errors), RMSE=√MSE, R²=1−RSS/TSS. Adjusted R² penalises useless features.
Underfit = high bias; Overfit = high variance (great train, poor test).
K-fold CV: model trained & tested K times.

06 Decision Trees

Root → internal nodes → leaves. Splits chosen to maximise purity.
Entropy = −Σp·log₂p (0→1). Information Gain = entropy reduction → pick the highest.
Gini = 1−Σp² (0→0.5, sklearn default). Lower weighted Gini = better split.
Regression tree → leaf outputs the mean.
Overfit easily → control with max_depth, min_samples_leaf/split.

07 Intro to Neural Networks

z = Σ(wᵢxᵢ) + b → activation

Perceptron = linear classifier; cannot solve XOR (not linearly separable).
Bias shifts the boundary off the origin.
Gradient descent: w = w − (learning rate × gradient). LR=0 → no learning.
Backpropagation = chain rule, distributes "blame" output → input.
Epoch = full data pass; iteration = one weight update.

08 Types of Neural Networks

MLP = input + hidden layer(s) + output. Hidden layers + non-linear activations → non-linearity (Universal Approximation Theorem).
Activations: ReLU hidden, Sigmoid binary out, Softmax multi-class out, linear regression out.
Loss: MSE (regression), Cross-Entropy (classification). Optimizer default = Adam.
RNN keeps a hidden state for sequences; LSTM/GRU fix long memory. Vanishing gradient → use ReLU.

09 Natural Language Processing

Pipeline: clean → tokenize → stop words → stem/lemmatize → vectorize → pad.
Stemming = crude chop (may give non-words); Lemmatization = dictionary (accurate).
One-hot = huge & sparse; embeddings = dense & capture meaning.
Padding gives fixed-length inputs (pre / post).
Sequential models exploit word order → next-word prediction.

10 LLM Architecture

Attention = softmax(Q·Kᵀ / √dₖ) · V

LLM = next-token predictor with billions of parameters.
Transformer (2017): processes all words in parallel. Encoder understands, Decoder generates.
Q = what I seek, K = what I offer, V = the information.
Positional encoding gives word order; masked attention stops the decoder peeking ahead.
Hallucination = confident falsehood; bias ≠ malice. Responsible AI: Fairness, Transparency, Accountability, Safety & Privacy.

11 Generative AI Modalities

Generative creates new content; discriminative labels existing data.
Text/code → Transformers; images → Diffusion (noise → image); audio → TTS, voice clone, music.
Multimodal = many data types at once (e.g. GPT-4V).
Inference happens after training. A token is not always one word.

12 GenAI Commercial APIs

Hosted models via HTTP/JSON, pay-as-you-go.
API key authenticates & tracks usage → store in env vars / secrets manager; never hard-code or expose in frontend.
Key lifecycle: create → configure → use → rotate → revoke.
Token pricing = input + output tokens.
HTTP 429 = rate limit → exponential backoff. 401 = auth, 403 = access.

13 Prompt Engineering

Prompt anatomy: Instruction + Context + Input + Output format.
Zero-shot (no examples), Few-shot (examples = in-context learning), Chain-of-Thought ("think step by step", for reasoning), Role prompting (persona).
Low temperature = precise/deterministic; high = creative.
Prompt injection → defend with delimiters. Prompting is an iterative loop.

14 Fine-Tuning

Spectrum: Prompting (no change) → RAG (no change, knowledge) → Fine-tuning (changes weights, behaviour).
Fine-tuning teaches behaviour/style, not facts; RAG beats it for factual accuracy & updates.
Instruction-tuning = (instruction, output) pairs. LoRA/PEFT trains a tiny adapter (~1% of params).
Strategy: prompt → RAG → fine-tune last. Quality > quantity.

15 Managing State in Chatbots

LLMs are stateless — they forget after each call.
"Memory" = the app re-sends the full conversation history (roles: system, user, assistant).
Strategies: full history, window/buffer, summary, vector retrieval.
Context window = token limit per request. State types: ephemeral, persistent, shared. Checkpointing = resume after restart.

16 RAG

RAG = Retrieve → Augment → Generate (the "open-book exam").
Pipeline: Query → Embeddings → Retrieve → Rank → Generate.
Components: External data, Retriever, Ranker, Generator.
Cosine similarity: 1 = identical, 0 = unrelated.
Vector DBs: Chroma (local), FAISS (fast search). Beats fine-tuning for changing/factual knowledge.

17 Rapid Prototyping Tools

Turn Python logic into a web app in hours — no HTML/CSS/JS.
Streamlit: full apps, run streamlit run app.py, auto-reactive on every interaction.
Gradio: wraps a single function — gr.Interface(fn, inputs, outputs) + .launch(); ML-native, ideal for quick model demos.

18 Agentic AI — Components

Agent = Brain (LLM) + Hands (Tools) + Instructions (Prompts).
Naked LLM fails: training cutoff, hallucinations, no actions, poor maths → Tools fix it (Function Calling).
Loop: Request → Reason → Select tool → Execute → Synthesise.
Memory: short-term (sticky note) vs long-term (filing cabinet). Planning = goal decomposition.
Closed: GPT/Gemini/Claude. Open: LLaMA/Gemma/Mistral. Ollama runs local models.

19 Agentic AI — Control Flow

ReAct = Think → Act → Observe (a loop → enables self-correction).
State = the agent's memory; also validates inputs.
Control flow: branching, loops, conditional routing, HITL (pause for human approval).
LangGraph: State (typed dict), Nodes (worker functions), Edges (standard/conditional), StateGraph → .compile(). add_messages appends history.

20 Low-Code Automation (n8n)

Build by drag-and-drop nodes. n8n = free, open-source, self-hostable.
Node = a function; workflow = a script; arrows pass JSON, read via {{ $json.field }}.
Automation core: Trigger → Logic → Action. Triggers: Manual, Schedule (cron), Webhook, App.
Credentials = encrypted, stored separately, referenced by nodes.
n8n replicates LangChain: AI Agent (brain), tool nodes, Window Buffer Memory, IF node (routing).

🎯 If you remember nothing else ML = learn rules from data. Classification vs Regression vs Clustering. Sigmoid → classification. Precision vs Recall vs F1. Bias vs Variance. Gini/Entropy split trees. Backprop = chain rule. Transformer = parallel + self-attention (Q,K,V). Prompting → RAG → Fine-tuning. RAG = Retrieve-Augment-Generate. Agent = Brain + Tools + Instructions. ReAct = Think-Act-Observe.