Exam Cheat Sheet
The whole syllabus, condensed. Every formula, key fact and decision rule for all 20 lectures on one page — built for the final hour before the exam.
💡
How to use this sheet
Skim it the night before and again 1 hour before the exam. If a line does not make sense, open that lecture and re-read it. To print, use your browser's Print (Ctrl/Cmd + P) — the layout is print-optimised.
★ Golden Decision Rules
- Encoding: ordinal → Label, nominal → One-Hot
- Impute: skewed/outliers → median, symmetric → mean, categorical → mode
- Metric: spam → Precision, cancer/fraud → Recall, imbalanced → F1
- Activation: hidden → ReLU, binary out → Sigmoid, multi-class out → Softmax, regression out → linear
- Knowledge: new/changing facts → RAG, behaviour/style → Fine-tune, most tasks → Prompting
- Prompt: reasoning → Chain-of-Thought, factual → low temperature
- Overfitting fix: more data · simpler model · regularisation · early stopping
∑ Formula Bank
Linear Reg: y = β₀ + β₁x + ε
OLS slope: m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)²
Sigmoid: σ(z) = 1 / (1 + e⁻ᶻ)
Z-score: Z = (x − μ) / σ
IQR fences: Q1−1.5·IQR , Q3+1.5·IQR
Min-Max: x' = (x−min)/(max−min)
Precision = TP/(TP+FP) Recall = TP/(TP+FN)
F1 = 2·P·R / (P+R)
MAE=mean|y−ŷ| MSE=mean(y−ŷ)² RMSE=√MSE
R² = 1 − RSS/TSS
Entropy = −Σ pᵢ log₂ pᵢ Gini = 1 − Σ pᵢ²
Neuron: z = Σ(wᵢxᵢ) + b
Weight update: w = w − η·(∂L/∂w)
Attention: softmax(Q·Kᵀ / √dₖ) · V
L1 Introduction to ML
- Traditional programming: rules + data → answers. ML: data + answers → rules (model).
- Nesting: AI ⊃ ML ⊃ DL. Deep Learning auto-extracts features from raw data; classic ML needs hand-fed features.
- Learning types: Supervised (labelled), Unsupervised (no labels), Semi-supervised (few labels + many unlabelled), Reinforcement (rewards/penalties).
- Tasks: Classification (discrete), Regression (continuous), Clustering (unsupervised grouping).
L2 Data Preprocessing
- Missingness: MCAR (no pattern), MAR (depends on other data), MNAR (depends on missing value itself).
- Imputation: mean (symmetric), median (skewed/outliers), mode (categorical).
- Outliers: Z-score |Z| > 3; IQR = Q3−Q1, fences Q1−1.5·IQR / Q3+1.5·IQR.
- Scaling: Min-Max → [0,1]; Standard (Z-score) → mean 0, std 1.
- Encoding: ordinal → Label; nominal → One-Hot.
drop_first=Trueavoids the dummy-variable trap.
L3 Linear Regression
y = β₀ + β₁x + ε (OLS minimises Σ(y−ŷ)²)
- OLS squares errors so + / − errors don't cancel and big errors are punished.
- 7 assumptions: Linearity, Independence of errors, Homoscedasticity (constant variance), Zero-mean errors, No multicollinearity, Exogeneity, Normality of errors.
- Limits: linear-only, very outlier-sensitive, multicollinearity, omitted-variable bias.
- sklearn:
model.coef_= slope(s),model.intercept_= β₀.
L4 Logistic Regression
- It is classification, not regression — predicts a probability.
- Sigmoid σ(z)=1/(1+e⁻ᶻ): range (0,1), σ(0)=0.5.
- P = σ(mx+c). Default threshold 0.5; lower it when false negatives are dangerous (e.g. cancer).
- Models the log-odds; draws a straight-line decision boundary only.
L5 Model Evaluation
- Confusion matrix: TP, TN, FP=Type I, FN=Type II.
- Precision=TP/(TP+FP) → minimise FP. Recall=TP/(TP+FN) → minimise FN. F1=harmonic mean.
- Regression: MAE (robust), MSE (punishes big errors), RMSE=√MSE, R²=1−RSS/TSS. Adjusted R² penalises useless features.
- Underfit = high bias; Overfit = high variance (great train, poor test).
- K-fold CV: model trained & tested K times.
L6 Decision Trees
- Root → internal nodes → leaves. Splits chosen to maximise purity.
- Entropy = −Σp·log₂p (0→1). Information Gain = entropy reduction → pick the highest.
- Gini = 1−Σp² (0→0.5, sklearn default). Lower weighted Gini = better split.
- Regression tree → leaf outputs the mean.
- Overfit easily → control with
max_depth,min_samples_leaf/split.
L7 Intro to Neural Networks
z = Σ(wᵢxᵢ) + b → activation
- Perceptron = linear classifier; cannot solve XOR (not linearly separable).
- Bias shifts the boundary off the origin.
- Gradient descent: w = w − (learning rate × gradient). LR=0 → no learning.
- Backpropagation = chain rule, distributes "blame" output → input.
- Epoch = full data pass; iteration = one weight update.
L8 Types of Neural Networks
- MLP = input + hidden layer(s) + output. Hidden layers + non-linear activations → non-linearity (Universal Approximation Theorem).
- Activations: ReLU hidden, Sigmoid binary out, Softmax multi-class out, linear regression out.
- Loss: MSE (regression), Cross-Entropy (classification). Optimizer default = Adam.
- RNN keeps a hidden state for sequences; LSTM/GRU fix long memory. Vanishing gradient → use ReLU.
L9 Natural Language Processing
- Pipeline: clean → tokenize → stop words → stem/lemmatize → vectorize → pad.
- Stemming = crude chop (may give non-words); Lemmatization = dictionary (accurate).
- One-hot = huge & sparse; embeddings = dense & capture meaning.
- Padding gives fixed-length inputs (
pre/post). - Sequential models exploit word order → next-word prediction.
L10 LLM Architecture
Attention = softmax(Q·Kᵀ / √dₖ) · V
- LLM = next-token predictor with billions of parameters.
- Transformer (2017): processes all words in parallel. Encoder understands, Decoder generates.
- Q = what I seek, K = what I offer, V = the information.
- Positional encoding gives word order; masked attention stops the decoder peeking ahead.
- Hallucination = confident falsehood; bias ≠ malice. Responsible AI: Fairness, Transparency, Accountability, Safety & Privacy.
L11 Generative AI Modalities
- Generative creates new content; discriminative labels existing data.
- Text/code → Transformers; images → Diffusion (noise → image); audio → TTS, voice clone, music.
- Multimodal = many data types at once (e.g. GPT-4V).
- Inference happens after training. A token is not always one word.
L12 GenAI Commercial APIs
- Hosted models via HTTP/JSON, pay-as-you-go.
- API key authenticates & tracks usage → store in env vars / secrets manager; never hard-code or expose in frontend.
- Key lifecycle: create → configure → use → rotate → revoke.
- Token pricing = input + output tokens.
- HTTP 429 = rate limit → exponential backoff. 401 = auth, 403 = access.
L13 Prompt Engineering
- Prompt anatomy: Instruction + Context + Input + Output format.
- Zero-shot (no examples), Few-shot (examples = in-context learning), Chain-of-Thought ("think step by step", for reasoning), Role prompting (persona).
- Low temperature = precise/deterministic; high = creative.
- Prompt injection → defend with delimiters. Prompting is an iterative loop.
L14 Fine-Tuning
- Spectrum: Prompting (no change) → RAG (no change, knowledge) → Fine-tuning (changes weights, behaviour).
- Fine-tuning teaches behaviour/style, not facts; RAG beats it for factual accuracy & updates.
- Instruction-tuning = (instruction, output) pairs. LoRA/PEFT trains a tiny adapter (~1% of params).
- Strategy: prompt → RAG → fine-tune last. Quality > quantity.
L15 Managing State in Chatbots
- LLMs are stateless — they forget after each call.
- "Memory" = the app re-sends the full conversation history (roles:
system,user,assistant). - Strategies: full history, window/buffer, summary, vector retrieval.
- Context window = token limit per request. State types: ephemeral, persistent, shared. Checkpointing = resume after restart.
L16 RAG
- RAG = Retrieve → Augment → Generate (the "open-book exam").
- Pipeline: Query → Embeddings → Retrieve → Rank → Generate.
- Components: External data, Retriever, Ranker, Generator.
- Cosine similarity: 1 = identical, 0 = unrelated.
- Vector DBs: Chroma (local), FAISS (fast search). Beats fine-tuning for changing/factual knowledge.
L17 Rapid Prototyping Tools
- Turn Python logic into a web app in hours — no HTML/CSS/JS.
- Streamlit: full apps, run
streamlit run app.py, auto-reactive on every interaction. - Gradio: wraps a single function —
gr.Interface(fn, inputs, outputs)+.launch(); ML-native, ideal for quick model demos.
L18 Agentic AI — Components
- Agent = Brain (LLM) + Hands (Tools) + Instructions (Prompts).
- Naked LLM fails: training cutoff, hallucinations, no actions, poor maths → Tools fix it (Function Calling).
- Loop: Request → Reason → Select tool → Execute → Synthesise.
- Memory: short-term (sticky note) vs long-term (filing cabinet). Planning = goal decomposition.
- Closed: GPT/Gemini/Claude. Open: LLaMA/Gemma/Mistral. Ollama runs local models.
L19 Agentic AI — Control Flow
- ReAct = Think → Act → Observe (a loop → enables self-correction).
- State = the agent's memory; also validates inputs.
- Control flow: branching, loops, conditional routing, HITL (pause for human approval).
- LangGraph: State (typed dict), Nodes (worker functions), Edges (standard/conditional), StateGraph →
.compile().add_messagesappends history.
L20 Low-Code Automation (n8n)
- Build by drag-and-drop nodes. n8n = free, open-source, self-hostable.
- Node = a function; workflow = a script; arrows pass JSON, read via
{{ $json.field }}. - Automation core: Trigger → Logic → Action. Triggers: Manual, Schedule (cron), Webhook, App.
- Credentials = encrypted, stored separately, referenced by nodes.
- n8n replicates LangChain: AI Agent (brain), tool nodes, Window Buffer Memory, IF node (routing).
🎯
If you remember nothing else
ML = learn rules from data. Classification vs Regression vs Clustering. Sigmoid → classification. Precision vs Recall vs F1. Bias vs Variance. Gini/Entropy split trees. Backprop = chain rule. Transformer = parallel + self-attention (Q,K,V). Prompting → RAG → Fine-tuning. RAG = Retrieve-Augment-Generate. Agent = Brain + Tools + Instructions. ReAct = Think-Act-Observe.