{ } EXAM TOOLKIT

Code Patterns Library

The coding section rewards muscle memory. Every reusable template from the syllabus is here — learn the shape of each one and you can rebuild it from memory under exam pressure.

13 pattern groups · Quick-copy reference · ⏱ ~12 min skim

💡 The one pattern that unlocks the rest

Almost every scikit-learn answer is: import → prepare X & y → create model → .fit() → .predict(). Memorise that skeleton; then each model is just a different class name plugged into it.

1 · Universal ML pattern 2 · Preprocessing 3 · Regression & Classification 4 · Model Evaluation 5 · Decision Trees 6 · Neural Networks 7 · NLP 8 · Hugging Face 9 · Commercial APIs 10 · Prompt Engineering 11 · RAG 12 · Streamlit & Gradio 13 · AI Agents

1 The Universal ML Pattern

Every scikit-learn model has the same five-step shape. Swap the class name to change the model.

Python

from sklearn.linear_model import LinearRegression   # 1. import

X = [[1], [2], [3], [4]]      # 2. prepare: features (2-D)
y = [10, 20, 30, 40]          #             labels (1-D)

model = LinearRegression()    # 3. create
model.fit(X, y)               # 4. learn from data
print(model.predict([[5]]))   # 5. predict  ->  [50.]
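To see the "swap the class name" idea in action, the sketch below pushes the same data through two regressors. DecisionTreeRegressor is picked here purely for illustration; any scikit-learn estimator with .fit() and .predict() would slot in the same way.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [4]]      # features (2-D)
y = [10, 20, 30, 40]          # labels

preds = {}
for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(X, y)                                   # same .fit() call
    preds[type(model).__name__] = float(model.predict([[5]])[0])  # same .predict() call

print(preds)
```

The linear model extrapolates to about 50, while the tree can only return a value it saw during training (40). The code shape is identical; only the class name changed.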

2 Data Preprocessing

Missing values — count them, then impute with the median (robust to outliers).

Python

import pandas as pd
df = pd.read_csv('data.csv')

print(df.isnull().sum())                          # missing per column
df['Age'] = df['Age'].fillna(df['Age'].median())  # median imputation

Outlier removal with the IQR method.

Python

Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1
low, high = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
df = df[(df['Salary'] >= low) & (df['Salary'] <= high)]

print("Keeping rows between", round(low, 1), "and", round(high, 1))
print("Rows kept:", len(df))

Output

Keeping rows between 28.5 and 74.5
Rows kept: 9

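Scaling & encoding: Min-Max for numbers, Label for ordinal, One-Hot for nominal. Note that LabelEncoder numbers categories in sorted (alphabetical) order, not in their real-world order, so check the mapping before treating the codes as ordinal.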

Python

from sklearn.preprocessing import MinMaxScaler, LabelEncoder
import pandas as pd

df['Income'] = MinMaxScaler().fit_transform(df[['Income']])  # -> [0,1]
df['Size']   = LabelEncoder().fit_transform(df['Size'])      # ordinal
df = pd.get_dummies(df, columns=['City'], drop_first=True)   # nominal

print(df)

Output

     Income  Size  City_Mumbai  City_Pune
0  0.000000     2         False      False
1  0.320755     1          True      False
2  0.490566     0         False       True
3  1.000000     1         False      False
4  0.226415     2          True      False
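When the order of an ordinal column matters, one option is to state that order explicitly. The sketch below contrasts LabelEncoder with scikit-learn's OrdinalEncoder on a made-up Size column; the category names and their ordering are assumptions for illustration.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = ["Small", "Large", "Medium", "Small"]   # hypothetical ordinal column

# LabelEncoder sorts the class names: Large=0, Medium=1, Small=2
label_codes = LabelEncoder().fit_transform(sizes)

# OrdinalEncoder accepts an explicit order: Small < Medium < Large
enc = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
ordinal_codes = enc.fit_transform([[s] for s in sizes]).ravel()

print(label_codes)     # [2 0 1 2]
print(ordinal_codes)   # [0. 2. 1. 0.]
```

With the explicit order, a larger code really does mean a larger size, which is what a model reading the column as a number will assume.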

3 Regression & Classification

Linear Regression with a train/test split.

Python

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.intercept_)   # slope(s) and bias

Logistic Regression — note predict vs predict_proba.

Python

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)
print(model.predict([[85]]))         # class label: [0] or [1]
print(model.predict_proba([[85]]))   # probabilities: [P(0), P(1)]
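The relationship between the two methods can be checked directly: predict returns the class with the highest probability from predict_proba. The data below (exam score mapped to pass/fail) is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: exam score -> fail (0) / pass (1)
X = [[35], [45], [50], [60], [70], [80], [90]]
y = [0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)                           # shape (n_samples, 2)
labels_from_proba = model.classes_[proba.argmax(axis=1)]  # most probable class per row

print(np.allclose(proba.sum(axis=1), 1.0))                  # each row sums to 1
print(np.array_equal(labels_from_proba, model.predict(X)))  # matches predict()
```

Use predict_proba when the question asks for confidence or a custom threshold, and predict when it only asks for the label.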

4 Model Evaluation

Classification metrics from a confusion matrix.

Python

from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, f1_score, accuracy_score)

print(confusion_matrix(y_true, y_pred))   # rows = actual, columns = predicted
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
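If the exam asks for the arithmetic behind these scores, the same numbers can be derived by hand from the four confusion-matrix counts. This is a plain-Python sketch on made-up labels, not a replacement for the scikit-learn functions.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]   # hypothetical predictions

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)   # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)   # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)   # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)   # true negatives

precision = tp / (tp + fp)                           # of predicted positives, how many were right
recall = tp / (tp + fn)                              # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / len(pairs)

print(tp, fp, fn, tn)   # 3 2 1 2
print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
# 0.6 0.75 0.667 0.625
```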

Regression metrics and K-fold cross-validation.

Python

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

mae  = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # RMSE = sqrt(MSE)
r2   = r2_score(y_true, y_pred)
scores = cross_val_score(model, X, y, cv=5)          # 5-fold CV

print("MAE:", round(mae, 2), "  RMSE:", round(rmse, 2), "  R2:", round(r2, 3))
print("CV folds:", scores.round(3))

Output

MAE: 0.31   RMSE: 0.34   R2: 0.943
CV folds: [0.953 0.973 0.983 0.985 0.969]
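The three regression metrics are short formulas, and writing them out once makes them easier to recall. The sketch below computes them with NumPy on made-up values.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])    # hypothetical targets
y_pred = np.array([2.5, 5.5, 7.0, 10.0])   # hypothetical predictions

err = y_true - y_pred
mae = np.mean(np.abs(err))                  # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))           # root of the mean squared error
ss_res = np.sum(err ** 2)                   # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

print(round(float(mae), 3), round(float(rmse), 3), round(float(r2), 3))
# 0.5 0.612 0.925
```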

5 Decision Trees

A constrained tree (to limit overfitting) and automated tuning with GridSearchCV.

Python

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

clf = DecisionTreeClassifier(max_depth=3, criterion='gini')
clf.fit(X_train, y_train)

grid = GridSearchCV(DecisionTreeClassifier(),
                    {'max_depth': [2, 3, 4, 5]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)

6 Neural Networks (Keras)

Multi-Layer Perceptron — ReLU hidden layers, Softmax for multi-class output.

Python

from keras import models, layers
from keras.layers import Input

model = models.Sequential([
    Input(shape=(784,)),
    layers.Dense(64, activation='relu'),       # hidden layer
    layers.Dense(10, activation='softmax')     # multi-class output
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)   # y_train must be one-hot encoded for this loss
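The categorical_crossentropy loss expects one-hot labels, so integer class labels need converting first (or the loss can be switched to sparse_categorical_crossentropy, which takes integers directly). A minimal NumPy sketch of the conversion, with made-up labels and three classes instead of ten to keep the output small:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # hypothetical integer class labels
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row has a single 1 in the column of its class, which matches the shape of the softmax output layer.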

Recurrent Neural Network for sequence/text classification.

Python

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=16))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))       # binary output
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])

7 Natural Language Processing

Text cleaning — lowercase, strip punctuation, tokenize.

Python

import string

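text = "Hello, World! NLP is fun, isn't it?"   # sample input (assumed; chosen to match the output shown)
text = text.lower()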
text = text.translate(str.maketrans('', '', string.punctuation))
tokens = text.split()

print(tokens)

Output

['hello', 'world', 'nlp', 'is', 'fun', 'isnt', 'it']

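Tokenize & pad sequences to a fixed length for a neural network. (Tokenizer is marked deprecated in recent TensorFlow documentation, which recommends the TextVectorization layer for new code; the pattern below is the classic one.)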

Python

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tok = Tokenizer(oov_token="<OOV>")
tok.fit_on_texts(sentences)
seqs   = tok.texts_to_sequences(sentences)
padded = pad_sequences(seqs, maxlen=20, padding='post')
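To make the two steps less of a black box, here is a rough pure-Python illustration of the idea: map words to integers, then pad every sequence to the same length. It is not the Keras implementation; the sentences, helper names and truncation rule are assumptions for the sketch.

```python
from collections import Counter

sentences = ["I love NLP", "I love deep learning a lot"]   # hypothetical corpus

# Word index: 0 is reserved for padding, 1 for out-of-vocabulary words,
# then the remaining words are numbered from most to least frequent
counts = Counter(w for s in sentences for w in s.lower().split())
word_index = {"<OOV>": 1}
for rank, (word, _) in enumerate(counts.most_common(), start=2):
    word_index[word] = rank

def to_sequence(sentence):
    """Replace each word by its index; unknown words become 1 (<OOV>)."""
    return [word_index.get(w, 1) for w in sentence.lower().split()]

def pad_post(seq, maxlen):
    """Cut to maxlen, then append zeros at the end (padding='post')."""
    seq = seq[:maxlen]   # truncates at the end for simplicity; Keras truncates from the front by default
    return seq + [0] * (maxlen - len(seq))

padded = [pad_post(to_sequence(s), maxlen=5) for s in sentences]
print(padded)
print(to_sequence("I love pizza"))   # unseen word maps to the OOV index
```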

8 Hugging Face & Softmax

Pre-trained pipelines — one line for sentiment, generation, summarization.

Python

from transformers import pipeline

clf = pipeline("sentiment-analysis")
print(clf("I love this course"))         # [{'label': 'POSITIVE', ...}]

gen = pipeline("text-generation", model="gpt2")
print(gen("Machine learning is", max_length=20))

Softmax — turns raw scores into probabilities that sum to 1 (used in attention & output layers).

Python

import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # subtract max for stability
    return e / e.sum()

print(softmax([5, 1, 6]))   # -> approx. [0.268 0.005 0.727]

9 Commercial APIs

OpenAI chat call — key read securely from an environment variable.

Python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])   # never hard-code

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain RAG in one line."}])
print(resp.choices[0].message.content)

Embeddings via API — turn text into a vector for search / RAG.

Python

emb = client.embeddings.create(
    model="text-embedding-3-small", input="some text")
vector = emb.data[0].embedding          # a fixed-length numeric vector

10 Prompt Engineering

System prompt + Chain-of-Thought + low temperature for a precise, reasoned answer.

Python

messages = [
  {"role": "system",
   "content": "You are a concise tutor. If unsure, say 'I don't know'."},
  {"role": "user",
   "content": "What is 23*4 + 6? Let's think step by step."}
]
resp = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, temperature=0)

11 RAG — Retrieval-Augmented Generation

Embeddings & cosine similarity — match text by meaning.

Python

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["refund policy", "how do I return an item"])
print(cosine_similarity([emb[0]], [emb[1]]))   # ~0.65  (similar)
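Cosine similarity itself is a one-line formula: the dot product divided by the product of the vector lengths. The sketch below implements it with NumPy and uses it to pick the closest document. The 3-D vectors are toy values standing in for real embeddings.

```python
import numpy as np

def cosine(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|)"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D "embeddings" (hypothetical values, not real model output)
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "office address": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]   # stands in for "how do I return an item"

# Retrieval step of RAG in miniature: rank documents by similarity to the query
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)   # refund policy
```

A score near 1 means the vectors point the same way, and a score near 0 means they are unrelated.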

Chroma vector database — store documents and query by meaning.

Python

import chromadb

client = chromadb.Client()
col = client.create_collection("docs")
col.add(documents=["Employees get 18 days of leave."], ids=["d1"])

res = col.query(query_texts=["How much vacation?"], n_results=1)
print(res["documents"][0])    # found by meaning, not keywords

12 Streamlit & Gradio

Streamlit app — pure-Python web UI. Run with streamlit run app.py.

Python · app.py

import streamlit as st

st.title("Text Analyser")
text = st.text_area("Enter text:")
if st.button("Analyse"):
    st.metric("Word count", len(text.split()))
    st.success("Done!")

Gradio — wrap a single function into a web UI automatically.

Python

import gradio as gr

def greet(name):
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()

13 AI Agents — LangChain & LangGraph

Define a tool and bind it to an LLM (function calling).

Python

from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

# llm: any LangChain chat model that supports tool calling (created elsewhere)
llm_with_tools = llm.bind_tools([multiply])
response = llm_with_tools.invoke("Calculate 50 times 173")
print(response.tool_calls)
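Binding a tool does not run it: the model only returns a request naming the tool and its arguments, and your code performs the call. The sketch below shows that dispatch step in plain Python. The tool_calls list is written by hand and its shape (dicts with "name" and "args") is an assumption about what the model returns; the id value is made up.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

# Assumed shape of response.tool_calls (hand-written stand-in, no LLM involved)
tool_calls = [{"name": "multiply", "args": {"a": 50, "b": 173}, "id": "call_1"}]

registry = {"multiply": multiply}   # tool name -> Python function

# Look up each requested tool by name and call it with the supplied arguments
results = [registry[call["name"]](**call["args"]) for call in tool_calls]
print(results)   # [8650]
```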

LangGraph — a stateful agent graph: State, Nodes, Edges, then .compile().

Python

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]   # add_messages APPENDS

graph = StateGraph(State)
# chat_node: a function(state) -> dict of state updates, defined elsewhere
graph.add_node("chat", chat_node)             # node = worker function
graph.add_edge(START, "chat")
graph.add_edge("chat", END)
app = graph.compile()
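The graph above refers to a chat_node function without defining it. A node is just a function that receives the current state and returns the part of the state it wants to update. The sketch below shows that contract with plain strings and a hand-rolled merge; it does not import LangGraph, and a real node would typically call an LLM and work with message objects.

```python
def chat_node(state: dict) -> dict:
    """Hypothetical node: read the latest message, return only the update."""
    last = state["messages"][-1]
    return {"messages": [f"echo: {last}"]}

# Stand-in for what an append-style reducer does with the node's return value
state = {"messages": ["hello"]}
update = chat_node(state)
state = {"messages": state["messages"] + update["messages"]}

print(state["messages"])   # ['hello', 'echo: hello']
```

The node returns only the new message; appending it to the history is the reducer's job, which is why the state field is annotated with add_messages.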

🎯 Coding-section game plan

Recognise which pattern the question wants, write the import lines and skeleton from memory, then fill the logic. The right class name + the 5-step shape already earns most of the marks; even partial code scores.