Code Patterns Library
The coding section rewards muscle memory. Every reusable template from the syllabus is here — learn the shape of each one and you can rebuild it from memory under exam pressure.
The core skeleton is: create the model → .fit() → .predict(). Memorise that skeleton; then each model is just a different class name plugged into it.
1 The Universal ML Pattern
Every scikit-learn model has the same five-step shape. Swap the class name to change the model.
from sklearn.linear_model import LinearRegression  # 1. import
X = [[1], [2], [3], [4]]       # 2. features (2-D: one row per sample)
y = [10, 20, 30, 40]           #    labels
model = LinearRegression()     # 3. create
model.fit(X, y)                # 4. learn from data
print(model.predict([[5]]))    # 5. predict -> [50.]
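Why does swapping the class name work? Every scikit-learn estimator shares one contract: fit(X, y) learns and stores parameters, and predict(X) uses them. The sketch below mimics that contract in plain Python with two hypothetical toy classes (MeanRegressor and SimpleLinear are illustrations, not scikit-learn classes); SimpleLinear uses the one-feature least-squares formulas.

```python
class MeanRegressor:
    """Hypothetical baseline: always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)   # learned parameter (trailing underscore, sklearn-style)
        return self                    # returning self allows chaining: Model().fit(X, y)

    def predict(self, X):
        return [self.mean_ for _ in X]


class SimpleLinear:
    """Hypothetical one-feature least-squares line: y = coef_ * x + intercept_."""

    def fit(self, X, y):
        xs = [row[0] for row in X]     # X is 2-D: take the single feature column
        x_bar, y_bar = sum(xs) / len(xs), sum(y) / len(y)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        self.coef_ = sxy / sxx                        # slope
        self.intercept_ = y_bar - self.coef_ * x_bar  # bias
        return self

    def predict(self, X):
        return [self.coef_ * row[0] + self.intercept_ for row in X]


X = [[1], [2], [3], [4]]
y = [10, 20, 30, 40]
for Model in (MeanRegressor, SimpleLinear):   # same skeleton, different class name
    model = Model()
    model.fit(X, y)
    print(Model.__name__, model.predict([[5]]))
# MeanRegressor [25.0]
# SimpleLinear [50.0]
```

The loop body never changes; only the class does. That is exactly why LinearRegression can be swapped for any other scikit-learn model.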
2 Data Preprocessing
Missing values — count them, then impute with the median (robust to outliers).
import pandas as pd
df = pd.read_csv('data.csv')
print(df.isnull().sum()) # missing per column
df['Age'] = df['Age'].fillna(df['Age'].median())  # median imputation
Outlier removal with the IQR method.
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1
low, high = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
df = df[(df['Salary'] >= low) & (df['Salary'] <= high)]  # keep rows inside the fences
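Both pandas steps rest on simple arithmetic. Here is a standard-library sketch with made-up numbers: median imputation ignores the extreme age 90 that would drag the mean up, and the IQR fences drop the 300 salary. It assumes `statistics.quantiles(..., method="inclusive")`, which should match the linear interpolation pandas uses by default in `quantile`.

```python
import statistics

# Median imputation: fill missing values (None) with the median of the observed ones.
ages = [25, None, 30, 35, None, 90]
observed = [a for a in ages if a is not None]
median = statistics.median(observed)            # 32.5 (the mean would be 45.0)
ages_filled = [median if a is None else a for a in ages]
print(ages_filled)                              # [25, 32.5, 30, 35, 32.5, 90]

# IQR outlier removal: keep values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
salaries = [40, 42, 45, 47, 50, 52, 55, 300]
q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")  # 44.25, 52.75
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr                           # 31.5, 65.5
kept = [s for s in salaries if low <= s <= high]
print(kept)                                     # [40, 42, 45, 47, 50, 52, 55]
```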
Scaling & encoding — Min-Max for numbers, Label for ordinal, One-Hot for nominal.
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
import pandas as pd
df[['Income']] = MinMaxScaler().fit_transform(df[['Income']])  # rescaled to [0, 1]
df['Size'] = LabelEncoder().fit_transform(df['Size'])  # ordinal; note: codes follow alphabetical order
df = pd.get_dummies(df, columns=['City'], drop_first=True)  # nominal: one 0/1 column per city, first dropped
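What each transformation computes, sketched in plain Python on hypothetical values. Min-Max maps x to (x - min) / (max - min). LabelEncoder numbers the sorted (alphabetical) categories, so sizes S/M/L become L=0, M=1, S=2, which is not the real size order. An explicit mapping, or scikit-learn's `OrdinalEncoder` with its `categories` argument, keeps the intended order. One-hot with `drop_first=True` creates one 0/1 column per category except the first.

```python
incomes = [20_000, 35_000, 50_000]
lo, hi = min(incomes), max(incomes)
scaled = [(x - lo) / (hi - lo) for x in incomes]           # Min-Max: smallest -> 0, largest -> 1
print(scaled)                                              # [0.0, 0.5, 1.0]

sizes = ["S", "L", "M", "S"]
label_codes = {c: i for i, c in enumerate(sorted(set(sizes)))}  # mimics LabelEncoder's sorted classes
print([label_codes[s] for s in sizes])                     # [2, 0, 1, 2]  (alphabetical: L < M < S)
ordinal_codes = {"S": 0, "M": 1, "L": 2}                   # explicit order: S < M < L
print([ordinal_codes[s] for s in sizes])                   # [0, 2, 1, 0]

cities = ["Delhi", "Mumbai", "Pune", "Delhi"]
levels = sorted(set(cities))[1:]                           # drop_first=True drops 'Delhi'
one_hot = [[int(c == level) for level in levels] for c in cities]
print(levels, one_hot)  # ['Mumbai', 'Pune'] [[0, 0], [1, 0], [0, 1], [0, 0]]
```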
3 Regression & Classification
Linear Regression with a train/test split.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42)  # 80% train / 20% test, reproducible
model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.intercept_)  # slope(s) and bias
print(model.score(X_test, y_test))    # R² on the held-out test set
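`train_test_split` shuffles the rows and cuts them into two disjoint parts. Here is a simplified stand-in: the function name `simple_split` is hypothetical, and its shuffle will not reproduce scikit-learn's exact rows.

```python
import random


def simple_split(X, y, train_size=0.8, seed=42):
    """Hypothetical stand-in for train_test_split: shuffle indices, then cut once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)   # seeded, so the split is reproducible
    cut = int(len(idx) * train_size)   # number of training rows
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


X = [[i] for i in range(10)]
y = [10 * i for i in range(10)]
X_train, X_test, y_train, y_test = simple_split(X, y)
print(len(X_train), len(X_test))       # 8 2
```

Shuffling before cutting matters: if the data is sorted (by date or by label), an unshuffled cut would give a test set that looks nothing like the training set.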
Logistic Regression — note predict vs predict_proba.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
print(model.predict([[85]]))        # class label: [0] or [1]
print(model.predict_proba([[85]]))  # probabilities: [[P(0), P(1)]], one row per input, each row sums to 1
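For one feature, logistic regression computes a score z = w*x + b and squashes it with the sigmoid 1 / (1 + e^(-z)) to get P(1). `predict_proba` reports [1 - P(1), P(1)], and `predict` returns 1 when P(1) exceeds 0.5. The sketch below uses made-up weights (w = 0.1 and b = -7 are assumptions, not fitted values).

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


w, b = 0.1, -7.0            # hypothetical learned weight and bias
x = 85                      # the input value from the example above
p1 = sigmoid(w * x + b)     # P(class 1); z = 1.5 here
proba = [1 - p1, p1]        # what predict_proba returns for one row
label = int(p1 > 0.5)       # what predict returns
print([round(p, 3) for p in proba], label)   # [0.182, 0.818] 1
```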
4 Model Evaluation
Classification metrics from a confusion matrix.
from sklearn.metrics import (confusion_matrix, precision_score,
recall_score, f1_score, accuracy_score)
print(confusion_matrix(y_true, y_pred))
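print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("Accuracy :", accuracy_score(y_true, y_pred))   # correct / total
Regression metrics and K-fold cross-validation.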
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt(MSE)
r2 = r2_score(y_true, y_pred)
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV: one score per fold
print(scores.mean())  # average across folds (R² by default for regressors)
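Here are the formulas behind both metric groups, computed by hand on small made-up vectors so the numbers can be checked. The `kfold_indices` helper is hypothetical. It shows how 5-fold CV partitions rows without shuffling, so each row lands in a test fold exactly once. For a regressor, `cross_val_score(..., cv=5)` uses unshuffled K-fold splits of this kind; classifiers get stratified folds instead.

```python
import math

# Classification: count confusion-matrix cells for the positive class (1).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))   # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))   # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))   # 1
precision = tp / (tp + fp)                          # of predicted 1s, how many were right
recall = tp / (tp + fn)                             # of actual 1s, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, round(accuracy, 3))    # 0.75 0.75 0.75 0.667

# Regression: average error sizes and R² = 1 - (residual SS / total SS).
r_true = [3.0, 5.0, 7.0]
r_pred = [2.0, 5.0, 9.0]
errors = [p - t for t, p in zip(r_true, r_pred)]
mae = sum(abs(e) for e in errors) / len(errors)               # 1.0
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))    # sqrt(5/3), about 1.291
mean_t = sum(r_true) / len(r_true)
r2 = 1 - sum(e * e for e in errors) / sum((t - mean_t) ** 2 for t in r_true)  # 1 - 5/8 = 0.375


def kfold_indices(n, k):
    """Hypothetical unshuffled K-fold: yields (train_idx, test_idx) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(kfold_indices(10, 5))
print([test for _, test in folds])   # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```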
5 Decision Trees
A constrained tree (to limit overfitting) and automated tuning with GridSearchCV.
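`criterion='gini'` scores each candidate split by Gini impurity, 1 - Σ p_k², where p_k is the share of class k in a node. At each node the tree picks the split whose children have the lowest size-weighted impurity, and `max_depth=3` limits how many levels of such splits can stack up. Here is a plain-Python sketch with made-up labels; the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a 50/50 two-class node."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = ["yes", "yes", "yes", "no", "no", "no"]
print(gini(parent))                                                    # 0.5
print(split_gini(["yes", "yes", "yes"], ["no", "no", "no"]))           # 0.0 (perfect split)
print(round(split_gini(["yes", "yes", "yes", "no"], ["no", "no"]), 3)) # 0.25 (partial improvement)
```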
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
clf = DecisionTreeClassifier(max_depth=3, criterion='gini')
clf.fit(X_train, y_train)
grid = GridSearchCV(DecisionTreeClassifier(),
{'max_depth': [2, 3, 4, 5]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
6 Neural Networks (Keras)
Multi-Layer Perceptron — ReLU hidden layers, Softmax for multi-class output.
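Here is what the two Dense layers compute for one input, as a NumPy sketch: hidden = ReLU(x·W1 + b1), then output = softmax(hidden·W2 + b2). The weights are random placeholders (Keras learns them during `fit`), and the sizes mirror a 784 → 64 → 10 model. A Dense layer holds inputs × units weights plus one bias per unit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(784)                                     # one flattened 28x28 input (placeholder values)
W1, b1 = rng.normal(0, 0.05, (784, 64)), np.zeros(64)   # hidden layer parameters
W2, b2 = rng.normal(0, 0.05, (64, 10)), np.zeros(10)    # output layer parameters


def relu(z):
    return np.maximum(0, z)          # negatives -> 0, positives unchanged


def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()


hidden = relu(x @ W1 + b1)           # shape (64,)
probs = softmax(hidden @ W2 + b2)    # shape (10,): one probability per class
n_params = W1.size + b1.size + W2.size + b2.size   # 784*64 + 64 + 64*10 + 10 = 50,890
print(probs.shape, round(float(probs.sum()), 6), int(probs.argmax()), n_params)
```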
from keras import models, layers
from keras.layers import Input
model = models.Sequential([
Input(shape=(784,)),
layers.Dense(64, activation='relu'), # hidden layer
layers.Dense(10, activation='softmax') # multi-class output
])
model.compile(optimizer='adam',
loss='categorical_crossentropy',  # needs one-hot y; use 'sparse_categorical_crossentropy' for integer labels
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
Recurrent Neural Network for sequence/text classification.
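SimpleRNN reads a sequence one step at a time and updates a hidden state h_t = tanh(x_t·W + h_(t-1)·U + b). Only the final state (32 numbers in the model below) goes on to the Dense layer. Here is a NumPy sketch of that loop with random placeholder weights. The inputs stand in for the 16-dimensional vectors the Embedding layer would produce, and the output layer's bias is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, emb_dim, units = 5, 16, 32
xs = rng.normal(size=(seq_len, emb_dim))   # 5 word vectors (placeholder for Embedding output)
W = rng.normal(0, 0.1, (emb_dim, units))   # input -> hidden weights
U = rng.normal(0, 0.1, (units, units))     # hidden -> hidden (recurrent) weights
b = np.zeros(units)

h = np.zeros(units)                        # initial state
for x_t in xs:                             # the same weights are reused at every time step
    h = np.tanh(x_t @ W + h @ U + b)

w_out = rng.normal(0, 0.1, units)          # Dense(1, activation='sigmoid') weights, bias omitted
p = 1 / (1 + np.exp(-(h @ w_out)))         # probability of the positive class
print(h.shape, float(p))
```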
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense
model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=16))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid')) # binary output
model.compile(loss='binary_crossentropy',
optimizer='adam', metrics=['accuracy'])
7 Natural Language Processing
Text cleaning — lowercase, strip punctuation, tokenize.
import string
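text = "Hello, World! NLP is FUN."  # example input
text = text.lower()
text = text.translate(str.maketrans('', '', string.punctuation))  # delete every punctuation character
tokens = text.split()  # -> ['hello', 'world', 'nlp', 'is', 'fun']
Tokenize & pad sequences to a fixed length for a neural network.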
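from tensorflow.keras.preprocessing.text import Tokenizer  # legacy API; newer code prefers layers.TextVectorization
from tensorflow.keras.preprocessing.sequence import pad_sequences
tok = Tokenizer(oov_token="<OOV>")  # unseen words map to the <OOV> index
tok.fit_on_texts(sentences)  # sentences: a list of strings; builds the word -> integer index
seqs = tok.texts_to_sequences(sentences)  # each sentence becomes a list of integers
padded = pad_sequences(seqs, maxlen=20, padding='post')  # every row now has length 20 (zeros added at the end)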
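Here is roughly what `Tokenizer` and `pad_sequences` do, in plain Python. This is a simplified sketch, not Keras's exact code. It assumes a basic lowercase-and-split cleaning rule, gives more frequent words smaller indices, reserves 0 for padding and 1 for the OOV token, pads at the end as `padding='post'` requests, and drops tokens from the start when a sequence is too long (the default truncation side).

```python
from collections import Counter

sentences = ["I love my dog", "I love my cat", "You love my dog!"]


def words(text):
    # simplified cleaning: lowercase, drop '!', split on whitespace
    return text.lower().replace("!", "").split()


counts = Counter(w for s in sentences for w in words(s))
vocab = ["<OOV>"] + [w for w, _ in counts.most_common()]   # most frequent first
word_index = {w: i for i, w in enumerate(vocab, start=1)}  # 0 stays free for padding


def to_sequence(text):
    return [word_index.get(w, word_index["<OOV>"]) for w in words(text)]


def pad(seq, maxlen, value=0):
    seq = seq[-maxlen:]                          # too long: keep the last maxlen ids
    return seq + [value] * (maxlen - len(seq))   # too short: zeros at the end (padding='post')


print(word_index)  # {'<OOV>': 1, 'love': 2, 'my': 3, 'i': 4, 'dog': 5, 'cat': 6, 'you': 7}
print(pad(to_sequence("I really love my dog"), maxlen=6))  # [4, 1, 2, 3, 5, 0]
```

The unseen word "really" maps to 1 (`<OOV>`), and the trailing 0 is padding.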
8 Hugging Face & Softmax
Pre-trained pipelines — one line for sentiment, generation, summarization.
from transformers import pipeline
clf = pipeline("sentiment-analysis")
print(clf("I love this course")) # [{'label': 'POSITIVE', ...}]
gen = pipeline("text-generation", model="gpt2")
print(gen("Machine learning is", max_length=20))
Softmax: turns raw scores into probabilities that sum to 1 (used in attention and output layers).
import numpy as np
def softmax(scores):
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return e / e.sum()
print(softmax([5, 1, 6]))  # -> approx. [0.268 0.005 0.727]
9 Commercial APIs
OpenAI chat call — key read securely from an environment variable.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"]) # never hard-code
resp = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain RAG in one line."}])
print(resp.choices[0].message.content)
Embeddings via API: turn text into a vector for search / RAG.
emb = client.embeddings.create(
model="text-embedding-3-small", input="some text")
vector = emb.data[0].embedding  # a fixed-length list of floats (1536 by default for this model)
10 Prompt Engineering
System prompt + Chain-of-Thought + low temperature for a precise, reasoned answer.
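Why does a low temperature give a more precise, repeatable answer? In the textbook picture, the model's next-token scores (logits) are divided by the temperature T before softmax. A small T sharpens the distribution toward the top token, and a large T flattens it. The sketch below uses made-up logits. It models the T → 0 limit as plain argmax (greedy choice), since dividing by zero is undefined. Provider internals are not public, so treat this only as the conceptual mechanism behind `temperature=0`.

```python
import numpy as np


def softmax_with_temperature(logits, T):
    z = np.asarray(logits, dtype=float) / T   # T < 1 sharpens, T > 1 flattens
    e = np.exp(z - z.max())                   # subtract max for numerical stability
    return e / e.sum()


logits = [2.0, 1.0, 0.5]                      # hypothetical scores for three candidate tokens
for T in (2.0, 1.0, 0.2):
    print(T, softmax_with_temperature(logits, T).round(3))

greedy = int(np.argmax(logits))               # the T -> 0 limit: always pick the top token
print("greedy choice:", greedy)               # 0
```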
messages = [
{"role": "system",
"content": "You are a concise tutor. If unsure, say 'I don't know'."},
{"role": "user",
"content": "What is 23*4 + 6? Let's think step by step."}
]
resp = client.chat.completions.create(
model="gpt-4o-mini", messages=messages, temperature=0)  # 0 = least random output
print(resp.choices[0].message.content)
11 RAG (Retrieval-Augmented Generation)
Embeddings & cosine similarity — match text by meaning.
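Cosine similarity is the dot product of two vectors divided by the product of their lengths. A value of 1 means the vectors point the same way (same meaning), and a value near 0 means they are unrelated. Conceptually, a vector store then returns the stored documents whose embeddings are nearest to the query's embedding. Here is a standard-library sketch with tiny made-up 3-D vectors standing in for real embeddings (all-MiniLM-L6-v2, for example, produces 384 dimensions).

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-D "embeddings"; a real model would compute these from the text.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "how do I return an item": [0.8, 0.3, 0.1],
    "office opening hours": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]   # pretend embedding of "can I get my money back?"


def top_k(query, docs, k=2):
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]       # the retrieval step of RAG: best matches first


print(top_k(query, docs))   # ['refund policy', 'how do I return an item']
```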
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["refund policy", "how do I return an item"])
print(cosine_similarity([emb[0]], [emb[1]]))  # ~0.65 (similar)
Chroma vector database: store documents and query by meaning.
import chromadb
client = chromadb.Client()
col = client.create_collection("docs")
col.add(documents=["Employees get 18 days of leave."], ids=["d1"])
res = col.query(query_texts=["How much vacation?"], n_results=1)
print(res["documents"][0])  # documents for the first query, found by meaning, not keywords
12 Streamlit & Gradio
Streamlit app — pure-Python web UI. Run with streamlit run app.py.
import streamlit as st
st.title("Text Analyser")
text = st.text_area("Enter text:")
if st.button("Analyse"):
    st.metric("Word count", len(text.split()))
    st.success("Done!")
Gradio: wrap a single function into a web UI automatically.
import gradio as gr
def greet(name):
    return f"Hello, {name}!"
gr.Interface(fn=greet, inputs="text", outputs="text").launch()
13 AI Agents: LangChain & LangGraph
Define a tool and bind it to an LLM (function calling).
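`bind_tools` only advertises a tool's name, arguments and docstring to the model. The model answers with tool calls, and your code still has to run them. Here is a plain-Python sketch of that dispatch step. It assumes each call is a dict with `'name'` and `'args'` keys (the shape LangChain uses for `response.tool_calls`). The hard-coded `fake_tool_calls` list is a hypothetical stand-in for a real model reply.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b


TOOLS = {"multiply": multiply}   # registry: tool name -> Python function

# Hypothetical stand-in for response.tool_calls from a real model.
fake_tool_calls = [{"name": "multiply", "args": {"a": 50, "b": 173}, "id": "call_1"}]


def run_tool_calls(tool_calls):
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]             # look up the requested tool
        results.append(fn(**call["args"]))   # call it with the model-chosen arguments
    return results


print(run_tool_calls(fake_tool_calls))       # [8650]
```

With the `@tool`-decorated version, `multiply.invoke(call["args"])` plays the same role as `fn(**call["args"])`.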
from langchain_core.tools import tool
@tool
def multiply(a: int, b: int) -> int:
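    """Multiply two numbers together."""  # the docstring becomes the tool description
    return a * b
# llm: any tool-calling chat model, e.g. ChatOpenAI(model="gpt-4o-mini") from langchain_openai
llm_with_tools = llm.bind_tools([multiply])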
response = llm_with_tools.invoke("Calculate 50 times 173")
print(response.tool_calls)  # the requested calls (tool name + args); the tool itself has not run yet
LangGraph: a stateful agent graph built from State, Nodes and Edges, then .compile().
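The State / Nodes / Edges idea can be mimicked in plain Python to see what compiling and running a graph amounts to. Each node reads the state and returns an update. An add_messages-style reducer appends that update to the message list instead of replacing it. The edges decide which node runs next. This is a toy model with hypothetical names (`run_graph`, `echo_node`), not LangGraph's implementation.

```python
START, END = "__start__", "__end__"   # sentinel names for entry and exit


def append_messages(old, new):
    return old + new                  # reducer: APPEND, like add_messages


def echo_node(state):
    last = state["messages"][-1]
    return {"messages": [f"echo: {last}"]}   # a node returns only the update


nodes = {"chat": echo_node}
edges = {START: "chat", "chat": END}  # START -> chat -> END


def run_graph(state):
    current = edges[START]
    while current != END:
        update = nodes[current](state)
        state = {"messages": append_messages(state["messages"], update["messages"])}
        current = edges[current]
    return state


print(run_graph({"messages": ["hi"]}))   # {'messages': ['hi', 'echo: hi']}
```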
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
class State(TypedDict):
    messages: Annotated[list, add_messages]  # add_messages APPENDS
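def chat_node(state: State):  # node = worker function: reads state, returns an update
    return {"messages": [llm.invoke(state["messages"])]}  # llm as in the tool example
graph = StateGraph(State)
graph.add_node("chat", chat_node)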
graph.add_edge(START, "chat")
graph.add_edge("chat", END)
app = graph.compile()  # run with app.invoke({"messages": [...]})
Write the import lines and skeleton from memory, then fill in the logic. The right class name plus the five-step shape already earns most of the marks; even partial code scores.