Supervised Learning — Logistic Regression
When the answer is a category, not a number. Learn the sigmoid "S-curve", how probabilities become decisions, and why this classifier is named "regression".
4.1 The classification challenge
Why Linear Regression fails for classification
- Unbounded predictions — a straight line can output −0.2 or 1.7. But a probability must stay between 0 and 1. Negative probability is mathematically impossible.
- Bad fit — a straight line cannot match the "step" nature of binary decisions, causing high error near the boundary.
We need a curve that bends to stay strictly between 0 and 1. That curve is the sigmoid.
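To make the "unbounded predictions" problem concrete, here is a minimal sketch that fits a straight line to 0/1 labels. The toy data (hours studied versus pass/fail) is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: hours studied (x) and pass/fail label (y).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit an ordinary least-squares straight line y = m*x + c.
m, c = np.polyfit(x, y, deg=1)

# Evaluate the line at a low, a middle and a high input.
x_new = np.array([0.0, 5.5, 12.0])
linear_pred = m * x_new + c

print(linear_pred)  # first value falls below 0, last value rises above 1
```

The line passes through sensible values in the middle of the data, but at the edges its output can no longer be read as a probability.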
4.2 The Sigmoid Function
Behaviour of the S-curve
- Large positive z → σ(z) ≈ 1 (high probability of class 1)
- Large negative z → σ(z) ≈ 0 (low probability)
- z = 0 → σ(z) = exactly 0.5 (maximum uncertainty — a coin toss)
The S-shaped curve crosses the centre at 0.5 and flattens at the top and bottom — that is what keeps the output bounded.
import numpy as np

def sigmoid(z):
    # Squash any real number z into the open interval (0, 1).
    return 1 / (1 + np.exp(-z))

print("sigmoid(0) =", sigmoid(0))              # 0.5
print("sigmoid(5) =", round(sigmoid(5), 4))    # 0.9933
print("sigmoid(-5) =", round(sigmoid(-5), 4))  # 0.0067
4.3 Model Formulation & Log-Odds
Logistic Regression takes the familiar linear equation and wraps it inside the sigmoid:
| | Linear Regression | Logistic Regression |
|---|---|---|
| Equation | y = mx + c | P = σ(mx + c) = 1 / (1 + e^(−(mx + c))) |
| Output | Continuous number in (−∞, +∞) | Probability in (0, 1) |
| Visual shape | Straight line | S-curve (sigmoid) |
The inner part z = mx + c is the linear combination (input × weight + bias). The sigmoid then converts z into a probability.
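As a small worked sketch of P = σ(mx + c), the snippet below computes z and then the probability for one input. The weight m = 0.1 and bias c = −7 are made-up values for illustration, not learned from any data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weight and bias, assumed only for this example.
m, c = 0.1, -7.0
x = 85.0  # e.g. an attendance percentage

z = m * x + c    # linear combination: 0.1 * 85 - 7 = 1.5
p = sigmoid(z)   # probability of class 1

print(f"z = {z:.2f}, P = {p:.4f}")
```

With these assumed values z is positive, so the probability lands above 0.5 (roughly 0.82).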
The Log-Odds concept
Unlike Linear Regression, Logistic Regression does not predict the target value directly. Its linear part z predicts the log of the odds of success, where odds = P / (1 − P). For example, odds of winning of 3:1 correspond to a probability of 3 / (3 + 1) = 75% and to log-odds of ln 3 ≈ 1.10. The linear part models the log-odds; the sigmoid converts log-odds back to probability.
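The round trip between odds, log-odds and probability can be checked numerically. This is a minimal sketch using the 3:1 example; the variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

odds = 3 / 1                  # odds of 3:1 in favour
p = odds / (1 + odds)         # odds -> probability
log_odds = np.log(odds)       # the quantity the linear part models
p_back = sigmoid(log_odds)    # sigmoid turns log-odds back into a probability

print(p, round(log_odds, 4), round(p_back, 4))
```

Both p and p_back come out at 0.75, which is the sense in which the sigmoid "undoes" the log-odds.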
Interpreting coefficients
- Positive weight → as that feature increases, the probability of class 1 increases.
- Negative weight → as the feature increases, the probability decreases. (e.g. "Hours of Exercise" with coefficient −0.5 means more exercise lowers heart-disease probability.)
- Zero weight → the feature has no effect on the prediction; the model learned it is irrelevant.
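To see the sign of a weight at work, the sketch below uses the coefficient −0.5 from the exercise example. The bias of 1.0 is an assumed value, picked only so the numbers are easy to read.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

weight = -0.5   # coefficient for "Hours of Exercise" from the example above
bias = 1.0      # hypothetical intercept, assumed for illustration

hours = np.array([0, 1, 2, 3, 4])
probs = sigmoid(weight * hours + bias)

# Each extra hour multiplies the odds of class 1 by e^weight.
odds_ratio = np.exp(weight)

print(np.round(probs, 3))    # decreases as hours increase
print(round(odds_ratio, 3))  # about 0.607
```

Because the weight is negative, every additional hour shrinks the odds by the same factor, so the probability falls steadily.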
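import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumes df is an existing DataFrame with columns 'attendance_percent' and 'passed'.
# X = attendance %, y = passed (0 / 1)
X = df[['attendance_percent']]   # 2D features
y = df['passed']                 # binary target

model = LogisticRegression()
model.fit(X, y)                  # learn from labelled data

# Predict for a student with 85% attendance.
# Reusing the training column name avoids scikit-learn's feature-name warning.
new_student = pd.DataFrame({'attendance_percent': [85]})
print("Class :", model.predict(new_student))        # 0 or 1
print("Prob  :", model.predict_proba(new_student))  # [[P(0), P(1)]]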
model.predict() returns the final class label (0 or 1). model.predict_proba() returns the underlying probabilities for each class. The class label is just P(class 1) compared against the threshold, which is 0.5 by default.
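The link between the two methods can be checked directly. The sketch below trains on synthetic attendance data (generated here, not taken from any real dataset) and compares predict() with a manual 0.5 cut on predict_proba().

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, hypothetical data: attendance % and a noisy pass/fail label.
rng = np.random.default_rng(0)
attendance = rng.uniform(40, 100, size=200).reshape(-1, 1)
passed = (attendance[:, 0] + rng.normal(0, 10, size=200) > 70).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(attendance, passed)

labels = model.predict(attendance)                # class labels
p_class1 = model.predict_proba(attendance)[:, 1]  # P(class 1) per row
manual = (p_class1 > 0.5).astype(int)             # threshold applied by hand

print(np.array_equal(labels, manual))
```

If the two arrays match, predict() is doing nothing more than thresholding the class 1 probability.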
4.4 The decision threshold
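The sigmoid gives a probability like 0.75, but the final answer must be Yes or No. We apply a decision threshold, 0.5 by default: a probability above the threshold is classified as class 1 (Yes), and a probability below it as class 0 (No). So 0.75 becomes Yes.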
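A minimal sketch of thresholding follows. The helper name classify and the sample probabilities are hypothetical. How an exact tie at the threshold is handled is a convention; this version sends it to class 0.

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 labels (hypothetical helper)."""
    return (np.asarray(probabilities) > threshold).astype(int)

probs = [0.10, 0.30, 0.45, 0.75, 0.95]

default_labels = classify(probs)                # threshold 0.5
strict_labels = classify(probs, threshold=0.8)  # demand more certainty for "Yes"

print(default_labels)  # [0 0 0 1 1]
print(strict_labels)   # [0 0 0 0 1]
```

Raising the threshold makes the model more cautious about predicting class 1. Lowering it has the opposite effect.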
4.5 Assumptions of Logistic Regression
- Binary outcome — the basic version's target must have exactly two categories.
- Independence — observations should not influence one another (e.g. distinct customers).
- No multicollinearity — predictor variables should not be copies of each other (like "Weight in kg" and "Weight in lbs").
- Linear relationship with log-odds — the features relate linearly to the log-odds (not to the probability directly).
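The multicollinearity assumption in the list above can be checked with a simple correlation. The sketch below uses made-up weights to show that "Weight in kg" and "Weight in lbs" carry exactly the same information.

```python
import numpy as np

# Hypothetical sample of body weights.
weight_kg = np.array([55.0, 62.5, 70.0, 81.3, 95.2])
weight_lbs = weight_kg * 2.20462   # the same quantity in different units

# Pearson correlation between the two candidate features.
corr = np.corrcoef(weight_kg, weight_lbs)[0, 1]

print(round(corr, 6))  # 1.0
```

A correlation this close to 1 is a signal to keep only one of the two columns.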
4.6 Limitations of Logistic Regression
- Struggles with non-linear patterns — it draws a straight-line decision boundary. If the "Yes" points sit in a circle surrounded by "No" points, a straight line cannot separate them (unless you add polynomial features).
- Sensitive to outliers — better than Linear Regression thanks to the sigmoid, but still affected.
- Needs clean data — multicollinearity makes the learned weights unstable and hard to interpret.
- Imbalanced data problem — if 99% of data is class 1, the model gets "lazy" and predicts 1 every time, scoring 99% accuracy without learning anything useful.
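The first limitation above (a circle of "Yes" points surrounded by "No" points) can be reproduced on synthetic data. This is a sketch under stated assumptions: the data is generated here, accuracy is measured on the training points themselves, and squared features stand in for "polynomial features".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 200

# "Yes" points (class 1) inside a disc of radius 1,
# "No" points (class 0) on a ring between radius 2 and 3.
r = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
theta = rng.uniform(0.0, 2 * np.pi, 2 * n)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

# Straight-line boundary on the raw features x1, x2.
linear_model = LogisticRegression(max_iter=1000).fit(X, y)
acc_linear = linear_model.score(X, y)

# Same model with the squared features x1^2, x2^2 appended.
X_poly = np.column_stack([X, X ** 2])
poly_model = LogisticRegression(max_iter=1000).fit(X_poly, y)
acc_poly = poly_model.score(X_poly, y)

print(f"raw features:     {acc_linear:.2f}")
print(f"squared features: {acc_poly:.2f}")
```

The boundary is still linear in the new feature space. It only looks like a circle when drawn back in the original x1, x2 plane.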
Practice questions: each is followed by an explanation covering the exact reasoning examiners expect.
Logistic Regression is fundamentally used for:
Despite the misleading name, Logistic Regression is a classification algorithm — it predicts the probability of belonging to a category.
What is the range of the sigmoid function?
σ(z) = 1/(1+e⁻ᶻ) always produces a value strictly between 0 and 1 — perfect for representing a probability.
When the linear input z = 0, the sigmoid outputs:
σ(0) = 1/(1+e⁰) = 1/(1+1) = 0.5 — the point of maximum uncertainty, equivalent to a coin toss.
A spam model outputs a probability of 0.3. With the default threshold, the email is classified as:
0.3 < 0.5 (the default threshold), so it falls into Class 0 — Not Spam (the inbox).
A heart-disease model gives the feature "Hours of Exercise" a coefficient of −0.5. This means more exercise:
A negative coefficient means as the feature value goes up, the predicted probability goes down. More exercise → lower disease probability.
The "Yes" points form a circle surrounded by "No" points. Standard Logistic Regression will:
Logistic Regression draws a straight (linear) decision boundary. A circular pattern is non-linear — it cannot be separated by a line unless polynomial features are added.
A model trained on data that is 99% Class 1 starts predicting "Class 1" for everything. The likely cause is:
With heavily imbalanced data, predicting the majority class every time yields high accuracy without learning real patterns. This is why accuracy alone is a poor metric (see Lecture 5).
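The accuracy trap can be shown without training anything. The sketch below builds hypothetical labels that are 99% Class 1 and scores a "model" that always answers 1.

```python
import numpy as np

# Hypothetical labels: 990 examples of class 1, 10 of class 0.
y_true = np.array([1] * 990 + [0] * 10)

# A "lazy" model that predicts class 1 for every example.
y_pred = np.ones_like(y_true)

accuracy = np.mean(y_pred == y_true)

# Fraction of the true class 0 examples that were actually caught.
is_class0 = y_true == 0
recall_class0 = np.mean(y_pred[is_class0] == 0)

print(f"accuracy       = {accuracy:.2f}")       # 0.99
print(f"class 0 recall = {recall_class0:.2f}")  # 0.00
```

The accuracy looks excellent, yet not a single minority-class example is identified.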
A logistic model computes z = 2 for an input. Roughly, is the predicted class 0 or 1, and why? (e² ≈ 7.39)
σ(2) = 1/(1+e⁻²). Using the hint, e⁻² = 1/e² ≈ 1/7.39 ≈ 0.135, so σ(2) ≈ 1/1.135 ≈ 0.88. Since 0.88 > 0.5, the predicted class is 1.
Write a Python function sigmoid(z) using numpy, and use it to print the probability for z = −3, 0 and 4.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for z in [-3, 0, 4]:
    print(f"z={z:>3} -> P={sigmoid(z):.4f}")
Train a Logistic Regression model on a DataFrame df with feature attendance_percent and target passed, then predict whether a student with 85% attendance passes.
from sklearn.linear_model import LogisticRegression
X = df[['attendance_percent']] # double brackets -> 2D
y = df['passed']
model = LogisticRegression()
model.fit(X, y)
new_student = [[85]]
print("Prediction:", model.predict(new_student))
print("Probability:", model.predict_proba(new_student))
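A prediction of [1] means the student is predicted to pass. The probability of passing is the second value returned by predict_proba (0.91 in this example; the exact number depends on the data in df).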
Why can't we just use Linear Regression for a Yes/No classification problem?
Linear Regression outputs an unbounded number that can go below 0 or above 1, which is impossible for a probability. It also fits a straight line, which cannot match the sharp step between two classes, producing high error near the boundary. Logistic Regression wraps the linear equation in the sigmoid, squashing the output into a valid (0, 1) probability that can then be thresholded into a class.