⚡ LECTURE 4

Supervised Learning — Logistic Regression

When the answer is a category, not a number. Learn the sigmoid "S-curve", how probabilities become decisions, and why this classifier is named "regression".

Syllabus topics 11–14 · ⏱ ~22 min read · 11 practice questions

In this lecture

The classification challenge
The Sigmoid Function
Model Formulation & Log-Odds
Decision threshold
Assumptions
Limitations
Practice Questions

4.1 The classification challenge

⚠️ Do not let the name fool you Despite "Regression" in its name, Logistic Regression is a CLASSIFICATION algorithm. Regression predicts numbers; Logistic Regression separates data into categories (Spam/Not-Spam, Pass/Fail, Yes/No).

Logistic Regression — a supervised algorithm used to predict the probability of a binary outcome. It answers: "what is the % chance this belongs to class 1?"

Why Linear Regression fails for classification

Unbounded predictions — a straight line can output −0.2 or 1.7. But a probability must stay between 0 and 1. Negative probability is mathematically impossible.
Bad fit — a straight line cannot match the "step" nature of binary decisions, causing high error near the boundary.

We need a curve that bends to stay strictly between 0 and 1. That curve is the sigmoid.
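To make the first failure concrete, here is a minimal sketch that fits an ordinary least-squares line to 0/1 labels. The data (hours studied vs. pass/fail) and the helper name fit_line are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: hours studied vs. pass (1) / fail (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns slope m and intercept c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    c = mean_y - m * mean_x
    return m, c

m, c = fit_line(hours, passed)

# Evaluate the line at the edges of the data and slightly beyond
for x in [0, 1, 8, 10]:
    print(f"hours={x:>2} -> predicted 'probability' = {m * x + c:.3f}")
```

On this data the line already dips below 0 at 1 hour and rises above 1 at 8 hours, so its outputs cannot be read as probabilities.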

4.2 The Sigmoid Function

Sigmoid (logistic) function — takes any real number (big or small) and "squashes" it into a value strictly between 0 and 1.

σ(z) = 1 / (1 + e^(−z)), where e ≈ 2.718. The output is always in the open interval (0, 1).

Behaviour of the S-curve

Large positive z → σ(z) ≈ 1 (high probability of class 1)
Large negative z → σ(z) ≈ 0 (low probability)
z = 0 → σ(z) = exactly 0.5 (maximum uncertainty — a coin toss)

The S-shaped curve crosses the centre at 0.5 and flattens at the top and bottom — that is what keeps the output bounded.

Python · the sigmoid function

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print("sigmoid(0)  =", sigmoid(0))
print("sigmoid(5)  =", round(sigmoid(5), 4))
print("sigmoid(-5) =", round(sigmoid(-5), 4))

Output
sigmoid(0)  = 0.5
sigmoid(5)  = 0.9933
sigmoid(-5) = 0.0067

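🧩 Why sigmoid handles outliers gracefully In a salary dataset, one person earns $1 billion. In Linear Regression this outlier would wreck the line and produce absurd predictions. The sigmoid squashes it instead: whether the input leads to a large z or an enormous one, the output only approaches 1.0 and never exceeds it, so the prediction stays a valid probability. The outlier can still pull on the learned weights, which is why outliers also appear under Limitations.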
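The squashing effect can be checked numerically. This is a small sketch using the standard-library math module; the z values are arbitrary illustrations, not values from a trained model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

# z values chosen for illustration only; each step doubles the input
for z in [5, 10, 20]:
    print(f"z={z:>2} -> sigmoid(z)={sigmoid(z):.10f}")
```

Doubling z from 10 to 20 changes the output far less than doubling it from 5 to 10: the curve has flattened, and the result stays below 1.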

4.3 Model Formulation & Log-Odds

Logistic Regression takes the familiar linear equation and wraps it inside the sigmoid:

	Linear Regression	Logistic Regression
Equation	y = mx + c	P = σ(mx + c) = 1 / (1 + e^(−(mx + c)))
Output	Continuous number (−∞, +∞)	Probability (0, 1)
Visual shape	Straight line	S-curve (sigmoid)

The inner part z = mx + c is the linear combination (input × weight + bias). The sigmoid then converts z into a probability.

The Log-Odds concept

Unlike Linear Regression, Logistic Regression does not model the outcome directly: its linear part z = mx + c models the log of the odds of success, ln(P / (1 − P)). For example, odds of winning of 3:1 mean odds = 3, which corresponds to a probability of 3 / (3 + 1) = 75% and log-odds of ln 3 ≈ 1.10. The sigmoid is the inverse step: it converts log-odds back to a probability.
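The round trip between odds, probability and log-odds can be sketched in a few lines. The helper names odds_to_probability and log_odds are hypothetical, introduced only for this example.

```python
import math

def odds_to_probability(odds):
    """Odds of a:b are passed as the single number a / b."""
    return odds / (1 + odds)

def log_odds(p):
    """The logit: natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: turns log-odds back into a probability."""
    return 1 / (1 + math.exp(-z))

p = odds_to_probability(3 / 1)   # odds of 3:1
z = log_odds(p)                  # the quantity the linear part models
print("probability:", p)
print("log-odds   :", round(z, 4))
print("back to p  :", round(sigmoid(z), 4))
```

Applying the sigmoid to the log-odds recovers the original 0.75, which is why the two views (linear in log-odds, S-shaped in probability) describe the same model.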

Interpreting coefficients

Positive weight → as that feature increases, the probability of class 1 increases.
Negative weight → as the feature increases, the probability decreases. (e.g. "Hours of Exercise" with coefficient −0.5 means more exercise lowers heart-disease probability.)
Zero weight → the feature has no effect on the prediction; the model learned it is irrelevant.
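A coefficient can also be read through the odds: each one-unit increase in a feature multiplies the odds by e^weight. The sketch below uses the −0.5 coefficient from the exercise example; the intercept of 1.0 is a hypothetical value picked only so the numbers are easy to read.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

weight = -0.5    # coefficient for "Hours of Exercise" (from the example above)
bias = 1.0       # hypothetical intercept, chosen only for illustration

probs = [sigmoid(weight * hours + bias) for hours in range(0, 6)]
for hours, p in enumerate(probs):
    print(f"hours={hours} -> P(disease)={p:.3f}")

# Each extra hour multiplies the odds p / (1 - p) by e^weight
odds = [p / (1 - p) for p in probs]
print("odds ratio per hour:", round(odds[1] / odds[0], 4))
print("e^weight           :", round(math.exp(weight), 4))
```

The probabilities fall with every extra hour, and the ratio of consecutive odds equals e^(−0.5) regardless of the intercept.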

Python · Logistic Regression with scikit-learn

import pandas as pd
from sklearn.linear_model import LogisticRegression

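# df is assumed to be an existing DataFrame with the columns
# 'attendance_percent' (feature) and 'passed' (0 / 1 target)
X = df[['attendance_percent']]      # 2D features
y = df['passed']                   # binary target

model = LogisticRegression()
model.fit(X, y)                    # learn from labelled data

# Predict for a student with 85% attendance
# (a DataFrame with the same column name avoids a feature-name warning)
new_student = pd.DataFrame({'attendance_percent': [85]})
print("Class :", model.predict(new_student))           # 0 or 1
print("Prob  :", model.predict_proba(new_student))      # [P(0), P(1)]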

Output
Class : [1]
Prob  : [[0.12 0.88]]

💡 Tip — predict vs predict_proba model.predict() returns the final class label (0 or 1). model.predict_proba() returns the underlying probabilities for each class. The class is just the probability compared against the threshold.
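The link between the learned weights, predict_proba() and predict() can be checked by hand. This is a sketch on hypothetical toy data; max_iter=1000 is an arbitrary setting chosen to give the solver room to converge, and the check applies to the binary case only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: attendance % and whether the student passed
X = np.array([[40], [50], [55], [60], [65], [70], [75], [80], [90], [95]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Recompute P(class 1) by hand: sigmoid of (weight * x + bias)
z = X @ model.coef_.T + model.intercept_          # shape (10, 1)
manual_p1 = 1 / (1 + np.exp(-z.ravel()))

p1 = model.predict_proba(X)[:, 1]                 # library's P(class 1)
labels_from_threshold = (p1 > 0.5).astype(int)    # apply the default threshold

print("manual matches predict_proba:", np.allclose(manual_p1, p1))
print("threshold matches predict   :",
      np.array_equal(labels_from_threshold, model.predict(X)))
```

The fitted attributes coef_ and intercept_ play the roles of m and c in P = σ(mx + c).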

4.4 The decision threshold

The sigmoid gives a probability like 0.75 — but the final answer must be Yes or No. We apply a decision boundary, by default 0.5:

P > 0.5 → Class 1 (Yes) | P < 0.5 → Class 0 (No) | P = 0.5 → a tie, resolved by convention

🧩 When to move the threshold — Cancer Detection Should the threshold always stay at 0.5? No. In cancer detection, missing a sick patient (false negative) is far more dangerous than a false alarm. So we lower the threshold (e.g. to 0.3) to catch more potential cases — accepting more false positives to avoid missing real cases. The threshold is a business decision, not a fixed number.
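The trade-off can be seen by applying two thresholds to the same probabilities. The probabilities and labels below are hypothetical, made up for illustration; classify and count_errors are helper names introduced here.

```python
# Hypothetical model outputs P(cancer) and true labels (1 = sick)
probabilities = [0.92, 0.45, 0.35, 0.10, 0.62, 0.31, 0.05, 0.48]
true_labels   = [1,    1,    1,    0,    1,    0,    0,    0]

def classify(probs, threshold):
    """Class 1 when the probability exceeds the threshold, else class 0."""
    return [1 if p > threshold else 0 for p in probs]

def count_errors(predicted, actual):
    """Return (false negatives, false positives)."""
    false_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    return false_neg, false_pos

for threshold in (0.5, 0.3):
    fn, fp = count_errors(classify(probabilities, threshold), true_labels)
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 removes both missed patients at the cost of two false alarms.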

4.5 Assumptions of Logistic Regression

Binary outcome — the basic version's target must have exactly two categories.
Independence — observations should not influence one another (e.g. distinct customers).
No multicollinearity — predictor variables should not be copies of each other (like "Weight in kg" and "Weight in lbs").
Linear relationship with log-odds — the features relate linearly to the log-odds (not to the probability directly).
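One simple way to spot the multicollinearity case is to check the correlation between pairs of features before training. This is a sketch with hypothetical columns; a correlation check only catches pairwise duplicates, not every form of multicollinearity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical feature columns
weight_kg = [55.0, 62.5, 70.0, 81.0, 93.5]
weight_lbs = [kg * 2.20462 for kg in weight_kg]   # same information, new unit
height_cm = [160, 175, 168, 182, 171]

print("kg vs lbs:", round(pearson(weight_kg, weight_lbs), 4))
print("kg vs cm :", round(pearson(weight_kg, height_cm), 4))
```

A correlation of essentially 1 between two columns means one of them adds no new information and can be dropped.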

4.6 Limitations of Logistic Regression

Struggles with non-linear patterns — it draws a straight-line decision boundary. If the "Yes" points sit in a circle surrounded by "No" points, a straight line cannot separate them (unless you add polynomial features).
Sensitive to outliers — better than Linear Regression thanks to the sigmoid, but still affected.
Needs clean data — multicollinearity hurts it.
Imbalanced data problem — if 99% of data is class 1, the model gets "lazy" and predicts 1 every time, scoring 99% accuracy without learning anything useful.
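The imbalanced-data trap is easy to reproduce without training anything. The sketch below scores a hypothetical "lazy" predictor that always outputs class 1 on a made-up dataset with a 99:1 split.

```python
# Hypothetical imbalanced dataset: 990 examples of class 1, 10 of class 0
true_labels = [1] * 990 + [0] * 10

# A "lazy" model that ignores its input and always predicts the majority class
predictions = [1] * len(true_labels)

correct = sum(1 for p, t in zip(predictions, true_labels) if p == t)
accuracy = correct / len(true_labels)

# Of the real class-0 examples, how many did the model find?
class0_total = true_labels.count(0)
class0_found = sum(1 for p, t in zip(predictions, true_labels) if p == 0 and t == 0)
class0_recall = class0_found / class0_total

print(f"accuracy       = {accuracy:.2%}")
print(f"class-0 recall = {class0_recall:.2%}")
```

Accuracy looks excellent while not a single class-0 example is detected, which is the sense in which the model has learned nothing useful.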

🔑 Pros & Cons summary Pros: easy to interpret (you see which features matter), outputs probabilities not just labels, efficient on large datasets. Cons: can't model complex non-linear boundaries, sensitive to outliers, requires clean data. Bonus fact: a neural network is essentially many logistic regressions stacked together.

? Practice Questions

Select an answer, then check it. The explanations cover the exact reasoning examiners expect.

MCQ · Q1 · Basics

Logistic Regression is fundamentally used for:

A Regression — predicting continuous numbers
B Classification — predicting categories
C Clustering unlabelled data
D Reducing dimensionality

Answer: B

Despite the misleading name, Logistic Regression is a classification algorithm — it predicts the probability of belonging to a category.

MCQ · Q2 · Sigmoid

What is the range of the sigmoid function?

A (−∞, +∞)
B (0, 1)
C (−1, 1)
D [0, 100]

Answer: B

σ(z) = 1/(1+e⁻ᶻ) always produces a value strictly between 0 and 1 — perfect for representing a probability.

MCQ · Q3 · Sigmoid

When the linear input z = 0, the sigmoid outputs:

A 0
B 1
C 0.5
D Undefined

Answer: C

σ(0) = 1/(1+e⁰) = 1/(1+1) = 0.5 — the point of maximum uncertainty, equivalent to a coin toss.

MCQ · Q4 · Threshold

A spam model outputs a probability of 0.3. With the default threshold, the email is classified as:

A Not Spam (Class 0)
B Spam (Class 1)
C Cannot be determined
D Both classes simultaneously

Answer: A

0.3 < 0.5 (the default threshold), so it falls into Class 0 — Not Spam (the inbox).

MCQ · Q5 · Coefficients

A heart-disease model gives the feature "Hours of Exercise" a coefficient of −0.5. This means more exercise:

A Increases the probability of disease
B Decreases the probability of disease
C Has no effect
D Makes the prediction undefined

Answer: B

A negative coefficient means as the feature value goes up, the predicted probability goes down. More exercise → lower disease probability.

MCQ · Q6 · Limitations

The "Yes" points form a circle surrounded by "No" points. Standard Logistic Regression will:

A Solve it perfectly with a curved boundary
B Fail, because it can only draw a straight-line boundary
C Convert the problem to regression automatically
D Refuse to train

Answer: B

Logistic Regression draws a straight (linear) decision boundary. A circular pattern is non-linear — it cannot be separated by a line unless polynomial features are added.
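The "unless polynomial features are added" escape hatch can be illustrated without training a model. In this sketch the ring data is hypothetical, and the cut-off is set by hand rather than learned; it only shows that one squared-distance feature makes the classes separable by a single threshold.

```python
import math

# Hypothetical points on two rings: "Yes" (1) inside, "No" (0) outside
points, labels = [], []
for k in range(12):
    angle = 2 * math.pi * k / 12
    points.append((1.0 * math.cos(angle), 1.0 * math.sin(angle)))   # inner ring
    labels.append(1)
    points.append((3.0 * math.cos(angle), 3.0 * math.sin(angle)))   # outer ring
    labels.append(0)

# Polynomial feature: squared distance from the origin, x1^2 + x2^2
r_squared = [x1 ** 2 + x2 ** 2 for x1, x2 in points]

# In the new feature a single cut-off (a linear boundary) separates the classes
cutoff = 4.0   # any value between 1 and 9 works for these rings
predicted = [1 if r2 < cutoff else 0 for r2 in r_squared]
print("all points classified correctly:", predicted == labels)
```

The boundary x1² + x2² = 4 is a straight cut in the added feature but a circle in the original coordinates, which is how a linear model can handle this pattern once the feature exists.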

MCQ · Q7 · Imbalanced data

A model trained on data that is 99% Class 1 starts predicting "Class 1" for everything. The likely cause is:

A The sigmoid function is broken
B Imbalanced data — the model "lazily" guesses the majority class
C Too many features
D The threshold is exactly 0.5

Answer: B

With heavily imbalanced data, predicting the majority class every time yields high accuracy without learning real patterns. This is why accuracy alone is a poor metric (see Lecture 5).

Numerical · Q8 · Sigmoid

A logistic model computes z = 2 for an input. Roughly, is the predicted class 0 or 1, and why? (e² ≈ 7.39)

Answer: Class 1

σ(2) = 1/(1+e⁻²). Using the hint, e⁻² = 1/e² ≈ 1/7.39 ≈ 0.135, so σ(2) ≈ 1/(1+0.135) = 1/1.135 ≈ 0.88. Since 0.88 > 0.5, the predicted class is 1.

Coding · Q9 · Sigmoid

Write a Python function sigmoid(z) using numpy, and use it to print the probability for z = −3, 0 and 4.

Solution

Python

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for z in [-3, 0, 4]:
    print(f"z={z:>3} -> P={sigmoid(z):.4f}")

Output
z= -3 -> P=0.0474
z=  0 -> P=0.5000
z=  4 -> P=0.9820

Coding · Q10 · Train classifier

Train a Logistic Regression model on a DataFrame df with feature attendance_percent and target passed, then predict whether a student with 85% attendance passes.

Solution

Python

from sklearn.linear_model import LogisticRegression

X = df[['attendance_percent']]    # double brackets -> 2D
y = df['passed']

model = LogisticRegression()
model.fit(X, y)

new_student = [[85]]
print("Prediction:", model.predict(new_student))
print("Probability:", model.predict_proba(new_student))

Output
Prediction: [1]
Probability: [[0.09 0.91]]

Output [1] means the student is predicted to pass; the probability of passing is 0.91.

Short Answer · Q11 · Concept

Why can't we just use Linear Regression for a Yes/No classification problem?

Model answer

Linear Regression outputs an unbounded number that can go below 0 or above 1, which is impossible for a probability. It also fits a straight line, which cannot match the sharp step between two classes, producing high error near the boundary. Logistic Regression wraps the linear equation in the sigmoid, squashing the output into a valid (0, 1) probability that can then be thresholded into a class.

🎯 Lecture 4 — must-remember Logistic Regression = classification. Sigmoid σ(z) = 1/(1+e⁻ᶻ), range (0,1), σ(0)=0.5. Equation P = σ(mx+c). Default threshold 0.5 (lower it when false negatives are dangerous). Models log-odds. Limitation: only linear (straight-line) decision boundaries.
