GenAI Exam Prep
⚡ LECTURE 4

Supervised Learning — Logistic Regression

When the answer is a category, not a number. Learn the sigmoid "S-curve", how probabilities become decisions, and why this classifier is named "regression".

Syllabus topics 11–14 · ⏱ ~22 min read · 11 practice questions

4.1 The classification challenge

⚠️ Do not let the name fool you: despite "Regression" in its name, Logistic Regression is a CLASSIFICATION algorithm. Regression predicts numbers; Logistic Regression separates data into categories (Spam/Not-Spam, Pass/Fail, Yes/No).
Logistic Regression — a supervised algorithm used to predict the probability of a binary outcome. It answers: "what is the % chance this belongs to class 1?"

Why Linear Regression fails for classification

We need a curve that bends to stay strictly between 0 and 1. That curve is the sigmoid.
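A minimal sketch of the problem, assuming a made-up toy dataset (hours studied vs. pass/fail) and an ordinary least-squares line fitted by hand. The data and the names `hours`, `passed` and `linear_predict` are illustrative only.

```python
# Hypothetical toy data: hours studied and pass (1) / fail (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Ordinary least-squares fit of y = m*x + c, computed by hand
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(passed) / n
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, passed)) / sum(
    (x - mean_x) ** 2 for x in hours
)
c = mean_y - m * mean_x

def linear_predict(x):
    # Straight-line prediction; nothing keeps it inside [0, 1]
    return m * x + c

# Inputs away from the middle give "probabilities" below 0 or above 1
for x in [0, 4.5, 12]:
    print(f"hours={x:>4} -> prediction={linear_predict(x):.3f}")
```

On this toy data the line returns a negative value at 0 hours and a value above 1 at 12 hours, neither of which can be read as a probability.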

4.2 The Sigmoid Function

Sigmoid (logistic) function — takes any real number (big or small) and "squashes" it into a value strictly between 0 and 1.
σ(z) = 1 / (1 + e⁻ᶻ) where e ≈ 2.718. The output is always in the open interval (0, 1).

Behaviour of the S-curve

The S-shaped curve crosses the centre at 0.5 and flattens at the top and bottom — that is what keeps the output bounded.

Python · the sigmoid function
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print("sigmoid(0)  =", sigmoid(0))
print("sigmoid(5)  =", round(sigmoid(5), 4))
print("sigmoid(-5) =", round(sigmoid(-5), 4))
Output
sigmoid(0)  = 0.5
sigmoid(5)  = 0.9933
sigmoid(-5) = 0.0067
🧩 Why sigmoid handles outliers gracefully: In a salary dataset, one person earns $1 billion. In Linear Regression this outlier would drag the fitted line toward it. The sigmoid squashes extreme inputs instead: whether the input is 1 million or 1 billion, the predicted probability saturates just below 1.0, so the output always remains a valid probability and the model does not break.
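The saturation described above can be checked numerically. This is a small sketch that uses the standard library's `math.exp` instead of numpy, purely to keep it dependency-free; the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    # Plain-Python sigmoid for a single number
    return 1 / (1 + math.exp(-z))

# Larger and larger inputs barely change the output
for z in [5, 10, 20]:
    print(f"z={z:>2} -> {sigmoid(z):.10f}")
```

Doubling z from 10 to 20 moves the output by less than 0.0001, which is why one extreme value cannot push a prediction outside the valid range.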

4.3 Model Formulation & Log-Odds

Logistic Regression takes the familiar linear equation and wraps it inside the sigmoid:

Aspect | Linear Regression | Logistic Regression
Equation | y = mx + c | P = σ(mx + c) = 1 / (1 + e⁻⁽ᵐˣ⁺ᶜ⁾)
Output | Continuous number (−∞, +∞) | Probability (0, 1)
Visual shape | Straight line | S-curve (sigmoid)

The inner part z = mx + c is the linear combination (input × weight + bias). The sigmoid then converts z into a probability.
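As a worked sketch of the two steps, assume a hypothetical weight `m = 0.1` and bias `c = -7.0` (values chosen only for illustration, not learned from any data):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical weight and bias, chosen only for illustration
m = 0.1    # weight on the input feature
c = -7.0   # bias

x = 85                 # one input value, e.g. attendance in percent
z = m * x + c          # step 1: linear combination
p = sigmoid(z)         # step 2: squash z into a probability of class 1
print(f"z = {z:.2f}, P = {p:.4f}")
```

With these assumed values z is 1.5, and the sigmoid turns that into a probability of roughly 0.82.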

The Log-Odds concept

Unlike Linear Regression, Logistic Regression does not predict the target value directly. Its linear part z = mx + c models the log of the odds of success, where odds = P / (1 − P). For example, odds of 3:1 correspond to a probability of 3 / (3 + 1) = 75%. The sigmoid converts the log-odds back into a probability.
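The 3:1 example can be traced in both directions. This is a short sketch using only the standard library; the variable names are illustrative.

```python
import math

odds = 3 / 1                      # odds of 3:1 in favour
p = odds / (1 + odds)             # probability = odds / (1 + odds)
log_odds = math.log(odds)         # the quantity the linear part z models

# The sigmoid maps the log-odds back to the same probability
p_back = 1 / (1 + math.exp(-log_odds))
print(f"p = {p}, log-odds = {log_odds:.4f}, sigmoid(log-odds) = {p_back:.4f}")
```

The round trip returns 0.75 again, which is the sense in which the sigmoid is the inverse of the log-odds transform.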

Interpreting coefficients

Python · Logistic Regression with scikit-learn
import pandas as pd
from sklearn.linear_model import LogisticRegression

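# Assumes df is an existing DataFrame with the columns
# 'attendance_percent' (feature) and 'passed' (0 / 1 target)
X = df[['attendance_percent']]      # 2D features
y = df['passed']                   # binary target

model = LogisticRegression()
model.fit(X, y)                    # learn from labelled data

# Predict for a student with 85% attendance; passing a DataFrame with the
# same column name avoids scikit-learn's feature-name warning
new_student = pd.DataFrame({'attendance_percent': [85]})
print("Class :", model.predict(new_student))           # 0 or 1
print("Prob  :", model.predict_proba(new_student))      # [P(0), P(1)]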
Output
Class : [1]
Prob  : [[0.12 0.88]]
💡 Tip (predict vs predict_proba): model.predict() returns the final class label (0 or 1). model.predict_proba() returns the underlying probabilities for each class. The class label is just the class-1 probability compared against the default 0.5 threshold.

4.4 The decision threshold

The sigmoid gives a probability like 0.75 — but the final answer must be Yes or No. We apply a decision boundary, by default 0.5:

P > 0.5 → Class 1 (Yes)   |   P < 0.5 → Class 0 (No)
🧩 When to move the threshold (Cancer Detection): Should the threshold always stay at 0.5? No. In cancer detection, missing a sick patient (false negative) is far more dangerous than a false alarm. So we lower the threshold (e.g. to 0.3) to catch more potential cases, accepting more false positives to avoid missing real cases. The threshold is a business decision, not a fixed number.
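The trade-off can be sketched with a handful of made-up probabilities. The lists `probs` and `actual` and the helper names below are hypothetical and serve only to show how the error counts shift when the threshold moves from 0.5 to 0.3.

```python
# Hypothetical predicted probabilities of "sick" and the true labels
probs  = [0.10, 0.35, 0.45, 0.60, 0.90]
actual = [0,    1,    0,    1,    1]

def classify(probabilities, threshold=0.5):
    # Class 1 when the probability exceeds the threshold, else class 0
    return [1 if p > threshold else 0 for p in probabilities]

def false_negatives(predicted, truth):
    # Sick patients (1) that were labelled healthy (0)
    return sum(1 for yhat, y in zip(predicted, truth) if y == 1 and yhat == 0)

def false_positives(predicted, truth):
    # Healthy patients (0) that were labelled sick (1)
    return sum(1 for yhat, y in zip(predicted, truth) if y == 0 and yhat == 1)

for t in (0.5, 0.3):
    pred = classify(probs, t)
    print(t, pred, "FN:", false_negatives(pred, actual), "FP:", false_positives(pred, actual))
```

On this toy data, lowering the threshold removes the one missed case at the cost of one false alarm.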

4.5 Assumptions of Logistic Regression

4.6 Limitations of Logistic Regression

🔑 Pros & Cons summary. Pros: easy to interpret (you see which features matter), outputs probabilities not just labels, efficient on large datasets. Cons: can't model complex non-linear boundaries without extra features, sensitive to outliers in the training data, requires clean data. Bonus fact: a neural network with sigmoid activations can be viewed as many logistic regressions stacked together.
Practice Questions

Select an answer, then check it. The explanations cover the exact reasoning examiners expect.

MCQ · Q1 · Basics

Logistic Regression is fundamentally used for:

  • A Regression — predicting continuous numbers
  • B Classification — predicting categories
  • C Clustering unlabelled data
  • D Reducing dimensionality
Answer: B

Despite the misleading name, Logistic Regression is a classification algorithm — it predicts the probability of belonging to a category.

MCQ · Q2 · Sigmoid

What is the range of the sigmoid function?

  • A (−∞, +∞)
  • B (0, 1)
  • C (−1, 1)
  • D [0, 100]
Answer: B

σ(z) = 1/(1+e⁻ᶻ) always produces a value strictly between 0 and 1 — perfect for representing a probability.

MCQ · Q3 · Sigmoid

When the linear input z = 0, the sigmoid outputs:

  • A 0
  • B 1
  • C 0.5
  • D Undefined
Answer: C

σ(0) = 1/(1+e⁰) = 1/(1+1) = 0.5 — the point of maximum uncertainty, equivalent to a coin toss.

MCQ · Q4 · Threshold

A spam model outputs a probability of 0.3. With the default threshold, the email is classified as:

  • A Not Spam (Class 0)
  • B Spam (Class 1)
  • C Cannot be determined
  • D Both classes simultaneously
Answer: A

0.3 < 0.5 (the default threshold), so it falls into Class 0 — Not Spam (the inbox).

MCQ · Q5 · Coefficients

A heart-disease model gives the feature "Hours of Exercise" a coefficient of −0.5. This means more exercise:

  • A Increases the probability of disease
  • B Decreases the probability of disease
  • C Has no effect
  • D Makes the prediction undefined
Answer: B

A negative coefficient means as the feature value goes up, the predicted probability goes down. More exercise → lower disease probability.
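To put a number on the effect: because the linear part models log-odds, a one-unit increase in a feature multiplies the odds by e raised to its coefficient, with other features held fixed. A brief sketch for the −0.5 coefficient from this question:

```python
import math

coef = -0.5                        # coefficient from the question
odds_multiplier = math.exp(coef)   # effect of one extra hour on the odds
print(f"Each extra hour multiplies the odds of disease by {odds_multiplier:.3f}")
```

A multiplier below 1 means the odds shrink with each extra hour, which matches answer B.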

MCQ · Q6 · Limitations

The "Yes" points form a circle surrounded by "No" points. Standard Logistic Regression will:

  • A Solve it perfectly with a curved boundary
  • B Fail, because it can only draw a straight-line boundary
  • C Convert the problem to regression automatically
  • D Refuse to train
Answer: B

Logistic Regression draws a straight (linear) decision boundary. A circular pattern is non-linear — it cannot be separated by a line unless polynomial features are added.
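The polynomial-feature remark can be illustrated without training anything. The sketch below assumes six made-up points and a hand-picked weight and bias (not learned values); it only shows that the engineered feature x1² + x2² turns the circular pattern into one a linear rule can separate.

```python
# Hypothetical points: "Yes" (1) inside a circle of radius 1, "No" (0) outside
points = [(0.0, 0.0), (0.5, 0.5), (-0.6, 0.2), (2.0, 0.0), (-1.5, 1.5), (0.0, -2.0)]
labels = [1, 1, 1, 0, 0, 0]

def squared_radius(x1, x2):
    # Engineered polynomial feature: x1^2 + x2^2
    return x1 ** 2 + x2 ** 2

# In the new feature a single cut-off separates the classes, which a
# linear model can express as z = w * r2 + b (class 1 when z > 0)
w, b = -1.0, 1.0   # hand-picked for illustration, not learned

predicted = [1 if w * squared_radius(x1, x2) + b > 0 else 0 for x1, x2 in points]
print("predicted:", predicted)
print("matches labels:", predicted == labels)
```

In the original (x1, x2) space no straight line separates these points, but in the squared-radius feature a single threshold does.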

MCQ · Q7 · Imbalanced data

A model trained on data that is 99% Class 1 starts predicting "Class 1" for everything. The likely cause is:

  • A The sigmoid function is broken
  • B Imbalanced data — the model "lazily" guesses the majority class
  • C Too many features
  • D The threshold is exactly 0.5
Answer: B

With heavily imbalanced data, predicting the majority class every time yields high accuracy without learning real patterns. This is why accuracy alone is a poor metric (see Lecture 5).
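A quick sketch of why accuracy misleads here, assuming 100 made-up labels with a 99:1 split and a "lazy" predictor that always outputs the majority class:

```python
# Hypothetical labels: 99 examples of class 1, 1 example of class 0
actual = [1] * 99 + [0]

# A "lazy" model that always predicts the majority class
predicted = [1] * len(actual)

accuracy = sum(p == y for p, y in zip(predicted, actual)) / len(actual)

# Share of true class-0 examples the model actually found
class0_total = sum(1 for y in actual if y == 0)
class0_found = sum(1 for p, y in zip(predicted, actual) if y == 0 and p == 0)
class0_recall = class0_found / class0_total

print(f"accuracy = {accuracy:.2f}, class-0 recall = {class0_recall:.2f}")
```

The predictor scores 99% accuracy while finding none of the minority-class examples.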

Numerical · Q8 · Sigmoid

A logistic model computes z = 2 for an input. Roughly, is the predicted class 0 or 1, and why? (e² ≈ 7.39)

Answer: Class 1

σ(2) = 1/(1+e⁻²) = 1/(1+0.135) = 1/1.135 ≈ 0.88. Since 0.88 > 0.5, the predicted class is 1.

Coding · Q9 · Sigmoid

Write a Python function sigmoid(z) using numpy, and use it to print the probability for z = −3, 0 and 4.

Solution
Python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for z in [-3, 0, 4]:
    print(f"z={z:>3} -> P={sigmoid(z):.4f}")
Output
z= -3 -> P=0.0474
z=  0 -> P=0.5000
z=  4 -> P=0.9820
Coding · Q10 · Train classifier

Train a Logistic Regression model on a DataFrame df with feature attendance_percent and target passed, then predict whether a student with 85% attendance passes.

Solution
Python
from sklearn.linear_model import LogisticRegression

X = df[['attendance_percent']]    # double brackets -> 2D
y = df['passed']

model = LogisticRegression()
model.fit(X, y)

new_student = [[85]]
print("Prediction:", model.predict(new_student))
print("Probability:", model.predict_proba(new_student))
Output
Prediction: [1]
Probability: [[0.09 0.91]]

Output [1] means the student is predicted to pass; the probability of passing is 0.91.

Short Answer · Q11 · Concept

Why can't we just use Linear Regression for a Yes/No classification problem?

Model answer

Linear Regression outputs an unbounded number that can go below 0 or above 1, which is impossible for a probability. It also fits a straight line, which cannot match the sharp step between two classes, producing high error near the boundary. Logistic Regression wraps the linear equation in the sigmoid, squashing the output into a valid (0, 1) probability that can then be thresholded into a class.

🎯 Lecture 4 must-remember: Logistic Regression = classification. Sigmoid σ(z) = 1/(1+e⁻ᶻ), range (0,1), σ(0)=0.5. Equation P = σ(mx+c). Default threshold 0.5 (lower it when false negatives are dangerous). Models log-odds. Limitation: only linear (straight-line) decision boundaries.