GenAI Exam Prep
⚡ LECTURE 3

Supervised Learning — Linear Regression

The first and most fundamental supervised model. Learn how a straight line can predict numbers, how it learns the "best" line, the assumptions it relies on, and where it breaks.

Syllabus topics 8–10 · ⏱ ~24 min read · 11 practice questions

3.1 What Linear Regression is

🍞 The bakery story: You run a bakery and each day you note two numbers: loaves baked and money earned. After a month you see a trend: more loaves → more revenue. But how much more, exactly? Does each loaf add ₹20 or ₹50? Linear Regression draws the best-fitting straight line through your scatter of points so you can answer three questions: what the relationship is, how much revenue changes per extra loaf, and what revenue to expect tomorrow.
Linear Regression: a supervised, predictive model that estimates the relationship between a target variable y and one or more input features x by fitting a straight line (or, with several features, a flat plane) that minimises the squared errors between predictions and actual values.

It is a regression task: the output is a continuous number (house price, salary, temperature, sales). It is usually the first supervised model taught.

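Two types

Simple linear regression uses a single input feature x; multiple linear regression uses two or more. The core idea is identical: fit a line/plane that best predicts the output.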

3.2 Model Formulation

The simple linear regression equation:

y = β0 + β1x + ε
Term | Name | Meaning
y | Target / Output | What we predict, e.g. revenue
x | Input / Predictor | The feature, e.g. loaves baked
β0 | Intercept (bias) | Baseline value of y when x = 0
β1 | Slope (weight) | Change in y for a one-unit change in x
ε | Error term | Random variation the model cannot explain

You may also see it written y = mx + c where m is the slope and c the intercept — identical idea.

💡 Tip — interpreting the slope The slope is the headline result. If a house-price model learns Price = 0.0625 × Size − 8.75, the slope 0.0625 means each extra square foot adds 0.0625 lakhs to the predicted price. A positive slope = positive correlation; negative slope = negative correlation.
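
To make the slope interpretation concrete, here is a minimal sketch that plugs two sizes one square foot apart into the learned equation. The helper name predict_price is only an illustrative choice and does not come from the lecture.

```python
def predict_price(size_sqft, slope=0.0625, intercept=-8.75):
    """Predicted price in lakhs from the line Price = 0.0625 * Size - 8.75."""
    return slope * size_sqft + intercept

base = predict_price(1400)
one_more = predict_price(1401)

print(base)             # 78.75
print(one_more - base)  # 0.0625, i.e. exactly the slope
```

Whatever starting size you pick, the difference between two predictions one unit apart is always the slope; that is what "change in y per one-unit change in x" means.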

3.3 How the model learns — Ordinary Least Squares

Many straight lines could pass through a cloud of points. Which one does the model pick? The one that makes the smallest overall mistakes.

Residuals (prediction errors)

Residual — for each data point, error = actual value − predicted value. A positive residual means the prediction was too low; negative means too high.

Why square the errors?

If you simply add raw errors, positive and negative residuals cancel out — a bad line could look "good" by accident. So Linear Regression squares every error (removing the sign), adds them up, and chooses the line with the smallest total. This is Ordinary Least Squares (OLS).

Minimise   Σ (y − ŷ)²   =   Σ (residual)²
OLS finds the line with the minimum Sum of Squared Errors.
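
The cancelling problem can be seen numerically. The sketch below uses the house-price data from this lecture's worked example and compares two candidate lines: a deliberately poor flat line that always predicts the mean price (an assumption made just for this demonstration) and the line Price = 0.0625 × Size − 8.75. The helper name raw_and_squared is illustrative.

```python
import numpy as np

size = np.array([800, 1000, 1200, 1500, 1800], dtype=float)
price = np.array([40, 60, 65, 75, 110], dtype=float)

def raw_and_squared(pred):
    """Return (sum of residuals, sum of squared residuals) for predictions."""
    resid = price - pred
    return resid.sum(), (resid ** 2).sum()

# Candidate 1: a flat line that ignores size and always predicts the mean price.
flat_raw, flat_sse = raw_and_squared(np.full_like(price, price.mean()))

# Candidate 2: the line Price = 0.0625 * Size - 8.75.
ols_raw, ols_sse = raw_and_squared(0.0625 * size - 8.75)

print(flat_raw, flat_sse)  # 0.0 2650.0
print(ols_raw, ols_sse)    # 0.0 181.25
```

Both lines have raw residuals that sum to zero, so the raw sum cannot tell them apart. The squared sum can: 2650 for the flat line against 181.25 for the fitted one.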

The closed-form OLS formulas

m = Σ(x − x̄)(y − ȳ)  /  Σ(x − x̄)²
c = ȳ − m·x̄
The numerator measures how x and y move together (it is proportional to their covariance); the denominator measures the spread of x (it is proportional to its variance). OLS forces the line through the average point (x̄, ȳ).
🧩 Worked example — house prices (from the worksheet) Data: sizes 800–1800 sq ft, prices 40–110 lakhs. Means: x̄ = 1260, ȳ = 70. Plugging into the formulas gives m = 0.0625 and c = −8.75, so the learned equation is Price = 0.0625 × Size − 8.75. Predicting a 1400 sq ft house: 0.0625 × 1400 − 8.75 = 78.75 lakhs.

Doing it in code — by hand then with scikit-learn

Python · OLS by hand
import pandas as pd

df = pd.DataFrame({"Size": [800,1000,1200,1500,1800],
                   "Price": [40,60,65,75,110]})
x, y = df["Size"], df["Price"]
x_mean, y_mean = x.mean(), y.mean()

# slope and intercept from the OLS formulas
m = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
c = y_mean - m * x_mean
print(f"Price = {m:.4f} x Size + {c:.2f}")
print("Predict 1400 sq ft:", m * 1400 + c)
Output
Price = 0.0625 x Size + -8.75
Predict 1400 sq ft: 78.75
Python · same thing with scikit-learn
from sklearn.linear_model import LinearRegression

X = df[["Size"]]       # 2D for sklearn
y = df["Price"]

model = LinearRegression()
model.fit(X, y)

print("Slope (coef_):", model.coef_[0])
print("Intercept:", model.intercept_)
# predict with a DataFrame so the feature name matches the one used in fit()
print("Predict 1400:", model.predict(pd.DataFrame({"Size": [1400]})))
Output
Slope (coef_): 0.0625
Intercept: -8.75
Predict 1400: [78.75]
💡 Tip — sklearn attributes to memorise After .fit(): model.coef_ holds the slope(s) and model.intercept_ holds β0. For multiple regression, coef_ is an array — one weight per feature, in the same order as the columns of X.

Diagnostic plots

3.4 The Seven Assumptions

Linear Regression only works well if these assumptions are approximately true. Examiners frequently ask you to name or recognise them.

# | Assumption | What it means / fix if violated
1 | Linearity | The input–output relationship must be roughly a straight line. Curved residual plot = violated. Fix: polynomial terms, log transform, or a tree model.
2 | Independence of Errors | Errors must not correlate with each other (critical for time-series). Fix: lag features or time-series models.
3 | Homoscedasticity | Error variance is constant across all x. If errors "fan out", variance is not constant. Fix: log-transform y, or weighted least squares.
4 | Zero Mean of Errors | Residuals should average to zero. Fix: always include an intercept term.
5 | No Multicollinearity | Input features must not be highly correlated with each other. Fix: drop correlated features, PCA, or ridge regression.
6 | Exogeneity | Predictors must not correlate with the error term, else coefficients are biased. Fix: add confounders or instrumental variables.
7 | Normality of Errors | For small datasets, residuals should be approximately normally distributed. Fix: transformations or bootstrap methods.
🔑 Memory hook — "L.I.N.E." The four most-tested assumptions: Linearity, Independence of errors, Normality of errors, and Equal variance (Homoscedasticity). Add: no multicollinearity, zero-mean errors, exogeneity → seven total.
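
The zero-mean assumption and its fix ("always include an intercept term") can be checked numerically. The following is a minimal sketch on the house-price data from section 3.3; it uses NumPy's np.polyfit instead of scikit-learn, and the variable names are illustrative.

```python
import numpy as np

size = np.array([800, 1000, 1200, 1500, 1800], dtype=float)
price = np.array([40, 60, 65, 75, 110], dtype=float)

# With an intercept: the usual OLS line (np.polyfit returns slope, then intercept).
slope, intercept = np.polyfit(size, price, 1)
resid_with = price - (slope * size + intercept)

# Without an intercept: the line is forced through the origin,
# and the least-squares slope is sum(x*y) / sum(x^2).
slope_origin = (size * price).sum() / (size ** 2).sum()
resid_without = price - slope_origin * size

mean_with = resid_with.mean()
mean_without = resid_without.mean()
print(f"mean residual with intercept:    {mean_with:.4f}")
print(f"mean residual without intercept: {mean_without:.4f}")
```

With the intercept the residuals average to zero up to floating-point rounding; without it the average is clearly non-zero (roughly −0.65 on this data), meaning the line is biased overall.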

3.5 Limitations of Linear Regression

⚠️ Exam trap: "Linear Regression assumes a straight-line relationship." If the data follows a curve (for example, a dose-response pattern that rises and then falls), a straight line cannot fit it. This is one reason non-linear models such as Decision Trees (Lecture 6) exist.
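
A small sketch of this failure mode, using made-up data that rises and then falls (the numbers are invented purely for illustration). It compares a straight-line fit with a fit that adds a squared term, the "polynomial terms" fix from the assumptions table.

```python
import numpy as np

# Made-up "dose vs response" data that rises and then falls: y = 4 - x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 4.0 - x ** 2

# Degree-1 fit = straight line; degree-2 fit adds a squared term.
line = np.polyfit(x, y, 1)
curve = np.polyfit(x, y, 2)

sse_line = ((y - np.polyval(line, x)) ** 2).sum()
sse_curve = ((y - np.polyval(curve, x)) ** 2).sum()

print(f"straight line: slope={line[0]:.3f}, SSE={sse_line:.3f}")
print(f"with x^2 term: SSE={sse_curve:.3f}")
```

On this data the best straight line is flat (slope of about zero) with a sum of squared errors of about 14, so it reports "no relationship" even though y is completely determined by x. Adding the squared term brings the error down to essentially zero.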
Practice Questions

Choose, check, and read every explanation — these mirror the exam style closely.

MCQ · Q1 · Formulation

In y = β₀ + β₁x + ε, what does β₁ represent?

  • A The value of y when x = 0
  • B The change in y for a one-unit increase in x
  • C The random error
  • D The total number of data points
Answer: B

β₁ is the slope — the change in y per unit change in x. β₀ (the intercept) is the value of y when x = 0; ε is the random error term.

MCQ · Q2 · OLS

Why does Ordinary Least Squares square the errors instead of just adding them?

  • A To make the maths run faster
  • B So positive and negative errors do not cancel each other out
  • C Because errors are always negative
  • D To convert the result into a percentage
Answer: B

If you add raw residuals, a +10 and a −10 cancel — a terrible line could look perfect. Squaring removes the sign, so all errors contribute positively, and also penalises large errors more heavily.

MCQ · Q3 · Assumptions

"The variance of the residuals is constant across all values of x." This assumption is called:

  • A Linearity
  • B Homoscedasticity
  • C Multicollinearity
  • D Exogeneity
Answer: B

Homoscedasticity = constant error variance. When errors "fan out" (variance grows with x) it is called heteroscedasticity, and the assumption is violated.
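
A rough way to see "fanning out" in numbers is to compare the residual spread at small and large x. The sketch below uses synthetic data whose noise is deliberately made to grow with x; the seed, sample size, and noise scale are all assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
x = np.linspace(1, 10, 200)
noise = rng.normal(0.0, 0.5 * x)      # noise spread grows with x (assumed for the demo)
y = 2 * x + 1 + noise

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# Compare residual spread in the lower and upper halves of x.
spread_low = resid[x < 5.5].std()
spread_high = resid[x >= 5.5].std()
print(f"residual std, small x: {spread_low:.2f}")
print(f"residual std, large x: {spread_high:.2f}")
```

A clearly larger spread for large x is the numerical counterpart of a residual plot that fans out, i.e. heteroscedasticity.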

MCQ · Q4 · Limitations

Linear Regression is a poor choice when:

  • A The relationship between x and y is roughly a straight line
  • B The data has a strongly non-linear (curved) pattern
  • C You want an interpretable model
  • D The dataset is large
Answer: B

Linear Regression fundamentally fits a straight line; it fails on curved/non-linear patterns. It is fine and even preferred for linear relationships, interpretability and large datasets.

MCQ · Q5 · Multicollinearity

Including both "Weight in kg" and "Weight in lbs" as features causes:

  • A Heteroscedasticity
  • B Multicollinearity — making coefficient estimates unstable
  • C Underfitting
  • D Nothing — it improves accuracy
Answer: B

The two columns are perfectly correlated: one is just a rescaled copy of the other. The model cannot decide how to split the effect between them, so coefficients become unstable and uninterpretable.
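
The kg/lbs case can be checked directly. This is a minimal sketch with made-up weights; it looks at the correlation between the two columns and at the rank of the feature matrix (with an intercept column added), which is one standard way to detect exact duplication of information.

```python
import numpy as np

kg = np.array([50.0, 60.0, 70.0, 80.0, 90.0])   # made-up weights
lbs = kg * 2.20462                               # same information, different unit

# Correlation between the two columns.
corr = np.corrcoef(kg, lbs)[0, 1]

# Feature matrix with an intercept column plus both features.
X = np.column_stack([np.ones_like(kg), kg, lbs])
rank = np.linalg.matrix_rank(X)

print(f"correlation: {corr:.4f}")
print("columns:", X.shape[1], "rank:", rank)
```

The correlation is 1 and the matrix has three columns but only rank 2, so there is no unique set of least-squares coefficients: any split of the effect between kg and lbs fits equally well.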

MCQ · Q6 · Outliers

Why is Linear Regression especially sensitive to outliers?

  • A OLS squares the errors, so a single huge error dominates the total and tilts the line
  • B It ignores all large values by design
  • C Outliers make the model run out of memory
  • D It is not sensitive to outliers at all
Answer: A

Because errors are squared, an outlier with a large residual contributes an enormous squared term, so the fitted line shifts dramatically to reduce that one error.
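
A minimal sketch of this effect, using the points from y = 2x + 1 and corrupting a single value (the size of the corruption, +30, is an arbitrary choice for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = 2 * x + 1                 # points exactly on y = 2x + 1
y_outlier = y_clean.copy()
y_outlier[-1] += 30                 # corrupt one point: (5, 11) becomes (5, 41)

slope_clean, _ = np.polyfit(x, y_clean, 1)
slope_outlier, _ = np.polyfit(x, y_outlier, 1)

print(f"slope without outlier: {slope_clean:.2f}")   # 2.00
print(f"slope with outlier:    {slope_outlier:.2f}")  # 8.00
```

One bad point out of five is enough to quadruple the slope, because the line tilts to shrink that single large squared error.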

Numerical · Q7 · Prediction

A model learned Price = 0.0625 × Size − 8.75. What is the predicted price for a 2000 sq ft house?

Answer: 116.25

Price = 0.0625 × 2000 − 8.75 = 125 − 8.75 = 116.25 lakhs. Just substitute the size into the learned equation.

Numerical · Q8 · Residuals

A model predicts a price of 66.25 for a house whose actual price is 65. What is the residual?

Answer: −1.25

Residual = actual − predicted = 65 − 66.25 = −1.25. A negative residual means the model over-predicted.

Coding · Q9 · Train a model

Using scikit-learn, train a simple linear regression on X = [[1],[2],[3],[4],[5]] and y = [3,5,7,9,11], then print the slope, intercept, and the prediction for x = 8.

Solution
Python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]
y = [3, 5, 7, 9, 11]              # the true rule is y = 2x + 1

model = LinearRegression()
model.fit(X, y)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Predict x=8:", model.predict([[8]]))
Output
Slope: 2.0
Intercept: 1.0
Predict x=8: [17.]

The model recovers the exact rule y = 2x + 1, so x = 8 → 2(8)+1 = 17.

Coding · Q10 · Multiple regression

You have a DataFrame df with feature columns ['Hours','PrevScore'] and target 'Result'. Write code to split into train/test sets, train a multiple linear regression, and print the model's coefficients.

Solution
Python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = df[['Hours', 'PrevScore']]   # multiple features
y = df['Result']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)     # one weight per feature
print("Intercept:", model.intercept_)
Output (example values; the actual numbers depend on df)
Coefficients: [2.865 1.021]
Intercept: -34.30

Each value in coef_ is the weight for one feature, in column order: Hours → 2.865, PrevScore → 1.021.

Short Answer · Q11 · Concept

Name any three assumptions of Linear Regression and briefly explain why each matters.

Model answer

  • Linearity: the model fits a straight line, so the true relationship must be roughly linear or predictions are systematically wrong.
  • Homoscedasticity: error variance must be constant; if it grows with x, confidence in predictions becomes unreliable.
  • No multicollinearity: input features must not be near-duplicates, otherwise the coefficients become unstable and uninterpretable.

(Other valid answers: independence of errors, normality of errors, zero-mean errors, exogeneity.)

🎯 Lecture 3 — must-remember Equation: y = β₀ + β₁x + ε. OLS minimises Σ(y−ŷ)² (squares errors so they don't cancel). Slope formula: Σ(x−x̄)(y−ȳ)/Σ(x−x̄)². Seven assumptions (L.I.N.E. + no multicollinearity, zero-mean errors, exogeneity). Limitations: linear-only, outlier-sensitive, multicollinearity, omitted-variable bias.