⚡ LECTURE 1

Introduction to Machine Learning

The foundation of everything else in this course — understand what Machine Learning really is, how it differs from normal programming, and the vocabulary every later lecture builds on.

Syllabus topics 1–4 · ⏱ ~22 min read · 10 practice questions

In this lecture

ML vs Traditional Programming
AI vs ML vs Deep Learning
Types of Learning
Types of ML Tasks
Practice Questions

Machine Learning (ML) is the engine underneath every Generative AI system you will study in this course — ChatGPT, image generators, AI agents. Before we can understand generative AI, we must be rock-solid on what "learning from data" actually means. This lecture builds that base.

1.1 ML vs Traditional Programming

The single most important idea in this entire course is the shift from writing rules to learning rules from data.

Machine Learning — a branch of AI where computers learn patterns from data instead of being explicitly programmed with hand-written rules.

The two paradigms side by side

	Traditional Programming	Machine Learning
Input	Rules + Data	Data + the correct Answers
What the human writes	Explicit `if / else` logic	Just collects examples
What the computer produces	Answers (by running the rules)	The Rules themselves (a "model")
Logic comes from	The programmer's brain	Patterns discovered in past data
New situations	Fail unless a rule exists	Generalise from learned patterns

🎰 The Casino Vault story (from the worksheet): A gang is feeding fake coins into slot machines. Old-school approach: your boss says "fake coins look dirty, so reject every dirty coin." But a shiny fake coin sails through and a genuine dirty coin gets rejected; a single visual rule is too crude. ML approach: instead of writing rules, you feed the computer a table of past coins with their look, diameter, and verified label (REAL/FAKE). The computer finds that a diameter near 24 mm signals FAKE regardless of shine. It learned a rule that nobody knew to write by hand.
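The contrast in the Casino Vault story can be sketched in a few lines of Python. The coin records below are invented for illustration (they are not the worksheet's data), and `learn_diameter_threshold` is a hypothetical helper that simply tries a handful of cut-offs and keeps the one with the fewest mistakes. Real ML libraries use more general methods, so treat this as a toy picture of "learning a rule from labelled examples".

```python
# Hypothetical coin records: (looks_dirty, diameter_mm, verified_label).
# These values are invented for illustration, not taken from the worksheet.
coins = [
    (True, 22.5, "REAL"),
    (False, 22.4, "REAL"),
    (True, 22.6, "REAL"),
    (False, 24.0, "FAKE"),
    (True, 24.1, "FAKE"),
    (False, 23.9, "FAKE"),
]


def boss_rule(looks_dirty, diameter_mm):
    """Traditional programming: a human writes the rule."""
    return "FAKE" if looks_dirty else "REAL"


def learn_diameter_threshold(records):
    """Toy learning step: pick the diameter cut-off that makes the
    fewest mistakes on the labelled examples."""
    diameters = sorted({d for _, d, _ in records})
    # Candidate cut-offs sit halfway between neighbouring diameters.
    candidates = [(a + b) / 2 for a, b in zip(diameters, diameters[1:])]

    def mistakes(cutoff):
        return sum(
            ("FAKE" if d >= cutoff else "REAL") != label
            for _, d, label in records
        )

    return min(candidates, key=mistakes)


def accuracy(rule, records):
    """Fraction of records for which the rule matches the verified label."""
    correct = sum(rule(dirty, d) == label for dirty, d, label in records)
    return correct / len(records)


cutoff = learn_diameter_threshold(coins)


def learned_rule(looks_dirty, diameter_mm):
    """The rule discovered from data: it ignores shine entirely."""
    return "FAKE" if diameter_mm >= cutoff else "REAL"


print("Boss rule accuracy:   ", accuracy(boss_rule, coins))
print("Learned rule accuracy:", accuracy(learned_rule, coins))
```

On this made-up table the hand-written "dirty means fake" rule gets most coins wrong, while the learned diameter cut-off classifies all six correctly.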

Why rules alone fail

Consider spam email detection. Could you write if-else rules for it? You would need a rule for every scam phrase, every suspicious link pattern and every fake sender. That means thousands of rules, and spammers change tactics daily, so a hand-written program cannot keep up. ML instead learns patterns in suspicious words, links and sender behaviour from labelled examples, and it can be retrained on new examples as tactics change, without anyone rewriting rules.

🔑 Exam-critical distinction: In traditional programming the human supplies the logic. In machine learning the human supplies the examples and the machine discovers the logic. This is the most-tested idea of Lecture 1.

Three kinds of algorithms

The lecture also names three algorithm categories you should recognise:

Deterministic — same input always gives exactly the same output (e.g. sorting a list).
Non-deterministic — output can vary between runs even for the same input.
Probabilistic — output is based on probabilities; ML models are largely probabilistic, predicting the most likely answer.
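A small sketch may help tell the three categories apart. The function names are hypothetical, and the "spam probability" formula is a made-up placeholder rather than a real model; the point is only the shape of each kind of output.

```python
import random


def deterministic_sort(values):
    """Deterministic: the same input always gives the same output."""
    return sorted(values)


def non_deterministic_pick(values, rng=random):
    """Non-deterministic: repeated calls may return different items."""
    return rng.choice(values)


def probabilistic_spam_guess(suspicious_words, total_words):
    """Probabilistic: assign a probability to each label, then report the
    most likely one. The scoring formula is a placeholder for illustration."""
    p_spam = suspicious_words / total_words if total_words else 0.0
    probabilities = {"spam": p_spam, "not-spam": 1.0 - p_spam}
    most_likely = max(probabilities, key=probabilities.get)
    return most_likely, probabilities


print(deterministic_sort([3, 1, 2]))            # always [1, 2, 3]
print(non_deterministic_pick(["a", "b", "c"]))  # may differ between runs
print(probabilistic_spam_guess(suspicious_words=7, total_words=10))
```

The last call returns both the chosen label and the probabilities behind it, which is how most ML classifiers report their predictions.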

💡 Tip: the "write the number 4" test. Try writing exact rules that describe every way a human could draw the digit "4". It is practically impossible, because there are infinitely many valid shapes. This is the classic argument for ML: when the pattern is too rich to hard-code, learn it from examples instead.

1.2 AI vs ML vs Deep Learning

These three terms are not separate technologies — they are nested inside one another, like Russian nesting dolls.

Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning: each is a subset of the one before it. All Deep Learning is ML and all ML is AI, but not the reverse.

Layer	Definition	Examples of algorithms
Artificial Intelligence "The Dream"	Any technique that makes a computer mimic human intelligence — including 1980s hard-coded `if-else` systems.	A* search, Adversarial Search, Hidden Markov Models
Machine Learning "The Method"	Algorithms that parse data, learn from it, and apply what they learned to make decisions. A subset of AI.	Linear Regression, Logistic Regression, K-Means, Decision Trees
Deep Learning "The Engine"	ML using Artificial Neural Networks with many layers ("deep" = many layers). Inspired by the brain.	Deep Neural Networks, CNNs, Transformers

The key difference: feature engineering

This is the distinction examiners love to test:

Classic ML needs you to hand-feed features. To predict house prices you must explicitly give it "Area", "Bedrooms", "Bathrooms".
Deep Learning extracts features on its own. Feed it raw pixels of a house photo and it automatically discovers edges → shapes → textures → objects, with no human telling it what to look for.
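The difference in inputs can be made concrete with a short sketch. The house record, the 3x3 "photo" and both helper functions are invented for illustration; no model is trained here, the code only shows what each approach would be handed.

```python
# Hypothetical house record, as it might arrive from a listings database.
house = {
    "address": "12 Example Street",
    "area_sqft": 1400,
    "bedrooms": 3,
    "bathrooms": 2,
    "agent_notes": "Recently repainted",
}


def extract_features(record):
    """Classic ML: a human decides which columns matter and in what order."""
    return [record["area_sqft"], record["bedrooms"], record["bathrooms"]]


# A tiny 3x3 greyscale "photo" (0 = black, 255 = white), invented here.
photo = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]


def flatten_pixels(image):
    """Deep learning input: every raw pixel, with no human-chosen features."""
    return [pixel for row in image for pixel in row]


print(extract_features(house))  # [1400, 3, 2]
print(flatten_pixels(photo))    # nine raw pixel values
```

In the first case a person chose "Area", "Bedrooms" and "Bathrooms"; in the second the network receives all the pixels and has to work out for itself which patterns matter.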

🧩 When do you NEED Deep Learning? Basic ML is enough for structured/tabular data — predicting house prices from area, rooms, zip code (just numbers in a table). Deep Learning is needed for unstructured data — recognising faces in photos (raw pixels) or writing a poem in Shakespeare's style (complex language patterns). Rule of thumb: raw, unstructured data → Deep Learning.

💡 Tip: Remember the one-liner "All Machine Learning is AI, but not all AI is Machine Learning." A 1980s expert system full of if-else rules is AI but is not ML, because it never learned from data.

1.3 Types of Learning

ML is commonly divided into four learning paradigms, according to the kind of data or feedback the model learns from.

1. Supervised Learning — "learning with a teacher"

Supervised Learning — the model is trained on labelled data, where both the inputs (features) and the correct outputs (labels) are provided. It learns the mapping input → output.

It is like a student given a textbook with the answers in the back. This is the most common approach in practice. Both Classification and Regression are types of supervised learning.

Example: Email spam detection — thousands of emails each labelled spam / not-spam.
Real case: Credit-card fraud detection — labelled transactions (normal vs fraudulent) train a model to flag suspicious patterns in real time.
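Supervised learning on the spam example can be sketched without any library. The four labelled emails are invented, and the word-counting scheme is a deliberately crude stand-in for a real classifier; it is here only to show labelled inputs going in and a predictor coming out.

```python
from collections import Counter

# Hypothetical labelled data: (email text, correct label).
labelled_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch on monday with the team", "not-spam"),
]


def train(examples):
    """Learn how often each word appears under each label."""
    word_counts = {"spam": Counter(), "not-spam": Counter()}
    for text, label in examples:
        word_counts[label].update(text.split())
    return word_counts


def predict(word_counts, text):
    """Score each label by how familiar the email's words are to it.
    Ties (including all-zero scores) fall back to the first label."""
    words = text.split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(labelled_emails)
print(predict(model, "claim your free prize"))        # spam
print(predict(model, "agenda for the team meeting"))  # not-spam
```

The labels in `labelled_emails` play the role of the teacher: without them, `train` would have nothing to count against.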

2. Unsupervised Learning — "finding hidden patterns"

Unsupervised Learning — the model is given unlabelled data (only inputs, no correct answers) and must discover hidden structure on its own.

Like exploring a new city with no map or guide. The main task here is Clustering.

Example: Customer segmentation — group shoppers into "budget", "premium", "occasional" buyers without pre-defined categories.
Real case: Netflix clusters users with similar viewing tastes to power recommendations.
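Customer segmentation can be illustrated with a toy clustering routine. This is a minimal sketch in the style of k-means on single numbers, assuming invented monthly-spend figures and a number of groups chosen by us; `cluster_1d` is a hypothetical helper, not a library function.

```python
def cluster_1d(values, k, iterations=10):
    """Toy k-means on single numbers (requires k >= 2)."""
    lowest, highest = min(values), max(values)
    # Start with centres spread evenly between the smallest and largest value.
    centres = [lowest + i * (highest - lowest) / (k - 1) for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Step 1: assign each value to its nearest centre.
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            groups[nearest].append(value)
        # Step 2: move each centre to the mean of its group.
        centres = [
            sum(group) / len(group) if group else centre
            for group, centre in zip(groups, centres)
        ]
    return centres, groups


# Hypothetical monthly spend per customer. Note: no labels anywhere.
monthly_spend = [20, 25, 30, 200, 220, 210, 90, 100, 110]
centres, groups = cluster_1d(monthly_spend, k=3)
print(centres)  # [25.0, 100.0, 210.0]
print(groups)   # three groups of similar spenders
```

The algorithm only finds the groups. Names such as "budget" or "premium" are attached by a human afterwards, once the groups have been inspected.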

3. Semi-Supervised Learning — "best of both worlds"

Semi-Supervised Learning — uses a small amount of labelled data combined with a large amount of unlabelled data.

Labelling is expensive: imagine 10,000 chest X-rays where a doctor charges $100 to label each one, which comes to $1,000,000 for the full set. Solution: the computer first clusters similar X-rays, the doctor labels just a few per cluster, and those labels are propagated to the rest. This can cut labelling cost sharply while often keeping good accuracy.

Example: Google Photos — you name a face once, and it finds that person in thousands of other photos automatically.
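The cluster-then-label idea described for the X-rays can be sketched as follows. The scan names, cluster numbers and labels are all invented, and the clustering step is assumed to have happened already; `propagate_labels` is a hypothetical helper showing only the propagation.

```python
from collections import Counter

# Assumed output of an earlier clustering step: scan id -> cluster number.
cluster_of = {
    "xray_01": 0, "xray_02": 0, "xray_03": 0,
    "xray_04": 1, "xray_05": 1, "xray_06": 1,
}

# The doctor labels only one scan per cluster (the expensive part).
doctor_labels = {"xray_01": "normal", "xray_04": "abnormal"}


def propagate_labels(cluster_of, known_labels):
    """Give every scan the most common doctor label found in its cluster."""
    votes = {}
    for item, label in known_labels.items():
        votes.setdefault(cluster_of[item], Counter())[label] += 1
    cluster_label = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in votes.items()
    }
    # Scans in a cluster with no doctor label stay unlabelled (None).
    return {item: cluster_label.get(c) for item, c in cluster_of.items()}


all_labels = propagate_labels(cluster_of, doctor_labels)
print(all_labels)
```

Here two paid labels end up covering six scans. The propagated labels are only as good as the clusters, which is why accuracy is not guaranteed.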

4. Reinforcement Learning — "learning by trial and error"

Reinforcement Learning (RL) — an agent learns optimal behaviour by interacting with an environment, receiving rewards for good actions and penalties for bad ones.

No dataset is given up front. Imagine a robot dog placed in a room: walking forward earns a +10 reward, hitting a wall earns a −10 penalty. After enough trials it learns to avoid walls. The agent generates its own data through repeated trial and error.

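Real case: DeepMind's AlphaGo was first trained on human expert games and then improved by playing a very large number of games against itself, guided by win/lose rewards; in 2016 it beat Lee Sedol, one of the world's top Go players. Its successor, AlphaGo Zero, learned from self-play alone.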
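The robot-dog example can be turned into a very small trial-and-error loop. This sketch makes simplifying assumptions: there are only two actions, no notion of position or state, and the exploration rate, learning rate and seed are arbitrary choices. Full reinforcement learning also tracks the state of the environment.

```python
import random

# Rewards from the robot-dog example: +10 for walking forward, -10 for a wall.
REWARDS = {"walk_forward": 10, "walk_into_wall": -10}


def train_robot_dog(trials=200, explore_rate=0.2, learning_rate=0.5, seed=0):
    """Learn an estimated value for each action purely from rewards."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in REWARDS}
    for _ in range(trials):
        if rng.random() < explore_rate:
            # Explore: try a random action to gather new experience.
            action = rng.choice(list(REWARDS))
        else:
            # Exploit: take the action that currently looks best.
            action = max(value, key=value.get)
        reward = REWARDS[action]
        # Nudge the estimate towards the reward just received.
        value[action] += learning_rate * (reward - value[action])
    return value


learned_values = train_robot_dog()
best_action = max(learned_values, key=learned_values.get)
print(learned_values)
print("Preferred action:", best_action)
```

No labelled dataset appears anywhere: the only training signal is the reward that follows each action the agent tries.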

🔑 Quick recall table (memorise this): Supervised = labelled data (teacher). Unsupervised = no labels (explorer). Semi-supervised = few labels + many unlabelled. Reinforcement = rewards & penalties (trial & error).

1.4 Types of ML Tasks

While "types of learning" is about the data, "types of tasks" is about what kind of answer the model produces.

Classification — predicting categories

Output is one of a fixed, finite set of discrete classes/labels.

Spam or Not-Spam · Disease or No-Disease · Cat / Dog / Bird
Real case: Medical diagnosis systems predicting a COVID test result from symptoms.

Regression — predicting continuous values

Output is a number on a continuous scale — infinitely many possible values within a range.

House price · Tomorrow's temperature · Car mileage
Real case: Uber surge pricing — predicting a demand-based fare multiplier in real time.

Clustering — grouping similar data

Automatically discovering natural groupings in data with no predefined categories. This is an unsupervised task.

Grouping customers by behaviour · Organising documents by topic.

	Classification	Regression
Output type	Discrete category / class	Continuous numerical value
Possible outcomes	Finite (Yes/No, A/B/C)	Infinite within a range
Decision aid	Decision boundaries separate classes	A best-fit line/curve
Examples	Cat/Dog, Pass/Fail, Spam	£250,000 · 23.5°C · 45.2 mph

🧩 Insight from the worksheet: regression first, then thresholding. In the student-marks worksheet, Task 2 predicted an exact score (regression) and Task 1 predicted Pass/Fail (classification). In real systems a continuous prediction often comes first, and the class is obtained by applying a threshold: predicted marks = 68 → marks ≥ 50 → PASS. Logistic Regression (Lecture 4) follows the same idea: it outputs a continuous probability between 0 and 1 and applies a threshold (commonly 0.5) to pick the class.
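Thresholding a continuous prediction takes only a couple of lines. The pass mark of 50 comes from the worksheet example; the function names, the `sigmoid` helper and the 0.5 cut-off are assumptions added here to illustrate the same step for a probability.

```python
import math


def marks_to_result(predicted_marks, pass_mark=50):
    """Turn a regression output (marks) into a class (PASS / FAIL)."""
    return "PASS" if predicted_marks >= pass_mark else "FAIL"


def sigmoid(score):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def probability_to_class(probability, threshold=0.5):
    """Turn a continuous probability into a class label (1 or 0)."""
    return 1 if probability >= threshold else 0


print(marks_to_result(68))                  # PASS
print(marks_to_result(42))                  # FAIL
print(probability_to_class(sigmoid(2.0)))   # 1, since sigmoid(2.0) is about 0.88
print(probability_to_class(sigmoid(-2.0)))  # 0, since sigmoid(-2.0) is about 0.12
```

In both cases the model itself produces a number; the class only appears once a threshold is applied.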

⚠️ Common exam trap: "Why can't classification predict stock prices?" Because stock prices are continuous values (e.g. $142.37) with effectively unlimited possible outcomes, while classification only predicts one of a finite set of labels. To output an exact number you need regression.

A first taste of code

You will not be asked to build a full model in Lecture 1, but recognising the shape of an ML program helps. This is the universal scikit-learn pattern used in almost every later lecture:

Python · the universal ML pattern

# 1. Import a model
from sklearn.linear_model import LinearRegression

# 2. Prepare data:  X = inputs (features),  y = correct answers (labels)
X = [[800], [1000], [1200], [1500], [1800]]   # house sizes (sq ft)
y = [40, 60, 65, 75, 110]                     # prices (lakhs)

# 3. Create the model and LET IT LEARN the rule from data
model = LinearRegression()
model.fit(X, y)              # <-- this is "learning"

# 4. Use the learned rule to predict a NEW, unseen case
print(model.predict([[1400]]))

Output: [78.75]

Notice: we never wrote a formula linking size to price. We gave examples; .fit() discovered the rule; .predict() applied it. That is machine learning in four steps.
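To see what kind of rule was discovered, the same straight line can be recovered by hand with the textbook least-squares formulas for a single feature. This is a sketch of the idea only: `fit_line` is a hypothetical helper, and scikit-learn's own implementation is more general than this.

```python
# Same data as the scikit-learn example above.
sizes = [800, 1000, 1200, 1500, 1800]   # house sizes (sq ft)
prices = [40, 60, 65, 75, 110]          # prices (lakhs)


def fit_line(xs, ys):
    """Ordinary least squares for one feature: price = slope * size + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
print(slope, intercept)          # 0.0625 -8.75
print(slope * 1400 + intercept)  # 78.75
```

The learned rule is therefore price ≈ 0.0625 × size − 8.75, which gives 78.75 for a 1,400 sq ft house and matches the prediction printed above.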

Practice Questions

These questions mirror the MCQ + coding style of the real exam.

MCQ · Q1 · ML vs Programming

In Machine Learning, what does the computer produce as output during training?

A The input data
B The rules / model (the learned logic)
C Hard-coded if-else statements written by the programmer
D A compiled executable file

Answer: B

In traditional programming the human writes the rules and the computer produces answers. In ML it is flipped: the human supplies data + answers, and the computer produces the rules (the model). That model is then used to predict new cases.

MCQ · Q2 · AI/ML/DL

Which statement is correct?

A All AI is Machine Learning
B All Deep Learning is Machine Learning, and all Machine Learning is AI
C Deep Learning and Machine Learning are completely separate fields
D AI is a subset of Deep Learning

Answer: B

They are nested: DL ⊂ ML ⊂ AI. A 1980s rule-based expert system is AI but not ML, so "all AI is ML" (option A) is false.

MCQ · Q3 · Deep Learning

The biggest practical advantage of Deep Learning over classic ML is that Deep Learning:

A Always runs faster on small datasets
B Automatically extracts features from raw, unstructured data
C Never needs any training data
D Cannot make mistakes

Answer: B

Classic ML needs hand-engineered features. Deep Learning learns features itself from raw input (pixels, text), which is why it dominates images and language. It typically needs more data, not less.

MCQ · Q4 · Types of Learning

Which type of learning uses rewards and penalties?

A Supervised Learning
B Unsupervised Learning
C Reinforcement Learning
D Semi-Supervised Learning

Answer: C

Reinforcement Learning's agent learns optimal behaviour through trial and error, receiving rewards for good actions and penalties for bad ones (e.g. AlphaGo).

MCQ · Q5 · ML Tasks

A company wants to group similar customers but has no labels. Which ML task fits?

A Classification
B Regression
C Clustering
D Reinforcement Learning

Answer: C

With no predefined categories or labels, clustering (an unsupervised task) discovers natural customer segments from behavioural patterns.

MCQ · Q6 · Classification vs Regression

Predicting the exact temperature for tomorrow (e.g. 23.5°C) is an example of:

A Classification
B Regression
C Clustering
D Reinforcement Learning

Answer: B

Temperature is a continuous numerical value — that is regression. If instead you predicted "Hot / Mild / Cold" categories, it would become classification.

MCQ · Q7 · Semi-Supervised

Why is semi-supervised learning useful?

A It needs no data at all
B It uses a few expensive labels plus lots of cheap unlabelled data, cutting cost
C It only works for image data
D It guarantees 100% accuracy

Answer: B

Labelling is expensive (e.g. a doctor labelling X-rays). Semi-supervised learning labels a small set, then leverages a large unlabelled set, reducing human cost while keeping good accuracy.

Short Answer · Q8 · Concept

Explain in 2–3 sentences why a hand-written if-else spam filter eventually fails, and how ML solves the problem.

Model answer

A rule-based filter needs a separate rule for every scam phrase, link pattern and sender — impossible to maintain, and spammers constantly change tactics so the fixed rules go stale. ML instead learns patterns in suspicious words, links and sender behaviour from labelled examples, and can be retrained on fresh data, so it adapts automatically without anyone rewriting rules.

Coding · Q9 · ML pattern

Using scikit-learn, train a model that learns the rule y = 2x from the data X = [[1],[2],[3],[4]], y = [2,4,6,8], then predict the value for x = 10. Identify whether this is classification or regression.

Solution

The output (a continuous number) makes this a regression task.

Python

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # inputs (features)
y = [2, 4, 6, 8]           # labels  (this is SUPERVISED data)

model = LinearRegression()
model.fit(X, y)            # learns the rule  y = 2x

print("Prediction for x=10:", model.predict([[10]]))

Output: Prediction for x=10: [20.]

Because both inputs and correct answers are provided, this is supervised learning; because the output is a continuous number, the task is regression.

Short Answer · Q10 · Synthesis

Classify each scenario by type of learning AND type of task: (a) predicting a house's price from its area, (b) splitting 10,000 news articles into topic groups with no labels.

Model answer

(a) Price from area uses labelled examples (area → known price) → Supervised learning; the output is a continuous number → Regression.

(b) News articles with no labels, grouped by similarity → Unsupervised learning; the task of forming natural groups is Clustering.

🎯 Lecture 1, must-remember for the exam: (1) Traditional programming = rules in, answers out; ML = data+answers in, rules out. (2) DL ⊂ ML ⊂ AI. (3) Four learning types: Supervised / Unsupervised / Semi-Supervised / Reinforcement. (4) Three tasks: Classification (discrete), Regression (continuous), Clustering (unsupervised grouping).
