Introduction to Machine Learning
The foundation of everything else in this course — understand what Machine Learning really is, how it differs from normal programming, and the vocabulary every later lecture builds on.
In this lecture
Machine Learning (ML) is the engine underneath every Generative AI system you will study in this course — ChatGPT, image generators, AI agents. Before we can understand generative AI, we must be rock-solid on what "learning from data" actually means. This lecture builds that base.
1.1 ML vs Traditional Programming
The single most important idea in this entire course is the shift from writing rules to learning rules from data.
The two paradigms side by side
| | Traditional Programming | Machine Learning |
|---|---|---|
| Input | Rules + Data | Data + the correct Answers |
| What the human writes | Explicit if / else logic | Just collects examples |
| What the computer produces | Answers (by running the rules) | The Rules themselves (a "model") |
| Logic comes from | The programmer's brain | Patterns discovered in past data |
| New situations | Fail unless a rule exists | Generalise from learned patterns |
Why rules alone fail
Consider spam email detection. Could you write if-else rules for it? You would need a rule for every scam phrase, every suspicious link pattern, and every fake sender: thousands of rules, and spammers change tactics daily. A hand-written rule set cannot keep up. ML instead learns patterns in suspicious words, links and sender behaviour from labelled examples, and adapts when it is retrained on new data.
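The contrast can be sketched in a few lines. This is a minimal illustration, not a production filter: the phrase list, the six example emails and the choice of a bag-of-words Naive Bayes model are all assumptions made for the demo.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Traditional programming: a human writes the rules by hand.
BLOCKED_PHRASES = ("free money", "winner")  # hypothetical rule list

def rule_based_is_spam(text):
    """Return True if any hand-written phrase appears in the message."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)

# Machine learning: a human supplies labelled examples (1 = spam, 0 = not spam).
emails = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report due at noon",
]
labels = [1, 1, 1, 0, 0, 0]

# Word counts feed a Naive Bayes classifier (an assumed, simple model choice).
learned_filter = make_pipeline(CountVectorizer(), MultinomialNB())
learned_filter.fit(emails, labels)

# A new phrasing that no hand-written rule covers.
new_email = "claim your prize now"
print("Rule-based says spam:", rule_based_is_spam(new_email))
print("Learned model says spam:", bool(learned_filter.predict([new_email])[0]))
```

The rule list misses the new phrasing because nobody wrote a rule for it, while the learned model flags it because its words appeared mostly in the spam examples.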
Three kinds of algorithms
The lecture also names three algorithm categories you should recognise:
- Deterministic — same input always gives exactly the same output (e.g. sorting a list).
- Non-deterministic — output can vary between runs even for the same input.
- Probabilistic — output is based on probabilities; ML models are largely probabilistic, predicting the most likely answer.
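A small sketch of the three categories, assuming a toy "hours studied vs passed" dataset invented for this example and logistic regression as the probabilistic model:

```python
import random
from sklearn.linear_model import LogisticRegression

# Deterministic: sorting the same list always gives the same result.
scores = [3, 1, 2]
first_sort = sorted(scores)
second_sort = sorted(scores)
print(first_sort == second_sort)  # True on every run

# Non-deterministic: an unseeded random draw can differ from run to run.
dice_rolls = [random.randint(1, 6) for _ in range(5)]
print(dice_rolls)

# Probabilistic: a trained model assigns a probability to each class
# and reports the most likely one.
hours_studied = [[1], [2], [3], [7], [8], [9]]  # hypothetical data
passed = [0, 0, 0, 1, 1, 1]
clf = LogisticRegression().fit(hours_studied, passed)
class_probabilities = clf.predict_proba([[6]])[0]  # one probability per class
most_likely = clf.predict([[6]])[0]                # class with the highest probability
print(class_probabilities, most_likely)
```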
1.2 AI vs ML vs Deep Learning
These three terms are not separate technologies — they are nested inside one another, like Russian nesting dolls.
| Layer | Definition | Examples of algorithms |
|---|---|---|
| Artificial Intelligence "The Dream" | Any technique that makes a computer mimic human intelligence — including 1980s hard-coded if-else systems. | A* search, Adversarial Search, Hidden Markov Models |
| Machine Learning "The Method" | Algorithms that parse data, learn from it, and apply what they learned to make decisions. A subset of AI. | Linear Regression, Logistic Regression, K-Means, Decision Trees |
| Deep Learning "The Engine" | ML using Artificial Neural Networks with many layers ("deep" = many layers). Inspired by the brain. | Deep Neural Networks, CNNs, Transformers |
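One way to picture the nesting is with Python sets. The membership below follows the table above, plus one hypothetical "if-else expert system" entry standing in for hard-coded rule systems; it is an illustration of the subset relationship, not a complete taxonomy.

```python
# Each layer contains everything in the layer inside it.
deep_learning = {"Deep Neural Networks", "CNNs", "Transformers"}
machine_learning = deep_learning | {
    "Linear Regression", "Logistic Regression", "K-Means", "Decision Trees",
}
artificial_intelligence = machine_learning | {
    "A* search", "if-else expert system",  # AI techniques that do not learn from data
}

# "<" on sets means "is a proper subset of".
print(deep_learning < machine_learning < artificial_intelligence)

# Techniques that are AI but not ML:
print(artificial_intelligence - machine_learning)
```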
The key difference: feature engineering
This is the distinction examiners love to test:
- Classic ML needs you to hand-feed features. To predict house prices you must explicitly give it "Area", "Bedrooms", "Bathrooms".
- Deep Learning extracts features on its own. Feed it raw pixels of a house photo and it automatically discovers edges → shapes → textures → objects, with no human telling it what to look for.
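The difference shows up in the shape of the input. In the sketch below, the feature values and the 64×64 image size are made up for illustration; only the classic-ML half is actually trained, since the point is what each approach receives as input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Classic ML: a human decides which columns matter.
# Each row is [area_sqft, bedrooms, bathrooms] (hypothetical houses).
X_features = np.array([
    [800, 2, 1],
    [1000, 2, 2],
    [1200, 3, 2],
    [1500, 3, 2],
    [1800, 4, 3],
])
y_price = np.array([40, 60, 65, 75, 110])  # hypothetical prices

model = LinearRegression().fit(X_features, y_price)
print("Hand-picked features per house:", X_features.shape[1])

# Deep Learning: the input is the raw photo itself, e.g. a 64x64 RGB image.
# Nobody tells the network which of these numbers are "edges" or "windows".
raw_photo = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder pixels
print("Raw pixel values per photo:", raw_photo.size)
```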
A system of hand-written if-else rules is AI but not ML, because it never learned from data.
1.3 Types of Learning
ML is divided into four learning paradigms by what kind of data the model is trained on.
1. Supervised Learning — "learning with a teacher"
Like a student given a textbook with the answers in the back. Most common approach in practice. Both Classification and Regression are types of supervised learning.
- Example: Email spam detection — thousands of emails each labelled spam / not-spam.
- Real case: Credit-card fraud detection — labelled transactions (normal vs fraudulent) train a model to flag suspicious patterns in real time.
2. Unsupervised Learning — "finding hidden patterns"
Like exploring a new city with no map or guide. The main task here is Clustering.
- Example: Customer segmentation — group shoppers into "budget", "premium", "occasional" buyers without pre-defined categories.
- Real case: Netflix clusters users with similar viewing tastes to power recommendations.
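A minimal clustering sketch for the customer-segmentation example, assuming nine invented shoppers described by two made-up features and K-Means as the algorithm. Note that no labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical shoppers: [average basket value, visits per month]
shoppers = np.array([
    [10, 8], [12, 9], [11, 7],      # small baskets, frequent visits
    [95, 6], [100, 7], [105, 5],    # large baskets
    [50, 1], [55, 1], [45, 2],      # rare visits
])

# Ask for three groups; only the inputs are given, no correct answers.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segment_ids = kmeans.fit_predict(shoppers)
print(segment_ids)
```

The cluster numbers are arbitrary IDs. Names such as "budget" or "premium" are attached afterwards by a human who inspects each group.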
3. Semi-Supervised Learning — "best of both worlds"
Labelling is expensive — imagine 10,000 chest X-rays where a doctor charges $100 to label each one. Solution: the computer first clusters similar X-rays, the doctor labels just a few per cluster, and those labels are propagated to the rest. This dramatically cuts labelling cost while keeping good accuracy.
- Example: Google Photos — you name a face once, and it finds that person in thousands of other photos automatically.
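The cluster, label a few, propagate recipe described above can be sketched as follows. Everything here is an assumption for illustration: synthetic 2-D points stand in for X-ray features, K-Means does the clustering, and the "doctor" is simulated by looking up a hidden ground-truth array for one sample per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for X-ray features: two well-separated groups.
healthy = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
disease = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([healthy, disease])
true_labels = np.array([0] * 50 + [1] * 50)  # hidden; used only to simulate the doctor

# Step 1: cluster everything without labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 2: the "doctor" labels only the sample closest to each cluster centre.
distances = kmeans.transform(X)                 # shape (n_samples, n_clusters)
representative_idx = distances.argmin(axis=0)   # one sample index per cluster
doctor_labels = true_labels[representative_idx]

# Step 3: propagate each representative's label to its whole cluster.
propagated = doctor_labels[kmeans.labels_]

print("Labels paid for:", len(representative_idx), "of", len(X))
print("Agreement with hidden truth:", (propagated == true_labels).mean())
```

Real data is rarely this cleanly separated, so in practice more than one sample per cluster would usually be labelled.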
4. Reinforcement Learning — "learning by trial and error"
No dataset is given up front. A robot dog put in a room: walks forward → +10 reward; hits a wall → −10 penalty. After enough trials it learns to avoid walls. It generates its own data through rapid trial and error.
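- Real case: DeepMind's AlphaGo was first trained on human expert games and then improved by playing millions of games against itself, guided by win/lose rewards; it beat world champion Lee Sedol in 2016. Its successor, AlphaGo Zero, learned from self-play alone.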
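The robot-dog story can be sketched with tabular Q-learning. The lecture does not name an algorithm, so Q-learning is an assumed choice here, and the two-state environment, the learning rate and the other constants are hypothetical simplifications.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

STATES = ("clear", "wall_ahead")
ACTIONS = ("forward", "turn")

def step(state, action):
    """Hypothetical environment: return (reward, next_state)."""
    if action == "forward":
        reward = 10 if state == "clear" else -10  # walking vs hitting the wall
    else:
        reward = 0                                # turning is neutral
    next_state = random.choice(STATES)            # the room layout is random
    return reward, next_state

# Q-table: the agent's current estimate of how good each action is in each state.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.05, 0.9, 0.2  # learning rate, discount, exploration rate

state = "clear"
for _ in range(20000):
    # Explore sometimes, otherwise pick the best-known action.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    reward, next_state = step(state, action)
    # Q-learning update: nudge the estimate towards reward + discounted future value.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    state = next_state

policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

No dataset was supplied: every (state, action, reward) example was generated by the agent's own trial and error.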
1.4 Types of ML Tasks
While "types of learning" is about the data, "types of tasks" is about what kind of answer the model produces.
Classification — predicting categories
Output is one of a fixed, finite set of discrete classes/labels.
- Spam or Not-Spam · Disease or No-Disease · Cat / Dog / Bird
- Real case: Medical diagnosis systems predicting a COVID test result from symptoms.
Regression — predicting continuous values
Output is a number on a continuous scale — infinitely many possible values within a range.
- House price · Tomorrow's temperature · Car mileage
- Real case: Uber surge pricing — predicting a demand-based fare multiplier in real time.
Clustering — grouping similar data
Automatically discovering natural groupings in data with no predefined categories. This is an unsupervised task.
- Grouping customers by behaviour · Organising documents by topic.
| | Classification | Regression |
|---|---|---|
| Output type | Discrete category / class | Continuous numerical value |
| Possible outcomes | Finite (Yes/No, A/B/C) | Infinite within a range |
| Decision aid | Decision boundaries separate classes | A best-fit line/curve |
| Examples | Cat/Dog, Pass/Fail, Spam | £250,000 · 23.5°C · 45.2 mph |
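A regression output can be turned into a class by applying a threshold: predicted marks = 68, and if marks ≥ 50 the result is PASS. Logistic Regression (Lecture 4) uses a similar idea: it predicts a probability and applies a threshold to choose the class.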
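The thresholding idea can be sketched as a regression model followed by a single comparison. The study-hours data and the helper name `pass_or_fail` are hypothetical.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied -> exam marks
hours = [[2], [4], [6], [8], [10]]
marks = [30, 45, 55, 70, 85]

model = LinearRegression().fit(hours, marks)
PASS_MARK = 50

def pass_or_fail(hours_studied):
    """Predict marks (regression), then threshold them into a class."""
    predicted_marks = model.predict([[hours_studied]])[0]
    verdict = "PASS" if predicted_marks >= PASS_MARK else "FAIL"
    return predicted_marks, verdict

print(pass_or_fail(8))
print(pass_or_fail(3))
```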
A first taste of code
You will not be asked to build a full model in Lecture 1, but recognising the shape of an ML program helps. This is the universal scikit-learn pattern used in almost every later lecture:
# 1. Import a model
from sklearn.linear_model import LinearRegression

# 2. Prepare data: X = inputs (features), y = correct answers (labels)
X = [[800], [1000], [1200], [1500], [1800]]  # house sizes (sq ft)
y = [40, 60, 65, 75, 110]                    # prices (lakhs)

# 3. Create the model and LET IT LEARN the rule from data
model = LinearRegression()
model.fit(X, y)  # <-- this is "learning"

# 4. Use the learned rule to predict a NEW, unseen case
print(model.predict([[1400]]))
Notice: we never wrote a formula linking size to price. We gave examples; .fit() discovered the rule; .predict() applied it. That is machine learning in four steps.
The questions below mirror the MCQ + coding style of the real exam.
In Machine Learning, what does the computer produce as output during training?
In traditional programming the human writes the rules and the computer produces answers. In ML it is flipped: the human supplies data + answers, and the computer produces the rules (the model). That model is then used to predict new cases.
Which statement is correct?
They are nested: DL ⊂ ML ⊂ AI. A 1980s rule-based expert system is AI but not ML, so "all AI is ML" (option A) is false.
The biggest practical advantage of Deep Learning over classic ML is that Deep Learning:
Classic ML needs hand-engineered features. Deep Learning learns features itself from raw input (pixels, text), which is why it dominates images and language. It typically needs more data, not less.
Which type of learning uses rewards and penalties?
Reinforcement Learning's agent learns optimal behaviour through trial and error, receiving rewards for good actions and penalties for bad ones (e.g. AlphaGo).
A company wants to group similar customers but has no labels. Which ML task fits?
With no predefined categories or labels, clustering (an unsupervised task) discovers natural customer segments from behavioural patterns.
Predicting the exact temperature for tomorrow (e.g. 23.5°C) is an example of:
Temperature is a continuous numerical value — that is regression. If instead you predicted "Hot / Mild / Cold" categories, it would become classification.
Why is semi-supervised learning useful?
Labelling is expensive (e.g. a doctor labelling X-rays). Semi-supervised learning labels a small set, then leverages a large unlabelled set, reducing human cost while keeping good accuracy.
Explain in 2–3 sentences why a hand-written if-else spam filter eventually fails, and how ML solves the problem.
A rule-based filter needs a separate rule for every scam phrase, link pattern and sender — impossible to maintain, and spammers constantly change tactics so the fixed rules go stale. ML instead learns patterns in suspicious words, links and sender behaviour from labelled examples, and can be retrained on fresh data, so it adapts automatically without anyone rewriting rules.
Using scikit-learn, train a model that learns the rule y = 2x from the data X = [[1],[2],[3],[4]], y = [2,4,6,8], then predict the value for x = 10. Identify whether this is classification or regression.
The output (a continuous number) makes this a regression task.
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3], [4]] # inputs (features)
y = [2, 4, 6, 8] # labels (this is SUPERVISED data)
model = LinearRegression()
model.fit(X, y) # learns the rule y = 2x
print("Prediction for x=10:", model.predict([[10]]))
Because both inputs and correct answers are provided, this is supervised learning; because the output is a continuous number, the task is regression.
Classify each scenario by type of learning AND type of task: (a) predicting a house's price from its area, (b) splitting 10,000 news articles into topic groups with no labels.
(a) Price from area uses labelled examples (area → known price) → Supervised learning; the output is a continuous number → Regression.
(b) News articles with no labels, grouped by similarity → Unsupervised learning; the task of forming natural groups is Clustering.