Introduction to Neural Networks
Build your first "artificial brain cell". Learn how a perceptron computes a decision, why a single layer is limited, and how networks learn through gradient descent and backpropagation.
7.1 The Artificial Neuron
Neural networks are the engine of Deep Learning — ML with networks of many layers. They are loosely inspired by the brain.
| Biological neuron | What it does | Artificial equivalent |
|---|---|---|
| Dendrites | Receive signals from other neurons | Inputs (x) |
| Synapses | Control how strong each signal is | Weights (w) |
| Cell body / Soma | Adds up all the signals | Summation (z) |
| Axon | Sends out the final decision | Output (y) |
7.2 The Single-Cell Perceptron
The perceptron (Rosenblatt, 1958) was one of the earliest algorithmic models of a neuron. It is a tiny decision-maker that performs just three steps.
- Step A: multiply each input by its weight.
- Step B: add up the products and the bias to get z.
- Step C: if z ≥ 0 → output 1, else → output 0.
In short: multiply inputs by weights, add the bias, then apply an activation to decide.
| Component | Role |
|---|---|
| Inputs (x) | The data features |
| Weights (w) | How important each input is — learned during training |
| Bias (b) | A threshold/offset; lets the decision boundary shift so it need not pass through the origin |
| Weighted sum (z) | z = Σ(w·x) + b — the linear part |
| Activation function | Decides if the neuron "fires"; adds non-linearity |
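Example: should I order pizza? With hunger x₁ = 8, money x₂ = 20, weights w₁ = 2 and w₂ = 1, and bias b = −15:
z = (2 × 8) + (1 × 20) + (−15) = 16 + 20 − 15 = 21. Since z ≥ 0 → output 1 → order pizza!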
From hard threshold to probability — the sigmoid activation
A plain step ("if z ≥ 0 → 1") is harsh. Often we want a probability, so we pass z through the sigmoid (from Lecture 4):
σ(z) = 1 / (1 + e^(−z))
This "dimmer switch" squashes any z into (0, 1): big positive z → ≈1, big negative z → ≈0, z = 0 → 0.5.
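The squashing behaviour can be checked numerically. The following is a minimal sketch using only the standard library; the function name `sigmoid` and the sample z values are illustrative choices, not part of the lecture.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative z values (assumed for this sketch)
for z in (-10, -2, 0, 2, 10):
    print(f"z = {z:>3} -> sigmoid(z) = {sigmoid(z):.4f}")
```

Large negative inputs land near 0, large positive inputs near 1, and z = 0 gives exactly 0.5.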
import numpy as np

def perceptron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias  # weighted sum
    return 1 if z >= 0 else 0           # step activation

x = [8, 20]  # hunger, money
w = [2, 1]   # weights
b = -15      # bias
print("Decision:", perceptron(x, w, b))  # 1 = order pizza
7.3 The Single-Layer Perceptron & its limits
A single-layer perceptron is one layer of neurons mapping inputs directly to outputs. It is fundamentally a linear classifier — it draws a single straight line (or hyperplane) to separate two classes.
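XOR is the classic failure case: its two classes cannot be separated by any single straight line. The fix is to stack neurons into hidden layers, forming a Multi-Layer Perceptron (Lecture 8), which can learn non-linear boundaries and solve XOR.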
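One way to see the limitation concretely is a brute-force search. The sketch below tries every (w₁, w₂, b) on a coarse grid and checks whether a step perceptron reproduces a given truth table. The grid range and the helper names `step_perceptron` and `find_weights` are assumptions made for illustration. Finding nothing on a grid is not a proof, but it agrees with the known result that XOR is not linearly separable.

```python
import itertools

def step_perceptron(x1, x2, w1, w2, b):
    """Single neuron with a hard threshold at z >= 0."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

def find_weights(truth_table, grid):
    """Return the first (w1, w2, b) on the grid that fits the table, else None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step_perceptron(x1, x2, w1, w2, b) == y
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, b)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Hypothetical search grid: -2.0, -1.5, ..., 2.0
grid = [i / 2 for i in range(-4, 5)]

print("AND:", find_weights(AND, grid))  # a separating line exists
print("XOR:", find_weights(XOR, grid))  # None: no line on the grid works
```

AND is solved by, for example, w₁ = 1, w₂ = 1, b = −1.5, whereas XOR would need b < 0, w₁ + b ≥ 0, w₂ + b ≥ 0 and w₁ + w₂ + b < 0 at the same time, which is contradictory.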
7.4 How a perceptron learns — Gradient Descent
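The update rule
New weight = old weight − (learning rate × gradient), or in symbols w_new = w_old − η · ∂Loss/∂w, where η is the learning rate.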
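Setup: one sample with inputs x₁ = 44.5, x₂ = 39.3 and target y = 10.4; starting weights w₁ = 0.1, w₂ = 0.2, bias b = 0; loss = (y − ŷ)².
Forward pass: prediction ŷ = 0.1·44.5 + 0.2·39.3 + 0 = 4.45 + 7.86 = 12.31.
Error: y − ŷ = 10.4 − 12.31 = −1.91 (predicted too high).
Update (learning rate 0.0001): gradient term −2(y − ŷ) = 3.82. For w₁: slope = 3.82 × 44.5 ≈ 170.0, so w₁_new = 0.1 − 0.0001 × 170.0 ≈ 0.083. The same rule gives w₂_new = 0.2 − 0.0001 × (3.82 × 39.3) ≈ 0.185 and b_new = 0 − 0.0001 × 3.82 ≈ −0.0004.
Re-check: the new prediction is ≈ 10.96, so the error shrank from −1.91 to −0.56. One small step moved the prediction closer to the target on this sample.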
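The arithmetic of this worked example can be reproduced in a few lines. This sketch assumes a squared-error loss (y − ŷ)² on the single sample and updates both weights and the bias in one step; the variable and function names are illustrative.

```python
# One gradient-descent step on a single sample, squared-error loss (y - y_hat)**2
x = [44.5, 39.3]   # inputs
w = [0.1, 0.2]     # initial weights
b = 0.0            # initial bias
y = 10.4           # target
lr = 0.0001        # learning rate

def predict(x, w, b):
    """Linear neuron: weighted sum of inputs plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

y_hat = predict(x, w, b)
grad_term = -2 * (y - y_hat)   # d(loss)/d(y_hat)

# Chain rule: d(loss)/d(w_i) = grad_term * x_i and d(loss)/d(b) = grad_term
w_new = [wi - lr * grad_term * xi for wi, xi in zip(w, x)]
b_new = b - lr * grad_term

y_hat_new = predict(x, w_new, b_new)
print(f"old prediction: {y_hat:.2f}, error: {y - y_hat:.2f}")
print(f"new weights: {[round(v, 4) for v in w_new]}, new bias: {b_new:.6f}")
print(f"new prediction: {y_hat_new:.2f}, error: {y - y_hat_new:.2f}")
```

Repeating the same step many times, over many samples, is what training amounts to.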
Variants of Gradient Descent
- Batch GD — uses the whole dataset for each update (stable but slow).
- Stochastic GD (SGD) — updates after each single sample (fast, noisy).
- Mini-batch GD — updates after a small batch (the practical default).
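The three variants differ only in how many samples feed each update, so one training loop with a `batch_size` parameter can express all of them. The sketch below is an illustration under stated assumptions: a linear model, mean-squared-error loss, synthetic noise-free data, and a hypothetical helper `train_linear`. None of these come from the lecture.

```python
import numpy as np

def train_linear(X, y, batch_size, epochs=50, lr=0.05, seed=0):
    """Fit y = X @ w + b by gradient descent on mean squared error.

    batch_size = len(X) -> batch GD, 1 -> SGD, anything between -> mini-batch.
    Returns the learned parameters and the number of weight updates performed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, updates = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]        # prediction minus target
            w -= lr * 2 * X[idx].T @ err / len(idx)
            b -= lr * 2 * err.mean()
            updates += 1
    return w, b, updates

# Synthetic data (an assumption for this sketch): y = 3*x1 - 2*x2 + 1, no noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1

for name, bs in [("batch", 200), ("mini-batch", 32), ("SGD", 1)]:
    w, b, updates = train_linear(X, y, batch_size=bs)
    print(f"{name:>10}: updates={updates:5d}  w={np.round(w, 2)}  b={b:.2f}")
```

With the same number of epochs, batch GD performs one update per epoch while SGD performs one per sample, which is why the smaller-batch variants get much closer to the true parameters (3, −2, 1) here.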
7.5 Backpropagation
The two passes of training
- Forward pass — data flows input → output; the network makes a prediction and the loss is computed.
- Backward pass (backpropagation) — the error is propagated backward; the chain rule computes the gradient ∂Loss/∂w for every weight. Gradient descent then updates the weights.
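For a single sigmoid neuron the two passes fit in a few lines. The sketch below assumes a squared-error loss and uses illustrative input values; the helper names `forward` and `backward` are hypothetical. A multi-layer network repeats the same chain-rule step once per layer, which this example does not show.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b, y):
    """Forward pass: weighted sum, activation, then squared-error loss."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = sigmoid(z)
    loss = (y - a) ** 2
    return z, a, loss

def backward(x, a, y):
    """Backward pass via the chain rule: dL/dw_i = dL/da * da/dz * dz/dw_i."""
    dL_da = -2 * (y - a)
    da_dz = a * (1 - a)                  # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    grad_w = [dL_dz * xi for xi in x]    # dz/dw_i = x_i
    grad_b = dL_dz                       # dz/db = 1
    return grad_w, grad_b

# Illustrative values (assumed for this sketch)
x, w, b, y = [2.0, 3.0], [0.5, 0.5], 0.0, 0.0
z, a, loss = forward(x, w, b, y)
grad_w, grad_b = backward(x, a, y)

# Sanity check: compare with a finite-difference estimate for w[0]
eps = 1e-6
loss_plus = forward(x, [w[0] + eps, w[1]], b, y)[2]
numeric = (loss_plus - loss) / eps
print("analytic dL/dw0:", grad_w[0])
print("numeric  dL/dw0:", numeric)
```

The two printed values should agree to several decimal places, which is a common way to check a hand-written backward pass.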
Training terminology
| Term | Meaning |
|---|---|
| Epoch | One complete pass through the entire training dataset |
| Batch size | Number of samples processed before the weights update once |
| Iteration | One weight update (one batch processed); iterations per epoch = total samples ÷ batch size |
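The relationship between the three terms is simple arithmetic. In this sketch the helper name and the sample numbers are illustrative, and rounding up is an assumption: it models the common case where a final, smaller batch is still processed when the dataset size is not a multiple of the batch size.

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

n_samples, batch_size, epochs = 5000, 100, 10   # illustrative numbers
per_epoch = iterations_per_epoch(n_samples, batch_size)
print("iterations per epoch:", per_epoch)           # 50
print("total weight updates:", per_epoch * epochs)  # 500
```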
from keras import models, layers
from keras.optimizers import SGD

# A simple single-layer network: 2 inputs -> 1 sigmoid neuron
model = models.Sequential()
model.add(layers.Input(shape=(2,)))
model.add(layers.Dense(1, activation='sigmoid'))

# loss measures error; backprop computes the gradients, SGD applies the updates
model.compile(optimizer=SGD(learning_rate=0.01),
              loss='mse', metrics=['mae'])

# epochs = full passes over the data
# X_train (n samples x 2 features) and y_train (n targets) must already be defined
model.fit(X_train, y_train, epochs=10, batch_size=32)
Perceptron arithmetic and the learning process are common exam material.
In a perceptron, what do the weights represent?
Each weight scales how strongly its input influences the decision — exactly like synapse strength in a biological neuron. Weights are what the network learns.
A single-layer perceptron cannot solve the XOR problem because:
A single-layer perceptron is a linear classifier (one straight line). XOR's classes cannot be separated by any single line, so hidden layers are required.
Why is a bias term necessary in a neuron?
Without a bias the decision boundary is locked through the origin. The bias is an offset that shifts the activation left/right, giving the model the flexibility to fit real data.
What happens during training if the learning rate is set to 0?
New weight = old weight − (learning rate × gradient). If the learning rate is 0, the step size is 0, so weights never change and no learning occurs.
Backpropagation uses which mathematical rule to compute gradients?
Backpropagation applies the chain rule layer by layer, from the output backward to the input, to find how each weight affects the final loss.
One complete pass through the entire training dataset is called:
An epoch = one full pass over all training data. A batch is a subset processed before one update; an iteration is one update.
A learning rate that is far too large will most likely cause the loss to:
Too-large steps overshoot the valley of the loss landscape, so the loss bounces around or even grows. Too-small steps make training painfully slow.
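The effect of the learning rate is easy to observe on a one-dimensional loss. The sketch below runs gradient descent on loss(w) = w², a toy function chosen for illustration; the helper name `descend` and the four learning rates are likewise assumptions.

```python
def descend(lr, w0=1.0, steps=10):
    """Gradient descent on loss(w) = w**2, whose gradient is 2*w."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * 2 * w
        losses.append(w ** 2)
    return losses

for lr in (0.0, 0.01, 0.4, 1.1):   # illustrative learning rates
    losses = descend(lr)
    print(f"lr={lr:<4} final loss={losses[-1]:.4g}")
```

With lr = 0 the loss never moves, with lr = 0.01 it falls slowly, with lr = 0.4 it falls quickly, and with lr = 1.1 every step overshoots the minimum so the loss grows.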
A perceptron has inputs x₁=3, x₂=5; weights w₁=0.4, w₂=0.2; bias b=−2. Compute z and the step output (1 if z ≥ 0).
z = (0.4×3) + (0.2×5) + (−2) = 1.2 + 1.0 − 2.0 = 0.2. Since 0.2 ≥ 0, the step activation outputs 1.
A dataset has 5000 samples and the batch size is 100. How many iterations make up one epoch?
Iterations per epoch = total samples ÷ batch size = 5000 ÷ 100 = 50. The weights are updated 50 times in one epoch.
Write a Python function that implements a perceptron with a step activation, given a list of inputs, weights, and a bias.
def perceptron(inputs, weights, bias):
    # Step A & B: weighted sum
    z = bias
    for x, w in zip(inputs, weights):
        z += x * w
    # Step C: step activation
    return 1 if z >= 0 else 0

print(perceptron([3, 5], [0.4, 0.2], -2))  # z = 0.2 -> 1
print(perceptron([1, 1], [0.4, 0.2], -2))  # z = -1.4 -> 0
Modify a neuron to output a probability using the sigmoid activation instead of a hard step. Test with inputs [2, 3], weights [0.5, 0.5], bias 0.
import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias
    return 1 / (1 + np.exp(-z))  # sigmoid activation

prob = neuron([2, 3], [0.5, 0.5], 0)
print("Probability:", round(prob, 4))
print("Class:", 1 if prob >= 0.5 else 0)
z = 0.5·2 + 0.5·3 + 0 = 2.5; σ(2.5) ≈ 0.924, so the predicted class is 1.
In one or two sentences, explain the difference between the forward pass and backpropagation.
The forward pass sends data from inputs through the network to produce a prediction and compute the loss. Backpropagation then goes backward, using the chain rule to calculate how much each weight contributed to that loss (its gradient), so gradient descent can update the weights to reduce error.