⚡ LECTURE 14

Fine-Tuning

Permanently teach a model new behaviour. Learn how fine-tuning differs from prompting and RAG, when each is the right tool, instruction-tuning, LoRA, and how to source knowledge.

Syllabus topics 54–57 · ⏱ ~23 min read · 12 practice questions

In this lecture

The Optimization Spectrum
Fine-Tuning vs Prompting
Situations for Fine-Tuning
Instruction-Tuning & LoRA
Knowledge Sourcing
Practice Questions

14.1 The Optimization Spectrum

Out of the box, LLMs are generalists. To build effective applications we guide them. There are three levels of optimization:

Approach	What it is	Changes the model?
1. Prompting "In-context learning"	Giving instructions/examples in the input window	No — fastest to implement
2. RAG "Retrieval-Augmented Generation"	Injecting relevant external knowledge into the prompt	No — best for adding knowledge
3. Fine-Tuning "Weight adaptation"	Training the model on a dataset to permanently change its behaviour	Yes — best for complex behaviours

Fine-Tuning — training a pre-trained model on a dataset of examples to permanently adjust its internal parameters (weights). It is mainly for teaching behaviour, format or style — not usually for adding knowledge.

🎓 Prompting vs Fine-Tuning Prompt engineering = coaching a student for one day — temporary, flexible, good for most tasks. Fine-tuning = sending the student to medical school for four years — a permanent, deep behaviour change. You retrain on specific data and the new behaviour is "baked in".

The context window

Everything in prompting and RAG happens inside the context window — the limited amount of text the model can consider at once. It holds the system instructions, conversation history, user input and any retrieved knowledge. A huge system prompt leaves less room for the conversation.
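To make the trade-off concrete, here is a minimal sketch of how the parts of a prompt compete for the window. The 8,000-token window and the "about four characters per token" estimate are illustrative assumptions, not properties of any particular model; a real application would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4) if text else 0


def remaining_budget(window_tokens: int, system: str, history: list[str],
                     retrieved: list[str], user_input: str) -> int:
    """Tokens left over after everything else has been placed in the window."""
    used = estimate_tokens(system) + estimate_tokens(user_input)
    used += sum(estimate_tokens(message) for message in history)
    used += sum(estimate_tokens(document) for document in retrieved)
    return window_tokens - used


short_system = "You are a helpful support agent."
long_system = "You are a helpful support agent. " * 200  # a bloated system prompt

for name, system in [("short", short_system), ("long", long_system)]:
    left = remaining_budget(
        8000,  # hypothetical window size
        system,
        history=["Hi", "Hello! How can I help?"],
        retrieved=["Refund policy: 30 days."],
        user_input="Can I return my order?",
    )
    print(f"{name} system prompt: about {left} tokens left")
```

The longer system prompt leaves visibly less room for history, retrieved knowledge and the reply, which is the pressure that later motivates baking instructions into the weights.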

14.2 Fine-Tuning vs Prompting (and RAG)

Need	Best approach
New knowledge (news, company data)	RAG — retrieval
A specific output format (JSON, code)	Prompting — few-shot
A consistent tone / style	Fine-Tuning
The prompt is too long / expensive	Fine-Tuning (bake instructions in)
Fixing occasional reasoning errors	Chain-of-Thought prompting

RAG vs Fine-Tuning for knowledge

Feature	Fine-Tuning	RAG (Knowledge Base)
Knowledge update	Slow — needs retraining	Instant — just update the database
Accuracy on facts	Prone to hallucination	High — grounded in retrieved facts
Citations	Difficult	Easy — direct references
Privacy	Data baked into the model	Data stays separate & controlled

⚠️ Common misconception "Fine-tuning fixes hallucinations." Not necessarily. Fine-tuning can even increase hallucinations if the training data contains errors. For factual accuracy, RAG is generally better because it grounds answers in retrieved text.
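Grounding can be sketched in a few lines: retrieve the most relevant text, then place it in the prompt and tell the model to answer from it. The word-overlap retriever, the three-document knowledge base and the prompt wording below are toy assumptions for illustration; real systems usually retrieve with embeddings and a vector database.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased set of word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Inject the retrieved text into the prompt so the answer is grounded in it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical knowledge base; updating it needs no retraining.
knowledge_base = [
    "The refund window is 30 days from delivery.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available Monday to Friday.",
]

print(build_prompt("What is the refund window?", knowledge_base))
```

Because the facts live in `knowledge_base` rather than in the weights, changing an entry changes the next answer immediately, and the retrieved line doubles as a citation.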

14.3 Situations for Fine-Tuning

Fine-tune when you need:

A specialised behaviour or style — e.g. teaching a model to consistently write in legal language ("legalese") or a brand voice.
A consistent output format that few-shot prompting cannot reliably enforce.
Lower cost & latency — bake long instructions into the weights so prompts can be short (saving tokens on every call).
Small-model performance — fine-tune a small model (e.g. Llama-7B) so it performs as well as a large model on one specific task.
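The cost argument is simple arithmetic, sketched below. All four figures (prompt sizes, call volume and price) are hypothetical, and the sketch ignores the one-off training cost and any difference in per-token price for a fine-tuned model, so treat it as a back-of-the-envelope estimate only.

```python
def monthly_prompt_cost(prompt_tokens: int, calls_per_month: int,
                        price_per_million_tokens: float) -> float:
    """Cost of the input tokens that are re-sent on every call."""
    return prompt_tokens * calls_per_month * price_per_million_tokens / 1_000_000


# Hypothetical figures for illustration only.
LONG_PROMPT = 1_500   # instructions + few-shot examples sent with every call
SHORT_PROMPT = 100    # same behaviour baked into a fine-tuned model
CALLS = 1_000_000     # calls per month
PRICE = 0.50          # dollars per million input tokens (assumed)

before = monthly_prompt_cost(LONG_PROMPT, CALLS, PRICE)
after = monthly_prompt_cost(SHORT_PROMPT, CALLS, PRICE)
print(f"before: ${before:,.2f}  after: ${after:,.2f}  saved: ${before - after:,.2f}")
# prints: before: $750.00  after: $50.00  saved: $700.00
```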

🔑 The recommended strategy, in order:
1. Start with Prompting: system prompts and few-shot examples solve ~80% of problems.
2. Add RAG if you need external facts or up-to-date data.
3. Fine-Tune last, only when you need to reduce latency/cost or deeply bake in a behaviour.
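As a memory aid, the ordering can be written as a small decision helper. The function and its three flags are hypothetical names invented for this sketch; real projects weigh more factors than three booleans.

```python
def recommend(needs_fresh_or_private_facts: bool,
              needs_baked_in_style: bool,
              prompt_too_long_or_costly: bool) -> list[str]:
    """Return the techniques to try, in the recommended order."""
    steps = ["prompting"]                 # always the first thing to try
    if needs_fresh_or_private_facts:
        steps.append("RAG")               # knowledge comes from retrieval
    if needs_baked_in_style or prompt_too_long_or_costly:
        steps.append("fine-tuning")       # last, and only if still needed
    return steps


# A brand-voice chatbot over a changing product catalogue:
print(recommend(True, True, False))
# prints: ['prompting', 'RAG', 'fine-tuning']
```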

14.4 Instruction-Tuning & LoRA

Instruction-Tuning

Instruction-Tuning — the most common form of fine-tuning today. The model is trained on pairs of (Instruction, Output) so it learns to follow instructions better. This is how raw base models become helpful assistants.

The dataset — JSONL format

Fine-tuning needs a high-quality dataset, usually in JSONL (one JSON object per line). Quality matters more than quantity.

JSONL · a fine-tuning dataset

{"messages": [{"role": "user", "content": "Refund my order"}, {"role": "assistant", "content": "I'm sorry to hear that. Could you share your order ID?"}]}
{"messages": [{"role": "user", "content": "Where is my package?"}, {"role": "assistant", "content": "Let me check — please provide your tracking number."}]}

For style transfer, as few as 50–100 high-quality examples can work; for complex reasoning you might need 1,000. Always: quality > quantity.
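Since quality matters so much, it is worth checking a dataset before uploading it. The sketch below is one possible sanity check, not an official validator: it assumes the chat-style "messages" layout shown above and only verifies that each non-blank line is a complete JSON object containing a user message and a final assistant reply.

```python
import json


def validate_jsonl(text: str) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: not valid JSON ({err.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing 'messages' list")
            continue
        roles = [m.get("role") for m in messages if isinstance(m, dict)]
        if "user" not in roles or roles[-1] != "assistant":
            problems.append(
                f"line {number}: need a user message and a final assistant reply"
            )
    return problems


good = json.dumps({"messages": [
    {"role": "user", "content": "Refund my order"},
    {"role": "assistant", "content": "Could you share your order ID?"},
]})
# A record cut off mid-object, as happens when one object is split across lines.
broken = '{"messages": [{"role": "user", "content": "Where is my package?"},'

print(validate_jsonl(good + "\n" + broken))
```

The second record is reported because a JSON object that spills onto a following line is no longer valid JSONL, which is the most common formatting mistake in hand-written datasets.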

PEFT & LoRA

🔑 PEFT (Parameter-Efficient Fine-Tuning): Full fine-tuning updates every weight in the model (7B+ parameters), which is expensive and slow. LoRA (Low-Rank Adaptation), a popular PEFT method, freezes the base model and trains only small low-rank adapter matrices added alongside selected layers (roughly 1% of the parameters, often less). Benefits: runs on consumer GPUs, trains in hours rather than days, and is modular, so you can swap adapters for different tasks.
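Why the adapter is so small follows from the low-rank idea: instead of updating a full weight matrix W of shape d_out × d_in, LoRA learns two thin matrices B (d_out × r) and A (r × d_in) whose product is added to the frozen W. The sketch below counts parameters for a single matrix; the 4096 × 4096 size and rank 8 are illustrative assumptions, and the overall percentage for a whole model depends on the rank and on which layers receive adapters.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of one weight matrix's parameter count that LoRA trains."""
    full = d_out * d_in                  # frozen weight matrix W
    adapter = rank * (d_out + d_in)      # B (d_out x r) plus A (r x d_in)
    return adapter / full


fraction = lora_trainable_fraction(4096, 4096, rank=8)
print(f"trainable: {fraction:.2%} of the matrix")
# prints: trainable: 0.39% of the matrix
```

Doubling the rank doubles the adapter size, so rank is the main knob for trading adapter capacity against memory and training time.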

Python · launching a fine-tuning job (OpenAI)

from openai import OpenAI
client = OpenAI()

# 1. Upload the JSONL training file
with open("training_data.jsonl", "rb") as f:  # close the file once uploaded
    file = client.files.create(file=f, purpose="fine-tune")

# 2. Start the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18"
)
print("Fine-tuning job started:", job.id)

Output

Fine-tuning job started: ftjob-abc123
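Creating the job only queues it; training runs asynchronously. A minimal sketch of waiting for the result is shown below. It takes the client as a parameter and uses `client.fine_tuning.jobs.retrieve`; the helper name `wait_for_job`, the 30-second polling interval and the set of terminal states are choices made for this sketch, so check them against the current API documentation.

```python
import time

# Job states treated as final in this sketch.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id: str, poll_seconds: float = 30.0, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal state, then return it."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATES:
            return job
        sleep(poll_seconds)  # wait before asking again


# Usage, assuming `client` and `job` from the snippet above:
# finished = wait_for_job(client, job.id)
# if finished.status == "succeeded":
#     print("Model name:", finished.fine_tuned_model)
```

On success, the returned job carries the name of the new model, which is then passed as the `model` argument in ordinary chat requests.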

14.5 Knowledge Sourcing

Knowledge Sourcing — deciding where a model's knowledge comes from for a given task: its frozen pre-training, dynamically retrieved external data (RAG), or weights adapted by fine-tuning.

The knowledge gap

LLMs are frozen in time — they only know what they learned during pre-training. They do not know your private company data or news from yesterday, and they can hallucinate details about specific documents. The right knowledge source fixes this:

Pre-trained knowledge — general world knowledge; free but static and possibly outdated.
RAG (retrieved knowledge) — for facts that change or are private; instantly updatable, grounded, citable.
Fine-tuned knowledge — for deeply embedding behaviour/style; permanent but slow to update.
State-of-the-art: combine them — fine-tune a model to be excellent at using retrieved context (instruction-tuning), then use RAG to feed it the latest data.
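One way to picture the combined approach is a training example whose user turn already contains retrieved context, so the model practises answering from supplied text rather than from memory. The "Context / Question" layout and the helper name below are hypothetical; this is a sketch of the idea, not a prescribed format.

```python
import json


def make_grounded_example(context: str, question: str, answer: str) -> str:
    """Build one JSONL line whose user turn carries the retrieved context."""
    user_turn = f"Context:\n{context}\n\nQuestion: {question}"
    record = {"messages": [
        {"role": "user", "content": user_turn},
        {"role": "assistant", "content": answer},
    ]}
    # json.dumps escapes the newlines, so the record stays on a single line.
    return json.dumps(record, ensure_ascii=False)


line = make_grounded_example(
    context="The refund window is 30 days from delivery.",
    question="Can I return an item after six weeks?",
    answer="No. According to the policy, returns are accepted within 30 days of delivery.",
)
print(line)
```

At inference time, RAG fills the same "Context" slot with freshly retrieved text, so the behaviour learned in training is applied to up-to-date facts.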

💡 Tip — how many examples for fine-tuning? Style transfer: 50–100 high-quality examples can be enough. Complex reasoning: ~1,000. The golden rule throughout: quality beats quantity — a small clean dataset outperforms a large messy one.

Practice Questions

Knowing when to prompt vs RAG vs fine-tune is the most-tested idea here.

MCQ · Q1 · Definition

Fine-tuning differs from prompting because fine-tuning:

A Only adds text to the input window temporarily
B Permanently updates the model's internal weights
C Requires no data at all
D Works only on images

Answer: B

Prompting is temporary, in-context guidance. Fine-tuning actually retrains the model, permanently changing its parameters and behaviour.

MCQ · Q2 · When to use

You need the model to answer using your company's latest internal documents. The best approach is:

A Fine-tuning
B RAG (retrieval)
C Increasing the temperature
D Using a larger batch size

Answer: B

For new or frequently-changing knowledge, RAG is best — it retrieves the documents at query time and is instantly updatable, unlike slow retraining.

MCQ · Q3 · When to use

Fine-tuning is the best choice when you need:

A The latest news
B A consistent tone/style, or shorter prompts to cut latency & cost
C To add a fact that changes daily
D To remove the need for any training data

Answer: B

Fine-tuning bakes in a behaviour/style and lets you shorten prompts (saving tokens). Changing facts → RAG; latest news → RAG.

MCQ · Q4 · Instruction-tuning

Instruction-tuning trains a model on:

A Random text scraped from the internet
B Pairs of (Instruction, Output) so it learns to follow instructions
C Only images and labels
D Its own previous outputs only

Answer: B

Instruction-tuning uses (instruction → desired output) pairs, teaching a base model to behave like a helpful, instruction-following assistant.

MCQ · Q5 · LoRA

LoRA (Low-Rank Adaptation) makes fine-tuning cheaper by:

A Updating every parameter at once
B Freezing the main model and training only a tiny adapter layer
C Deleting most of the training data
D Running the model on a phone

Answer: B

LoRA freezes the base weights and trains only a small adapter (~1% of parameters), so it runs on consumer GPUs, trains fast, and is modular.

MCQ · Q6 · Dataset

For fine-tuning datasets, the guiding rule is:

A Quantity always beats quality
B Quality matters more than quantity
C The data must be images
D Exactly 1 million examples are required

Answer: B

A small, clean, high-quality dataset (even 50–100 examples for style) outperforms a large messy one. Bad examples teach bad behaviour.

MCQ · Q7 · Hallucination

Does fine-tuning reliably stop hallucinations?

A Yes, always
B No — it can even increase them; RAG is generally better for factual accuracy
C Yes, but only for images
D Hallucinations are impossible after training

Answer: B

Fine-tuning teaches behaviour, not factual grounding — errors in the training data can worsen hallucinations. RAG grounds answers in retrieved text, making it better for factuality.

MCQ · Q8 · Strategy

According to the recommended optimization strategy, you should:

A Always fine-tune first
B Start with prompting, add RAG if needed, and fine-tune last
C Never use prompting
D Use RAG only for images

Answer: B

Prompting solves ~80% of problems and is fastest. Add RAG for external knowledge. Fine-tune last — only when you must reduce cost/latency or deeply bake in behaviour.

Short Answer · Q9 · Concept

A startup wants its chatbot to (a) always reply in a friendly brand voice and (b) answer using its constantly-updated product catalogue. Which technique for each, and why?

Model answer

(a) A consistent brand voice/style is a behaviour → fine-tuning, which permanently bakes the tone into the model. (b) A constantly-updated catalogue is changing knowledge → RAG, which retrieves the current catalogue at query time and can be updated instantly without retraining. The state-of-the-art is to combine both.

Coding · Q10 · JSONL dataset

Write two lines of a JSONL fine-tuning dataset that teach a model to answer customer-support questions politely.

Solution

JSONL

{"messages": [{"role": "user", "content": "My order is late"}, {"role": "assistant", "content": "I'm sorry for the delay. Could you share your order ID so I can check?"}]}
{"messages": [{"role": "user", "content": "I want a refund"}, {"role": "assistant", "content": "I understand. I'd be happy to help - may I have your order number?"}]}

Each line is one JSON object pairing a user message with the desired assistant reply. Quality and consistency of these examples matter most.

Coding · Q11 · Fine-tune job

Write code to upload a JSONL file and start an OpenAI fine-tuning job.

Solution

Python

from openai import OpenAI
client = OpenAI()

# Step 1: upload the dataset
with open("data.jsonl", "rb") as f:  # close the file once uploaded
    training_file = client.files.create(file=f, purpose="fine-tune")

# Step 2: create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18"
)
print("Job ID:", job.id)

Output

Job ID: ftjob-abc123xyz

Short Answer · Q12 · Knowledge sourcing

Why are LLMs said to be "frozen in time", and what are the two main ways to give them new knowledge?

Model answer

An LLM only knows what was in its pre-training data up to a cutoff date — after training, its weights are fixed ("frozen"), so it does not know recent events or private data. The two ways to add knowledge are RAG (retrieve external documents at query time — instant, citable, best for changing facts) and fine-tuning (retrain the weights — permanent, slow to update, better for behaviour than facts).

🎯 Lecture 14: must-remember
Optimization spectrum: Prompting (no change, fastest) → RAG (no change, for knowledge) → Fine-tuning (changes weights, for behaviour).
Fine-tuning ≈ teaching behaviour/style, not facts.
RAG beats fine-tuning for factual accuracy & updates.
Instruction-tuning = (instruction, output) pairs.
LoRA/PEFT trains a tiny adapter (~1%).
Strategy: prompt → RAG → fine-tune last.
Quality > quantity.
