GenAI Exam Prep
⚡ LECTURE 11

Generative AI Modalities

Generative AI does not just classify — it creates. Explore text, image, code, audio and multimodal models, the architectures behind them, and the ethics of generated content.

Syllabus topics 39–44 · ⏱ ~24 min read · 12 practice questions

11.1 What is Generative AI?

Generative AI — models that learn patterns from massive datasets and use those patterns to create new, original content — text, images, audio or code. They do not retrieve old information; they generate new outputs using learned probability patterns.

Key terminology

Term | Meaning
--- | ---
Training Data | Information used to teach the model (books, code repos, images)
Parameters | The model's learned settings; GPT-3 has 175 billion
Prompt | The input/instruction given to the model
Inference | The model generating output; happens after training, not during it
Token | The smallest unit of text the model handles; a word may be several tokens
Fine-tuning | Adapting a pre-trained model to a specific domain
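⚠️ Two common true/false traps: "Inference happens during training" → False (inference happens after training). "A token is always one word" → False (a word can be split into several tokens; a long or rare word such as "microtransactional" is typically broken into sub-word pieces, and the exact split depends on the tokenizer).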
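To see why a word is not always one token, here is a minimal sketch of greedy longest-match sub-word tokenization (the WordPiece-style approach). The tiny vocabulary is made up for illustration; real tokenizers such as GPT-2's learn vocabularies of tens of thousands of pieces, so the split of any particular word depends on the tokenizer.

```python
# Hypothetical toy vocabulary: word-start pieces plus "##"-prefixed continuation pieces
VOCAB = {"the", "cat", "micro", "##trans", "##action", "##al", "##s"}


def tokenize_word(word: str, vocab: set[str]) -> list[str]:
    """Split one word into sub-word tokens by greedy longest match."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: treat the whole word as unknown
        tokens.append(piece)
        start = end
    return tokens


for word in ["cat", "cats", "microtransactional"]:
    print(word, "->", tokenize_word(word, VOCAB))
# cat -> ['cat']
# cats -> ['cat', '##s']
# microtransactional -> ['micro', '##trans', '##action', '##al']
```

One word became four tokens here, which is exactly why "a token is always one word" is false.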

11.2 Generative vs Discriminative Models

Aspect | Discriminative | Generative
--- | --- | ---
Core function | Map input → label | Learn the data distribution to create new data
Question answered | "What category is this?" | "What would a new example look like?"
What it learns | Decision boundaries | The full data distribution
Output | Classifications, predictions | New text, images, audio
Examples | Logistic Regression, SVM, Random Forest, CNNs | GPT, Stable Diffusion, GANs, VAEs
Metaphor | A judge who evaluates and categorises | An artist who creates original content
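The contrast in the table can be made concrete on a toy 1-D dataset. This is a sketch with made-up numbers, not any of the named models: the discriminative side learns only a threshold that separates the two classes, while the generative side fits a Gaussian (mean and standard deviation) per class, which is enough to sample brand-new examples.

```python
import random
import statistics

# Made-up 1-D data for two classes
data = {"short": [2.0, 3.0, 2.5, 3.5, 3.0], "long": [8.0, 9.0, 8.5, 9.5, 10.0]}
points = sorted((x, label) for label, xs in data.items() for x in xs)

# Discriminative: search candidate thresholds and keep the one with fewest errors
candidates = [(a[0] + b[0]) / 2 for a, b in zip(points, points[1:])]


def errors(t: float) -> int:
    return sum((x < t) != (label == "short") for x, label in points)


boundary = min(candidates, key=errors)


def classify(x: float) -> str:
    """Answer 'what category is this?' using only the learned boundary."""
    return "short" if x < boundary else "long"


# Generative: learn each class's distribution (mean, standard deviation)
params = {label: (statistics.mean(xs), statistics.stdev(xs)) for label, xs in data.items()}


def generate(label: str, n: int, rng: random.Random) -> list[float]:
    """Answer 'what would a new example look like?' by sampling the fitted Gaussian."""
    mu, sigma = params[label]
    return [rng.gauss(mu, sigma) for _ in range(n)]


print("boundary:", boundary)
print("classify(4.0):", classify(4.0))
print("new 'long' examples:", [round(x, 2) for x in generate("long", 3, random.Random(0))])
```

The discriminative model can only label inputs; the generative one can also produce new data points, because it stores a description of the whole distribution.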

11.3 Text & Code Models

Text Generation

Models learn patterns in text and predict the next token → sentences → paragraphs → documents. Powered by Transformers (Lecture 10). Used in:

Code Models

Trained on programming languages and developer patterns. They help with:

Python · text generation & summarization (Hugging Face)
from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
print(generator("The future of AI is", max_length=20)[0]['generated_text'])

# Text summarization
summarizer = pipeline("summarization")
long_text = "Generative AI has transformed many industries..."
print(summarizer(long_text, max_length=30, min_length=10))
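Output (illustrative: text generation samples, so your continuation will likely differ, and the summary depends on the default model):
The future of AI is bright and full of new possibilities for...
[{'summary_text': 'Generative AI has transformed many industries.'}]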
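The pipeline hides the core loop. A minimal sketch of the same predict-the-next-token idea uses a bigram model: "training" counts which word follows which in a tiny made-up corpus (those counts play the role of parameters), and "inference" repeatedly appends the most likely next word. Real LLMs use Transformers over sub-word tokens and usually sample rather than always taking the top choice; this toy uses whole words and greedy decoding for readability.

```python
from collections import Counter, defaultdict

corpus = "the future of ai is bright . the future of work is changing . ai is useful ."


def train_bigrams(text: str) -> dict[str, Counter]:
    """Training: count how often each word follows each other word."""
    words = text.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], prompt: str, max_new_tokens: int) -> str:
    """Inference: repeatedly append the most frequent next word (greedy decoding)."""
    words = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(words[-1])
        if not followers:
            break  # no known continuation for this word
        # most_common breaks ties by first-seen order
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


model = train_bigrams(corpus)  # the counts are fixed from here on
print(generate(model, "the future", 4))  # the future of ai is bright
```

Note the split: training happens once, and every later call to `generate` is inference using the stored counts.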

11.4 Image & Audio Models

Image Generation — Diffusion Models

Diffusion Model — generates images by starting from random noise and refining it step by step into a clear picture, guided by a text prompt. Example: Stable Diffusion.

The model is trained to reverse a noising process — it learns to remove noise gradually. Used in marketing visuals, concept art, synthetic medical-imaging data and educational illustrations. Example prompt: "A futuristic classroom with AI robots teaching students, photorealistic, detailed."
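The "reverse a noising process" idea can be sketched numerically. Below, a made-up 1-D "image" (five pixel values) is noised with the closed-form forward step used in DDPM-style diffusion, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, under an assumed linear noise schedule shortened to 100 steps. A trained network would predict ε from x_t; here an oracle that knows the true noise stands in for it, purely to show why predicting the noise lets you recover a clean signal. This is not Stable Diffusion's actual sampler.

```python
import math
import random

rng = random.Random(42)
x0 = [0.0, 0.5, 1.0, 0.5, 0.0]  # made-up clean "image" (5 pixels)

T = 100
# Assumed linear schedule of per-step noise variances beta_t
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []  # cumulative product of (1 - beta) up to each step
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bars.append(prod)


def add_noise(clean: list[float], t: int) -> tuple[list[float], list[float]]:
    """Forward process: jump straight to step t in closed form."""
    eps = [rng.gauss(0.0, 1.0) for _ in clean]
    a = alpha_bars[t]
    xt = [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(clean, eps)]
    return xt, eps


def estimate_x0(xt: list[float], eps_pred: list[float], t: int) -> list[float]:
    """Reverse direction: subtract the predicted noise to estimate the clean signal."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


for t in (0, 50, 99):
    print(f"t={t:2d}  signal weight={math.sqrt(alpha_bars[t]):.3f}")

xt, true_eps = add_noise(x0, 99)
recovered = estimate_x0(xt, true_eps, 99)  # oracle: a real model would predict eps
print([round(v, 3) for v in recovered])
```

The signal weight shrinks as t grows, which is the "noising"; generation runs the other way, with the network's noise estimate replacing the oracle at every step.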

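Audio Models

Audio models generate speech and sound: text-to-speech (TTS), voice cloning and music generation. They learn temporal and acoustic patterns so that generated speech has natural prosody.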

11.5 Multimodal Models & Architectures

Multimodal Model — a model that works across multiple types of data at once — text, images, audio, video. Example: GPT-4V can accept an image and answer questions about it in text.

Different tasks need different architectures

Content type | Architecture | How it works
--- | --- | ---
Text | Transformers | Self-attention to model context across long sequences
Images | Diffusion models | Start from noise, refine step by step, guided by text embeddings
Audio | Specialised audio models | Learn temporal/acoustic patterns, generate natural prosody
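The self-attention in the text row can be sketched in plain Python. This is single-head scaled dot-product attention, softmax(QKᵀ/√d)·V, on made-up token vectors; for brevity the learned query/key/value projections are omitted (Q = K = V = the inputs), which a real Transformer would not do.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    weights = []
    outputs = []
    for query in x:
        # How strongly this token attends to every token in the sequence
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors
        outputs.append([sum(wi * v[j] for wi, v in zip(w, x)) for j in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-D embeddings for 3 tokens
out, attn = self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Each token's output mixes information from every other token in the sequence, weighted by similarity; that is how Transformers capture context across long inputs.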

Emerging trends

Multimodal models, personalisation, efficiency (smaller models, less compute), controllability (finer control of tone/style), and safety & alignment (outputs matching human values).

11.6 Ethics in Generative AI

Practice | Why it matters
--- | ---
Verify outputs | AI can hallucinate; always cross-check facts with reliable sources
Check for bias | Models inherit data biases; review outputs for stereotypes
Protect privacy | Never enter sensitive/personal data; public models may log and reuse inputs
Be transparent | Disclose when AI tools are used; follow attribution rules
Cite synthetic content | Treat AI-generated material like any other source; transparency builds credibility
💡 Tip: what never to type into a public AI tool. Passwords, API keys, customer data, medical records, financial details, or any confidential company information. Public models may store and reuse what you submit.
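Part of the privacy rule can be automated. A minimal sketch, assuming you scrub text before pasting it into a public tool: the regular expressions below catch only a few obvious patterns (e-mail addresses, long digit runs such as card numbers, and key-like strings with an "sk-" prefix, used here as one assumed example of an API-key format). They are illustrative, not a complete data-loss-prevention filter; anything they miss must still never be entered.

```python
import re

# Illustrative patterns only; real secrets and personal data take many more forms
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),  # assumed key format
    (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "[NUMBER]"),  # card-like digit runs
]


def redact(text: str) -> str:
    """Replace each pattern match with a placeholder before the text leaves your machine."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


prompt = "Customer jane.doe@example.com paid with 4111 1111 1111 1111, key sk-abcdefghijklmnop1234"
print(redact(prompt))
# Customer [EMAIL] paid with [NUMBER], key [API_KEY]
```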
Practice Questions

Modalities, architectures and terminology are tested heavily here.

MCQ · Q1 · Definition

Generative AI differs from traditional (discriminative) AI because it:

  • A Only classifies existing data into categories
  • B Creates new, original content from learned patterns
  • C Never needs training data
  • D Works only with numbers
Answer: B

Generative models learn the data distribution and produce new content (text, images, audio). Discriminative models only assign labels to existing data.

MCQ · Q2 · Discriminative

Which of these is a discriminative model?

  • A GPT
  • B Stable Diffusion
  • C Logistic Regression
  • D A GAN
Answer: C

Logistic Regression maps inputs to labels (discriminative). GPT, Stable Diffusion and GANs all generate new content (generative).

MCQ · Q3 · Image models

Image generators like Stable Diffusion create pictures by:

  • A Copying images from the internet
  • B Starting from random noise and refining it step-by-step
  • C Using a single Transformer decoder
  • D Drawing pixel by pixel from left to right
Answer: B

Diffusion models begin with noise and iteratively denoise it into a coherent image, guided by the text prompt's embeddings.

MCQ · Q4 · Terminology

When does inference happen?

  • A During training
  • B After training, when the model generates output
  • C Before any data is collected
  • D Only when fine-tuning
Answer: B

Training adjusts the parameters; inference is the later phase where the trained model produces outputs from a prompt.

MCQ · Q5 · Tokens

"A token is always exactly one word." This statement is:

  • A True
  • B False — a word can be split into several tokens
  • C True only for English
  • D True only for code models
Answer: B

Tokens are often sub-word chunks. A single word, especially a long or rare one like "microtransactional", can be broken into multiple tokens; exactly how depends on the tokenizer.

MCQ · Q6 · Multimodal

A multimodal model is one that:

  • A Has multiple hidden layers
  • B Works across multiple data types (e.g. text + images)
  • C Runs on multiple computers
  • D Was trained by multiple companies
Answer: B

Multimodal = multiple modalities. GPT-4V, for instance, handles both images and text in one model.

MCQ · Q7 · Summarization

A summariser that writes a fresh summary in new words rather than copying sentences is performing:

  • A Extractive summarization
  • B Abstractive summarization
  • C Tokenization
  • D Classification
Answer: B

Abstractive summarization generates new sentences; extractive summarization picks existing sentences out of the source text.
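To see what "picking existing sentences" means, here is a minimal extractive summarizer. It scores each sentence by the average frequency of its words across the whole text and returns the top-scoring sentences unchanged, in source order. Word-frequency scoring is one simple heuristic chosen for illustration; an abstractive model, like the Hugging Face summarization pipeline earlier in this lecture, would instead write new sentences.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on"}


def content_words(text: str) -> list[str]:
    """Lower-cased words with common stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, n_sentences: int = 1) -> str:
    """Return the n highest-scoring original sentences, kept in source order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured just for length
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


text = (
    "Generative AI creates new content. "
    "Generative models learn patterns from data. "
    "Lunch was served at noon."
)
print(extractive_summary(text))  # Generative AI creates new content.
```

Every word of the output already appears verbatim in the input, which is the defining property of extractive summarization.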

MCQ · Q8 · Ethics

Which is the safest practice when using a public generative AI tool?

  • A Paste confidential customer data to get better answers
  • B Verify outputs against reliable sources and avoid entering sensitive data
  • C Always trust the output without checking
  • D Never disclose that AI was used
Answer: B

AI can hallucinate, so verify facts; and public models may log inputs, so never enter sensitive/personal data. Transparency about AI use is also good practice.

Short Answer · Q9 · Concept

Explain the difference between a generative model and a discriminative model using one example of each.

Model answer

A discriminative model learns the boundary between classes and answers "what category is this?" — e.g. Logistic Regression deciding spam vs not-spam. A generative model learns the data distribution and answers "what would a new example look like?" — e.g. GPT generating a new paragraph or Stable Diffusion creating a new image.

Coding · Q10 · Hugging Face

Use the Hugging Face pipeline to perform sentiment analysis on the sentence "I love this course".

Solution
Python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love this course")
print(result)
Output: [{'label': 'POSITIVE', 'score': 0.9998}]

The pipeline helper downloads a default pre-trained sentiment model on first use and then runs inference with a single call.

Coding · Q11 · Text generation

Write code using Hugging Face to generate text continuing the prompt "Machine learning is".

Solution
Python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
output = generator("Machine learning is", max_length=25,
                   num_return_sequences=1)
print(output[0]['generated_text'])
Output (sampled, so your continuation will likely differ):
Machine learning is a powerful tool that allows computers to learn patterns from data and improve over time.
Short Answer · Q12 · Architectures

Match each modality to its typical architecture: text, images, sequences/audio.

Model answer

Text → Transformers (self-attention for long-range context). Images → Diffusion models (denoise random noise into a picture). Audio/sequences → specialised audio/sequence models that learn temporal and acoustic patterns. Different content types need architectures matched to their structure.

🎯 Lecture 11 must-remember: Generative AI creates new content; discriminative AI labels existing data. Modalities: text/code (Transformers), images (Diffusion: noise → image), audio (TTS, voice cloning, music). Multimodal = many data types. Inference = after training; a token ≠ always one word. Ethics: verify, check bias, protect privacy, be transparent.