⚡ LECTURE 15

Managing State in Chatbots

LLMs are stateless — they forget you instantly. Learn the memory challenges of chatbots, multi-turn conversation strategies, context management and how conversation history is passed.

Syllabus topics 58–61 · ⏱ ~22 min read · 11 practice questions

In this lecture

Memory Challenges in Chatbots
Passing Conversation History
Multi-Turn Conversation Strategies
Context Management
Types of State
Practice Questions

15.1 Memory Challenges in Chatbots

🔑 LLMs are stateless
An LLM forgets you immediately after generating a response. Each API call is completely independent: the model has no built-in memory of previous turns. Any "memory" a chatbot appears to have is engineered by the application around it.

🧩 A chatbot without memory
You: "What's the weather in Paris?" → Bot: "It's sunny in Paris."
You: "What about London?" → Bot: "What about London what?"
The bot has no idea the topic is weather, because the second request did not include the first exchange. This is the core memory challenge.

State — the evolving snapshot of everything the chatbot needs to remember at a given moment: past messages, user preferences, intermediate results and workflow progress. State management is how the application maintains that memory across turns.

Problems without state management

Responses become disconnected — the bot cannot recall earlier instructions.
Multi-step tasks break — each step is handled in isolation.
Users must repeat information, hurting the experience.
Logical inconsistency — decisions cannot reference prior reasoning or preferences.

15.2 Passing Conversation History

🔑 The fundamental trick
Because the LLM is stateless, the application re-sends the entire conversation history with every new request. The model re-reads the whole conversation each time, which simulates memory.

Chat APIs use a list of messages, each with a role:

system — global behaviour, persona and rules (set once).
user — what the human says.
assistant — what the model previously replied.

Python · maintaining a conversation manually

from openai import OpenAI
client = OpenAI()

# The conversation history list = the chatbot's "memory"
messages = [
    {"role": "system", "content": "You are a helpful travel assistant."}
]

def chat(user_input):
    messages.append({"role": "user", "content": user_input})   # add user turn
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages)                # send WHOLE history
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})   # add bot turn
    return reply

print(chat("What's the weather in Paris?"))
print(chat("What about London?"))   # works - history was passed along

Output (illustrative; actual replies will vary)
It's sunny in Paris.
In London, it's currently rainy.

Because the history list is re-sent each time, the second question "What about London?" now has the context that the topic is weather.
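
The listing above keeps a single global `messages` list, which would mix different users together in a multi-user application. One possible way around that is to give each session its own history. The sketch below is only an illustration: `ChatSession` and `echo_model` are hypothetical names, and `echo_model` is a stand-in for the real API call so that the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def echo_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for a chat-API call: reports what it was sent."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s); latest: {user_turns[-1]!r}"

class ChatSession:
    """Holds one user's history so separate conversations never mix."""

    def __init__(self, system_prompt: str, send: Callable[[List[Message]], str]):
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]
        self.send = send

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        reply = self.send(self.messages)      # the whole history goes out each turn
        self.messages.append({"role": "assistant", "content": reply})
        return reply

alice = ChatSession("You are a helpful travel assistant.", echo_model)
bob = ChatSession("You are a helpful travel assistant.", echo_model)
alice.chat("What's the weather in Paris?")
print(alice.chat("What about London?"))   # sees both of Alice's messages
print(bob.chat("Hello"))                  # sees only Bob's message
```

In a real application, `send` would wrap the `client.chat.completions.create(...)` call from the listing above.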

15.3 Multi-Turn Conversation Strategies

Re-sending the full history works for short chats — but conversations grow, and the context window is limited (and you pay per token). Strategies to manage long multi-turn chats:

Strategy	How it works	Trade-off
Full history	Send every message every time	Perfect recall, but expensive & hits the context limit
Windowed (buffer) memory	Keep only the last N turns	Cheap & bounded, but forgets older context
Summary memory	Summarise old turns into a short paragraph; keep recent turns verbatim	Compact, retains gist, but loses fine detail
Vector / retrieval memory	Store past turns as embeddings; retrieve only the relevant ones (RAG-style)	Scales to very long histories; needs a vector DB
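
The summary-memory row can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `compact_history` and `naive_summarise` are hypothetical names, and `naive_summarise` merely truncates and joins old messages where a real system would ask an LLM to write the summary.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def naive_summarise(turns: List[Message]) -> str:
    """Hypothetical placeholder: a real system would ask an LLM to write the summary."""
    return " | ".join(f"{m['role']}: {m['content'][:40]}" for m in turns)

def compact_history(history: List[Message], keep_recent: int = 4,
                    summarise: Callable[[List[Message]], str] = naive_summarise) -> List[Message]:
    """Fold everything except the system message and the newest messages into one summary."""
    system, rest = history[0], history[1:]
    split = max(len(rest) - max(keep_recent, 0), 0)
    old, recent = rest[:split], rest[split:]
    if not old:
        return list(history)                  # nothing old enough to compress
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarise(old)}
    return [system, summary] + recent

history = [{"role": "system", "content": "You are a helpful travel assistant."}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

compacted = compact_history(history, keep_recent=4)
print(len(history), "->", len(compacted))     # 13 -> 6
```

The recent messages stay verbatim, so the model still sees exact wording where it matters most, while older detail is reduced to the gist.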

💡 Tip: Window Buffer Memory
A common practical strategy is the Window Buffer: keep, say, the last 5–10 exchanges. It acts like a "session notepad": enough recent context for coherent multi-turn chat, while keeping token usage bounded.
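
A window buffer can be sketched with `collections.deque`, whose `maxlen` argument discards the oldest item automatically. The class name `WindowBuffer` and its methods are hypothetical; the sketch assumes an "exchange" means one user message plus the assistant reply, so a question is never separated from its answer.

```python
from collections import deque

class WindowBuffer:
    """Keeps the system message plus only the most recent exchanges."""

    def __init__(self, system_prompt, max_exchanges=5):
        self.system = {"role": "system", "content": system_prompt}
        # deque(maxlen=N) silently drops the oldest exchange once N is reached
        self.exchanges = deque(maxlen=max_exchanges)

    def add_exchange(self, user_text, assistant_text):
        self.exchanges.append((
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ))

    def as_messages(self):
        """Flatten to the message list a chat API expects."""
        msgs = [self.system]
        for user_msg, assistant_msg in self.exchanges:
            msgs.extend([user_msg, assistant_msg])
        return msgs

buf = WindowBuffer("You are a helpful travel assistant.", max_exchanges=2)
for i in range(5):
    buf.add_exchange(f"question {i}", f"answer {i}")
print([m["content"] for m in buf.as_messages()[1:]])
# ['question 3', 'answer 3', 'question 4', 'answer 4']
```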

15.4 Context Management

Context Window — the maximum amount of text (in tokens) a model can process in a single request. It must hold the system prompt + conversation history + retrieved data + the new user input all together.

Context management is the art of fitting the most useful information into that limited window:

Trim — drop the oldest turns when nearing the limit.
Summarise — compress old turns into a short summary.
Prioritise — always keep the system prompt and the most recent/relevant turns.
Retrieve — pull back only the relevant past messages instead of all of them.
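
The "Retrieve" technique in the list above can be illustrated with a toy example. A real system would use an embedding model and a vector database; here `embed` is a hypothetical stand-in that just counts words, which is enough to show the rank-by-similarity idea. All function names are assumptions made for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(past_messages, query, k=2):
    """Return the k past messages most similar to the query."""
    q = embed(query)
    ranked = sorted(past_messages, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]

past = [
    "My flight to Paris leaves on Friday morning.",
    "I am allergic to peanuts.",
    "I prefer window seats on long flights.",
    "My hotel in Paris is near the Louvre.",
]
print(retrieve(past, "Which foods am I allergic to?", k=1))
# ['I am allergic to peanuts.']
```

Only the retrieved messages would then be added to the prompt, rather than the whole history.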

⚠️ Why context management matters
If the conversation exceeds the context window, then depending on the API the request is rejected or the oldest content has to be cut off, so the bot "forgets" the start. A huge system prompt also leaves less room for the actual conversation. Every token counts and every token costs.
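
Trimming and prioritising against a token budget might look like the sketch below. It is an illustration under loud assumptions: `estimate_tokens` uses a rough "about 4 characters per token" rule of thumb rather than a real tokenizer, and `fit_to_budget` is a hypothetical helper name.

```python
def estimate_tokens(text):
    """Crude assumption: roughly 1 token per 4 characters. Use a real tokenizer in practice."""
    return max(1, len(text) // 4)

def fit_to_budget(history, max_tokens):
    """Keep the system message, then add messages newest-first until the budget is spent."""
    system, rest = history[0], history[1:]
    used = estimate_tokens(system["content"])
    kept = []
    for msg in reversed(rest):                # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > max_tokens:
            break                             # everything older is dropped
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]              # restore chronological order

history = [{"role": "system", "content": "You are a helpful travel assistant."}] + [
    {"role": "user", "content": f"This is message number {i} in a long chat."}
    for i in range(20)
]
trimmed = fit_to_budget(history, max_tokens=60)
print(len(trimmed), trimmed[-1]["content"])
```

The system prompt is charged against the budget first, which makes the cost of a very long system prompt visible: the larger it is, the fewer conversation messages fit.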

15.5 Types of State

State management distinguishes information by how long it must survive:

State type	Lifespan	Holds
Ephemeral (short-term)	One computation/turn only	Temporary variables, intermediate results — deleted after use ("sticky note")
Persistent (long-term)	Survives across turns & sessions	Conversation history, user preferences, validated data ("filing cabinet")
Shared	Across multiple components/agents	A common dictionary nodes/agents read & write
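
The three rows of the table can be pictured in a few lines of Python. This is a simplified sketch, not a framework API: `ChatState`, `handle_turn` and the `shared` dictionary are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ChatState:
    # Persistent: survives across turns (and across sessions once saved to storage)
    history: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)

def handle_turn(state, shared, user_input):
    # Ephemeral: local variables that vanish when this function returns
    cleaned = user_input.strip()
    word_count = len(cleaned.split())

    state.history.append(cleaned)                              # persistent update
    shared["total_turns"] = shared.get("total_turns", 0) + 1   # shared update
    return f"Got {word_count} word(s)."

shared = {}                       # one dictionary visible to every component
alice, bob = ChatState(), ChatState()
handle_turn(alice, shared, "  I like window seats ")
handle_turn(bob, shared, "Hello")
print(alice.history, shared)      # ['I like window seats'] {'total_turns': 2}
```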

🧩 State validation: the Pizza Bot
User: "I want a pizza with shoe topping." → State check: "shoe" is not a valid topping → reject & loop back: "Sorry, please choose from Cheese, Pepperoni, Mushrooms…"
User: "Cheese." → State check: valid → accept & proceed.
State both remembers the order and validates inputs to guide the user to success.
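
The Pizza Bot flow can be sketched as a validation step that only advances the state when the input is acceptable. The menu below is an assumption limited to the three toppings named in the example, and `handle_topping` and the `step` values are hypothetical names.

```python
# Assumed menu: only the three toppings named in the example
VALID_TOPPINGS = {"cheese", "pepperoni", "mushrooms"}

def handle_topping(state, user_input):
    """Validate the topping; advance the workflow only if it is on the menu."""
    topping = user_input.strip().lower()
    if topping not in VALID_TOPPINGS:
        # Invalid input: state is left unchanged, so the bot asks again
        options = ", ".join(sorted(t.title() for t in VALID_TOPPINGS))
        return f"Sorry, please choose from {options}."
    state["topping"] = topping
    state["step"] = "confirm"
    return f"Great, one {topping} pizza coming up."

order = {"step": "choose_topping", "topping": None}
print(handle_topping(order, "shoe"))     # rejected, still on the same step
print(handle_topping(order, "Cheese"))   # accepted, workflow moves on
print(order)                             # {'step': 'confirm', 'topping': 'cheese'}
```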

🔑 Persistent memory & checkpointing
Persistent state is stored in databases or "checkpointers". Checkpointing saves snapshots of the state so a long conversation survives an app restart or crash, and the bot can resume exactly where it left off, without recomputing completed work.
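
The idea of checkpointing can be shown with a plain JSON file. This is a minimal sketch, assuming a single process and a state that is JSON-serialisable; production checkpointers typically use a database and work differently in detail. The helper names and the file name `chat_state.json` are hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(state, path):
    """Write the state to a temp file, then rename it over the target in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)    # a crash mid-write leaves the old checkpoint intact

def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, encoding="utf-8") as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "chat_state.json")
state = load_checkpoint(path, {"history": [], "step": "start"})
state["history"].append({"role": "user", "content": "I want a cheese pizza."})
state["step"] = "confirm"
save_checkpoint(state, path)

# Simulate a restart: discard the in-memory state and reload from disk
del state
restored = load_checkpoint(path, {"history": [], "step": "start"})
print(restored["step"], len(restored["history"]))   # confirm 1
```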

Practice Questions

State and memory are conceptual but heavily tested — make sure each idea is clear.

MCQ · Q1 · Stateless

An LLM is described as "stateless". This means:

A It has no parameters
B It has no built-in memory — each request is independent
C It cannot generate text
D It runs without electricity

Answer: B

The model forgets everything after each response. Memory must be engineered by the application around it.

MCQ · Q2 · History

How does a chatbot appear to "remember" earlier messages?

A The LLM stores them internally forever
B The application re-sends the conversation history with every request
C It re-trains the model after each message
D It reads the user's mind

Answer: B

Memory is simulated by passing the full message history (system + user + assistant turns) on every API call, so the model re-reads the conversation each time.

MCQ · Q3 · Roles

In a chat-API message list, which role sets the bot's overall behaviour and persona?

A user
B assistant
C system
D tool

Answer: C

The system message defines global behaviour and persona. user = the human's input; assistant = the model's prior replies.

MCQ · Q4 · Strategies

Keeping only the last N exchanges of a conversation is which strategy?

A Full-history memory
B Windowed (buffer) memory
C Summary memory
D No memory

Answer: B

Window/buffer memory keeps a sliding window of the most recent N turns — cheap and bounded, but it forgets older context.

MCQ · Q5 · Context window

The context window of a model is:

A The screen the chatbot runs in
B The maximum amount of text (tokens) it can process in one request
C The number of GPUs available
D The model's training dataset

Answer: B

The context window limits how much text — system prompt + history + retrieved data + new input — the model can consider at once.

MCQ · Q6 · Long conversations

A very long conversation exceeds the context window. A good context-management technique is to:

A Delete the system prompt
B Summarise older turns and keep recent ones verbatim
C Send the conversation in a random order
D Stop responding entirely

Answer: B

Summary memory compresses old turns into a short paragraph, freeing tokens while retaining the gist. The system prompt should always be kept.

MCQ · Q7 · State types

User preferences that should survive across multiple sessions are stored in:

A Ephemeral (short-term) state
B Persistent (long-term) state
C No state
D The system prompt only

Answer: B

Persistent state (the "filing cabinet") survives across turns and sessions. Ephemeral state (the "sticky note") lasts only for the current computation.

MCQ · Q8 · Checkpointing

Checkpointing is valuable because it:

A Makes the model larger
B Saves state snapshots so a conversation can resume after a restart or crash
C Deletes the conversation history
D Increases the temperature

Answer: B

Checkpointers persist state snapshots, so multi-turn workflows survive crashes/restarts and can resume without recomputing finished work.

Short Answer · Q9 · Concept

Why does a stateless LLM still produce coherent multi-turn conversations in practice?

Model answer

The LLM itself has no memory, but the surrounding application keeps a conversation-history list and re-sends the entire history (system + previous user/assistant turns) with every new request. The model re-reads that history each time, so it appears to "remember" — the memory lives in the application, not the model.

Coding · Q10 · Conversation memory

Write a Python chat() function that maintains a conversation-history list so a chatbot remembers previous turns.

Solution

Python

from openai import OpenAI
client = OpenAI()

history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input):
    history.append({"role": "user", "content": user_input})
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=history)   # send full history
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Techno.")
print(chat("What is my name?"))   # remembers -> "Your name is Techno."

Output (illustrative; actual wording may vary)
Your name is Techno.

The history list is the chatbot's memory — appended to and re-sent on every turn.

Coding · Q11 · Window memory

Write a function that trims a conversation-history list so it keeps the system message plus only the last 4 turns (window/buffer memory).

Solution

Python

def trim_history(history, window=4):
    """Return the system message plus only the last `window` turns."""
    system = history[0]                 # always keep the system message
    rest = history[1:]
    # Guard window <= 0: rest[-0:] would return the whole list, not an empty one
    recent = rest[-window:] if window > 0 else []
    return [system] + recent

# Example
msgs = [{"role": "system", "content": "..."}] + \
       [{"role": "user", "content": f"msg {i}"} for i in range(10)]
print(len(trim_history(msgs)))   # 1 system + 4 recent = 5

Output
5

This bounds token usage while keeping the most recent context and the persistent system prompt.

🎯 Lecture 15: must-remember
LLMs are stateless: they forget after each call.
"Memory" = the app re-sends the full conversation history (system/user/assistant roles).
Multi-turn strategies: full history, window/buffer, summary, vector retrieval.
Context window = token limit for one request.
State types: ephemeral (short-term), persistent (long-term), shared.
Checkpointing saves state so chats survive restarts.
