Managing State in Chatbots
LLMs are stateless: they retain nothing between requests. This lecture covers the memory challenges of chatbots, multi-turn conversation strategies, context management, and how conversation history is passed.
15.1 Memory Challenges in Chatbots
You: "What about London?" → Bot: "What about London what?"
The bot has no idea the topic is weather, because the earlier weather question was not part of this request and the model never saw it. This is the core memory challenge.
Problems without state management
- Responses become disconnected: the bot cannot recall earlier instructions.
- Multi-step tasks break: each step is handled in isolation.
- Users must repeat information, hurting the experience.
- Logical inconsistency: decisions cannot reference prior reasoning or preferences.
15.2 Passing Conversation History
Chat APIs use a list of messages, each with a role:
- system: global behaviour, persona and rules (set once).
- user: what the human says.
- assistant: what the model previously replied.
from openai import OpenAI

client = OpenAI()

# The conversation history list = the chatbot's "memory"
messages = [
    {"role": "system", "content": "You are a helpful travel assistant."}
]

def chat(user_input):
    messages.append({"role": "user", "content": user_input})   # add user turn
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages)                # send WHOLE history
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})   # add bot turn
    return reply

print(chat("What's the weather in Paris?"))
print(chat("What about London?"))   # works - history was passed along
Because the history list is re-sent each time, the second question "What about London?" now has the context that the topic is weather.
15.3 Multi-Turn Conversation Strategies
Re-sending the full history works for short chats, but conversations grow, the context window is limited, and you pay per token. Strategies for managing long multi-turn chats:
| Strategy | How it works | Trade-off |
|---|---|---|
| Full history | Send every message every time | Perfect recall, but expensive & hits the context limit |
| Windowed (buffer) memory | Keep only the last N turns | Cheap & bounded, but forgets older context |
| Summary memory | Summarise old turns into a short paragraph; keep recent turns verbatim | Compact, retains gist, but loses fine detail |
| Vector / retrieval memory | Store past turns as embeddings; retrieve only the relevant ones (RAG-style) | Scales to very long histories; needs a vector DB |
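As a rough illustration of the summary-memory row in the table above, the sketch below folds older messages into one summary message and keeps the most recent ones verbatim. The names `compact_history`, `naive_summarise` and `keep_recent` are hypothetical. `naive_summarise` is only a stand-in that truncates each message; a real application would more likely ask the model itself to write the summary.

```python
def naive_summarise(messages, max_chars=40):
    """Placeholder summariser (assumption): truncate each old message.

    A real system would probably call an LLM here instead.
    """
    parts = [f"{m['role']}: {m['content'][:max_chars]}" for m in messages]
    return "Summary of earlier conversation: " + " | ".join(parts)


def compact_history(history, keep_recent=4, summarise=naive_summarise):
    """Return system message + summary of old messages + recent messages."""
    system, rest = history[0], history[1:]
    cut = max(len(rest) - keep_recent, 0)   # index where "recent" starts
    old, recent = rest[:cut], rest[cut:]
    if not old:
        return list(history)                # nothing old enough to summarise
    summary = {"role": "system", "content": summarise(old)}
    return [system, summary] + recent


history = [{"role": "system", "content": "You are a helpful travel assistant."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

compacted = compact_history(history, keep_recent=4)
print(len(history), "->", len(compacted))   # 11 -> 6
```

The summariser is passed in as a parameter so the truncating placeholder can be swapped for a model-backed one without touching the compaction logic.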
15.4 Context Management
Context management is the art of fitting the most useful information into the limited context window:
- Trim: drop the oldest turns when nearing the limit.
- Summarise: compress old turns into a short summary.
- Prioritise: always keep the system prompt and the most recent/relevant turns.
- Retrieve: pull back only the relevant past messages instead of all of them.
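The Trim and Prioritise ideas above can be sketched as a token-budget filter: always keep the system message, then add messages from newest to oldest until the budget runs out. The function names and the `max_tokens` parameter are hypothetical, and `estimate_tokens` is a deliberately crude assumption (one token per whitespace-separated word); a real application would use the tokenizer that matches its model.

```python
def estimate_tokens(message):
    """Very rough estimate (assumption): one token per whitespace-separated word."""
    return len(message["content"].split())


def fit_to_budget(history, max_tokens):
    """Keep the system message, then the newest messages that still fit."""
    system, rest = history[0], history[1:]
    budget = max_tokens - estimate_tokens(system)
    kept = []
    for message in reversed(rest):      # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break                       # stop at the first message that does not fit
        kept.append(message)
        budget -= cost
    return [system] + kept[::-1]        # restore chronological order


history = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "What is the weather in Paris today?"},
    {"role": "assistant", "content": "It is sunny and mild in Paris."},
    {"role": "user", "content": "What about London?"},
]
trimmed = fit_to_budget(history, max_tokens=17)
print([m["content"] for m in trimmed])
```

With this budget the oldest user question is dropped while the system prompt and the two newest messages survive.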
15.5 Types of State
State management distinguishes information by how long it must survive:
| State type | Lifespan | Holds |
|---|---|---|
| Ephemeral (short-term) | One computation/turn only | Temporary variables, intermediate results; deleted after use ("sticky note") |
| Persistent (long-term) | Survives across turns & sessions | Conversation history, user preferences, validated data ("filing cabinet") |
| Shared | Across multiple components/agents | A common dictionary nodes/agents read & write |
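To make the three rows above concrete, here is a minimal sketch in which all names (`persistent_store`, `shared_state`, `detect_topic`, `detect_city`, `handle_turn`) are hypothetical. A plain dictionary stands in for the persistent store, which in a real system would presumably be a database or file, and the keyword matching is only a toy.

```python
# Persistent state: a dict standing in for a database, keyed by user id (assumption).
persistent_store = {}

# Shared state: one dictionary that several components read and write.
shared_state = {"topic": None, "city": None}


def detect_topic(user_input, shared):
    """Component 1: writes the detected topic into shared state."""
    if "weather" in user_input.lower():
        shared["topic"] = "weather"


def detect_city(user_input, shared):
    """Component 2: writes the detected city into shared state."""
    # Ephemeral state: `words` exists only for the duration of this call.
    words = [w.strip("?.,!") for w in user_input.split()]
    for city in ("Paris", "London"):
        if city in words:
            shared["city"] = city


def handle_turn(user_id, user_input):
    """Run both components, then persist what should outlive the turn."""
    detect_topic(user_input, shared_state)
    detect_city(user_input, shared_state)
    prefs = persistent_store.setdefault(user_id, {})
    if shared_state["city"]:
        prefs["last_city"] = shared_state["city"]
    return dict(shared_state)


print(handle_turn("user-1", "What's the weather in Paris?"))
print(handle_turn("user-1", "What about London?"))
print(persistent_store)
```

After the second turn the shared state still holds the topic "weather" from the first turn, which is what lets "What about London?" be interpreted.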
State and memory are conceptual but heavily tested, so make sure each idea is clear.
An LLM is described as "stateless". This means:
The model forgets everything after each response. Memory must be engineered by the application around it.
How does a chatbot appear to "remember" earlier messages?
Memory is simulated by passing the full message history (system + user + assistant turns) on every API call, so the model re-reads the conversation each time.
In a chat-API message list, which role sets the bot's overall behaviour and persona?
The system message defines global behaviour and persona. user = the human's input; assistant = the model's prior replies.
Keeping only the last N exchanges of a conversation is which strategy?
Window/buffer memory keeps a sliding window of the most recent N turns: cheap and bounded, but it forgets older context.
The context window of a model is:
The context window limits how much text (system prompt + history + retrieved data + new input) the model can consider at once.
A very long conversation exceeds the context window. A good context-management technique is to:
Summary memory compresses old turns into a short paragraph, freeing tokens while retaining the gist. The system prompt should always be kept.
User preferences that should survive across multiple sessions are stored in:
Persistent state (the "filing cabinet") survives across turns and sessions. Ephemeral state (the "sticky note") lasts only for the current computation.
Checkpointing is valuable because it:
Checkpointers persist state snapshots, so multi-turn workflows survive crashes/restarts and can resume without recomputing finished work.
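As a small illustration of the checkpointing answer above, the sketch below saves a state snapshot to a JSON file after each step, simulates a crash, and then resumes without redoing finished work. The step names, file layout and function names are all hypothetical; this is not the API of any particular checkpointing library.

```python
import json
import os
import tempfile

STEPS = ["collect_dates", "search_flights", "book_hotel"]   # hypothetical workflow


def save_checkpoint(path, state):
    """Write the state snapshot to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"done": []}


def run_workflow(path, fail_at=None):
    """Run the remaining steps, checkpointing after each one."""
    state = load_checkpoint(path)
    for step in STEPS:
        if step in state["done"]:
            continue                    # finished before the crash, skip it
        if step == fail_at:
            raise RuntimeError(f"simulated crash during {step}")
        state["done"].append(step)
        save_checkpoint(path, state)
    return state


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_workflow(path, fail_at="search_flights")
except RuntimeError as err:
    print("crashed:", err)
print("after crash:", load_checkpoint(path)["done"])
print("after resume:", run_workflow(path)["done"])
```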
Why does a stateless LLM still produce coherent multi-turn conversations in practice?
The LLM itself has no memory, but the surrounding application keeps a conversation-history list and re-sends the entire history (system + previous user/assistant turns) with every new request. The model re-reads that history each time, so it appears to "remember"; the memory lives in the application, not the model.
Write a Python chat() function that maintains a conversation-history list so a chatbot remembers previous turns.
from openai import OpenAI

client = OpenAI()

history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input):
    history.append({"role": "user", "content": user_input})
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=history)   # send full history
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Techno.")
print(chat("What is my name?"))   # typically answers along the lines of "Your name is Techno."
The history list is the chatbot's memory: it is appended to and re-sent on every turn.
Write a function that trims a conversation-history list so it keeps the system message plus only the last 4 turns (window/buffer memory).
def trim_history(history, window=4):
    system = history[0]                   # always keep the system message
    turns = history[1:]
    # Guard window <= 0: turns[-0:] would return every turn instead of none.
    recent = turns[-window:] if window > 0 else []
    return [system] + recent

# Example
msgs = [{"role": "system", "content": "..."}] + \
       [{"role": "user", "content": f"msg {i}"} for i in range(10)]
print(len(trim_history(msgs)))   # 1 system + 4 recent = 5
This bounds token usage while keeping the most recent context and the persistent system prompt.