Agentic AI — Components
From a chatbot that talks to a "digital intern" that acts. Learn the building blocks of an AI agent — the LLM brain, tools, memory, planning — and closed vs open-source models.
In this lecture
18.1 From Chatbots to Agents
18.2 The Brain, Hands & Instructions
Every agent is built from three pillars:
| Pillar | What it is | Role |
|---|---|---|
| The Brain | The LLM | Reasoning, language understanding, decision-making — the "common sense" that figures out how to solve a problem |
| The Hands | Tools | Capabilities that let the agent interact with the world — search files, run code, call APIs |
| The Instructions | Prompts | The logic defining the agent's purpose, behaviour and workflow |
The Brain — an LLM is a probabilistic prediction engine
The LLM brain is a "next-word predictor", not a sentient mind. It has a training phase (learns patterns by minimising prediction error on huge text data) and a decoding phase (generates text by producing a probability distribution over the next token). Decoding strategies: greedy search (always pick the most likely word) or stochastic sampling (get creative, controlled by temperature/top-p).
Why a "naked" LLM is not enough
18.3 Tools & the Agentic Loop
Common tools: web search (real-time info), calculator (precise maths), API requests (weather, stock prices, email), code interpreter (run code, analyse data).
The agentic workflow loop
Example: "Is it raining in London?" → the LLM reasons "I need real-time data" → selects the Weather tool → the system executes it and returns "Rainy, 12°C" → the LLM synthesises "Yes, it is raining in London." The hallmark of a true agent is that moment when the model stops generating text and instead writes a command to call a tool.
from langchain_core.tools import tool
# 1. Define a tool - a normal function with a docstring
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two numbers together."""
return a * b
# 2. Bind the tool to the LLM (the "calculator" is handed over)
llm_with_tools = llm.bind_tools([multiply])
# 3. The LLM does not answer directly - it requests the tool
response = llm_with_tools.invoke("Calculate 50 times 173")
print("Tool call:", response.tool_calls)
Notice the LLM did not compute the answer itself — it recognised it needs the tool and produced a structured tool call. The system then runs multiply(50, 173) = 8650 and feeds the result back.
18.4 Memory & Planning
Memory Management
| Type | Analogy | Holds | Lifespan |
|---|---|---|---|
| Short-term memory | Sticky note | Immediate info for the current task (e.g. "flight AA123 on March 15") | The current session/task |
| Long-term memory | Filing cabinet | Permanent info across sessions (e.g. "user prefers aisle seats") | Permanent — stored in vector databases |
Short-term memory handles the "what" of the current task; long-term memory stores the "who" and user preferences for personalised experiences.
Planning & Goal Decomposition
Example — "Plan my vacation" decomposes into: (1) Find flights → (2) Book hotel → (3) List restaurants. Planning makes impossible tasks possible, enables step-by-step progress tracking, and allows error recovery at each step.
18.5 Closed vs Open-Source LLMs & Ollama
| Closed Source ("Walled Garden") | Open Source ("Community Park") | |
|---|---|---|
| Examples | GPT (OpenAI), Gemini (Google), Claude | LLaMA (Meta), Gemma (Google), Mistral |
| Pros | Top reasoning benchmarks, managed infrastructure, regular updates | Free to run, privacy (runs locally), customisable & transparent |
| Cons | Paid API, your data leaves your network, "black box" | Needs local hardware (RAM/GPU), manual setup |
Why go open source?
- Complete data privacy — prompts and data never leave your device.
- Security & compliance — essential for HIPAA (healthcare), finance, enterprise secrets.
- Offline capability — works without internet once downloaded.
- Cost control — no per-token API fees; predictable fixed costs.
Ollama — the "MP3 player" for AI models
| Ollama command | What it does |
|---|---|
ollama pull llama4 | Downloads the model to your machine |
ollama run llama4 | Loads the model into memory and starts a chat session (auto-pulls if missing) |
ollama list | Shows all models installed in your library |
ollama rm llama4 | Deletes a model to free disk space |
ollama ps | Shows which models are currently running & using RAM |
ollama pull just downloads the model (like downloading an app). ollama run downloads if needed AND starts the chat (boots up the agent). If you only want the files for later, use pull.
Agent components and Ollama commands are common exam material.
The key difference between Generative AI and Agentic AI is that Agentic AI:
Generative AI is a "thinker" that produces content. Agentic AI is a "doer" — it reasons, plans, and uses tools to act on the world.
In the "digital intern" framework, what plays the role of the Brain?
The LLM is the Brain (reasoning); Tools are the Hands; Prompts are the Instructions.
Why does an agent need a calculator tool for "What is 2347 × 8563?"
LLMs are linguistic engines, not computational ones — they often give approximate, wrong arithmetic. A calculator tool gives exact results.
When an LLM with tools decides it needs a tool, it produces:
The model pauses generation and emits a tool call (name + arguments). The system runs the tool and feeds the result back to the LLM.
An agent permanently remembering "this user prefers vegetarian meals" is using:
Long-term memory (the "filing cabinet", often a vector DB) stores permanent preferences across sessions. Short-term memory is just for the current task.
Goal decomposition / planning means the agent:
Planning splits a goal (e.g. "plan my vacation") into manageable steps (find flights → book hotel → list restaurants), enabling progress tracking and error recovery.
Which is an open-source / open-weight LLM family?
LLaMA (Meta), Gemma (Google) and Mistral are open-weight models you can run locally. GPT, Gemini and Claude are closed-source/proprietary.
Which Ollama command downloads a model AND immediately starts a chat session?
ollama run loads the model into memory and starts the chat (auto-pulling it first if missing). ollama pull only downloads.
A major reason a hospital might choose an open-source LLM run locally is:
Local open-source models keep prompts and data on-device — essential for HIPAA and other regulations. Closed APIs send data to a third party.
Using LangChain, define a tool that returns the length of a string, and bind it to an LLM.
from langchain_core.tools import tool
@tool
def string_length(text: str) -> int:
"""Return the number of characters in a string."""
return len(text)
# Hand the tool to the LLM
llm_with_tools = llm.bind_tools([string_length])
response = llm_with_tools.invoke("How long is the word 'agent'?")
print(response.tool_calls)
The @tool decorator + docstring tell the LLM what the tool does and when to call it.
List the four main limitations of a "naked" LLM (without tools) and explain how tools address them.
(1) Training cutoff — no real-time info; a web-search tool fetches current data. (2) Hallucinations — it invents facts; tools/RAG return verified data. (3) No actions — it only outputs text; API tools let it send emails, query databases. (4) Poor maths — it predicts tokens, not computes; a calculator tool gives exact results.
Describe the agentic workflow loop for the request "What's the weather in Tokyo?".
User request → "What's the weather in Tokyo?". LLM reasoning → the model recognises it needs real-time data it does not have. Tool selection → it generates a call to the Weather tool with location = Tokyo. Execution → the system runs the tool and returns raw data (e.g. "Sunny, 22°C"). Synthesis → the LLM turns that into a natural reply: "It is currently sunny and 22°C in Tokyo."
pull, run, list, rm).