⚡ LECTURE 18

Agentic AI — Components

From a chatbot that talks to a "digital intern" that acts. Learn the building blocks of an AI agent — the LLM brain, tools, memory, planning — and closed vs open-source models.

Syllabus topics 71–74 ⏱ ~26 min read 12 practice questions

In this lecture

From Chatbots to Agents
The Brain, Hands & Instructions
Tools & the Agentic Loop
Memory & Planning
Closed vs Open-Source LLMs & Ollama
Practice Questions

18.1 From Chatbots to Agents

🧑‍💼 The Digital Intern A standard LLM (ChatGPT) is a brilliant encyclopedia — it answers questions and generates text, but it does not do anything beyond producing information. Agentic AI is a digital intern — it can reason, plan, and execute tasks using available tools. An intern has general knowledge (the Brain) but needs instructions and access to software (Tools) to be useful.

Agentic AI — an AI system that can reason, plan and act autonomously to achieve a goal, using tools to interact with the outside world. Generative AI is a "thinker" (produces text); Agentic AI is a "doer" (uses tools to execute tasks).

18.2 The Brain, Hands & Instructions

Every agent is built from three pillars:

Pillar	What it is	Role
The Brain	The LLM	Reasoning, language understanding, decision-making — the "common sense" that figures out how to solve a problem
The Hands	Tools	Capabilities that let the agent interact with the world — search files, run code, call APIs
The Instructions	Prompts	The logic defining the agent's purpose, behaviour and workflow

The Brain — an LLM is a probabilistic prediction engine

The LLM brain is a "next-word predictor", not a sentient mind. It has a training phase (learns patterns by minimising prediction error on huge text data) and a decoding phase (generates text by producing a probability distribution over the next token). Decoding strategies: greedy search (always pick the most likely word) or stochastic sampling (get creative, controlled by temperature/top-p).

Why a "naked" LLM is not enough

⚠️ The limits of the Brain alone A plain LLM cannot: access real-time info (training cutoff), verify facts (hallucinations), perform actions like sending an email (no actions — it only outputs text), or do precise maths (poor at calculation). Ask "What's 2347 × 8563?" and it gives an approximate, often wrong answer. Tools fix all of this.

18.3 Tools & the Agentic Loop

Tools — external functions the agent can "call" when it needs help beyond its training data. The Brain (LLM) decides which tool to use; the system executes it. This is the foundation of Function Calling / Tool Calling.

Common tools: web search (real-time info), calculator (precise maths), API requests (weather, stock prices, email), code interpreter (run code, analyse data).

The agentic workflow loop

User Request → LLM Reasoning → Tool Selection → Execution → Synthesis

Example: "Is it raining in London?" → the LLM reasons "I need real-time data" → selects the Weather tool → the system executes it and returns "Rainy, 12°C" → the LLM synthesises "Yes, it is raining in London." The hallmark of a true agent is that moment when the model stops generating text and instead writes a command to call a tool.

Python · giving an LLM a tool (function calling)

from langchain_core.tools import tool

# 1. Define a tool - a normal function with a docstring
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers together."""
    return a * b

# 2. Bind the tool to the LLM (the "calculator" is handed over)
llm_with_tools = llm.bind_tools([multiply])

# 3. The LLM does not answer directly - it requests the tool
response = llm_with_tools.invoke("Calculate 50 times 173")
print("Tool call:", response.tool_calls)

OutputTool call: [{'name': 'multiply', 'args': {'a': 50, 'b': 173}, 'type': 'tool_call'}]

Notice the LLM did not compute the answer itself — it recognised it needs the tool and produced a structured tool call. The system then runs multiply(50, 173) = 8650 and feeds the result back.

18.4 Memory & Planning

Memory Management

Type	Analogy	Holds	Lifespan
Short-term memory	Sticky note	Immediate info for the current task (e.g. "flight AA123 on March 15")	The current session/task
Long-term memory	Filing cabinet	Permanent info across sessions (e.g. "user prefers aisle seats")	Permanent — stored in vector databases

Short-term memory handles the "what" of the current task; long-term memory stores the "who" and user preferences for personalised experiences.

Planning & Goal Decomposition

Planning — breaking a complex goal into smaller, manageable, sequential steps. Like building a skyscraper one floor at a time, with a site manager coordinating.

Example — "Plan my vacation" decomposes into: (1) Find flights → (2) Book hotel → (3) List restaurants. Planning makes impossible tasks possible, enables step-by-step progress tracking, and allows error recovery at each step.

18.5 Closed vs Open-Source LLMs & Ollama

	Closed Source ("Walled Garden")	Open Source ("Community Park")
Examples	GPT (OpenAI), Gemini (Google), Claude	LLaMA (Meta), Gemma (Google), Mistral
Pros	Top reasoning benchmarks, managed infrastructure, regular updates	Free to run, privacy (runs locally), customisable & transparent
Cons	Paid API, your data leaves your network, "black box"	Needs local hardware (RAM/GPU), manual setup

Why go open source?

Complete data privacy — prompts and data never leave your device.
Security & compliance — essential for HIPAA (healthcare), finance, enterprise secrets.
Offline capability — works without internet once downloaded.
Cost control — no per-token API fees; predictable fixed costs.

Ollama — the "MP3 player" for AI models

Ollama — a lightweight runtime that lets you easily download and run open-source LLMs locally. Raw model files are heavy files of weights you cannot just "open"; Ollama is the backend "runner" that loads and manages them (CPU/GPU allocation).

Ollama command	What it does
`ollama pull llama4`	Downloads the model to your machine
`ollama run llama4`	Loads the model into memory and starts a chat session (auto-pulls if missing)
`ollama list`	Shows all models installed in your library
`ollama rm llama4`	Deletes a model to free disk space
`ollama ps`	Shows which models are currently running & using RAM

💡 Tip — pull vs run ollama pull just downloads the model (like downloading an app). ollama run downloads if needed AND starts the chat (boots up the agent). If you only want the files for later, use pull.

? Practice Questions

Agent components and Ollama commands are common exam material.

MCQQ1Agent vs chatbot

The key difference between Generative AI and Agentic AI is that Agentic AI:

A Only produces text
B Can use tools to take actions and execute tasks, not just talk
C Never uses an LLM
D Cannot reason

Answer: B

Generative AI is a "thinker" that produces content. Agentic AI is a "doer" — it reasons, plans, and uses tools to act on the world.

MCQQ2Pillars

In the "digital intern" framework, what plays the role of the Brain?

A The tools
B The LLM
C The vector database
D The user interface

Answer: B

The LLM is the Brain (reasoning); Tools are the Hands; Prompts are the Instructions.

MCQQ3Tools

Why does an agent need a calculator tool for "What is 2347 × 8563?"

A Because the LLM cannot read numbers
B Because LLMs predict tokens and are unreliable at precise calculation
C Because tools are always faster
D Because the LLM has no memory

Answer: B

LLMs are linguistic engines, not computational ones — they often give approximate, wrong arithmetic. A calculator tool gives exact results.

MCQQ4Function calling

When an LLM with tools decides it needs a tool, it produces:

A The final answer directly
B A structured tool call specifying the tool name and arguments
C An error message
D Nothing

Answer: B

The model pauses generation and emits a tool call (name + arguments). The system runs the tool and feeds the result back to the LLM.

MCQQ5Memory

An agent permanently remembering "this user prefers vegetarian meals" is using:

A Short-term memory
B Long-term memory
C No memory
D A tool call

Answer: B

Long-term memory (the "filing cabinet", often a vector DB) stores permanent preferences across sessions. Short-term memory is just for the current task.

MCQQ6Planning

Goal decomposition / planning means the agent:

A Answers the whole task in one giant step
B Breaks a complex goal into smaller sequential steps
C Deletes its memory
D Refuses hard tasks

Answer: B

Planning splits a goal (e.g. "plan my vacation") into manageable steps (find flights → book hotel → list restaurants), enabling progress tracking and error recovery.

MCQQ7Open source

Which is an open-source / open-weight LLM family?

A GPT-4
B Gemini
C LLaMA
D Claude

Answer: C

LLaMA (Meta), Gemma (Google) and Mistral are open-weight models you can run locally. GPT, Gemini and Claude are closed-source/proprietary.

MCQQ8Ollama

Which Ollama command downloads a model AND immediately starts a chat session?

A ollama pull
B ollama list
C ollama run
D ollama rm

Answer: C

ollama run loads the model into memory and starts the chat (auto-pulling it first if missing). ollama pull only downloads.

MCQQ9Privacy

A major reason a hospital might choose an open-source LLM run locally is:

A It always has the highest benchmark scores
B Data privacy & compliance — sensitive patient data never leaves the network
C It needs no hardware
D It cannot hallucinate

Answer: B

Local open-source models keep prompts and data on-device — essential for HIPAA and other regulations. Closed APIs send data to a third party.

CodingQ10Define a tool

Using LangChain, define a tool that returns the length of a string, and bind it to an LLM.

Solution

Python

from langchain_core.tools import tool

@tool
def string_length(text: str) -> int:
    """Return the number of characters in a string."""
    return len(text)

# Hand the tool to the LLM
llm_with_tools = llm.bind_tools([string_length])

response = llm_with_tools.invoke("How long is the word 'agent'?")
print(response.tool_calls)

Output[{'name': 'string_length', 'args': {'text': 'agent'}, 'type': 'tool_call'}]

The @tool decorator + docstring tell the LLM what the tool does and when to call it.

Short AnswerQ11Concept

List the four main limitations of a "naked" LLM (without tools) and explain how tools address them.

Model answer

(1) Training cutoff — no real-time info; a web-search tool fetches current data. (2) Hallucinations — it invents facts; tools/RAG return verified data. (3) No actions — it only outputs text; API tools let it send emails, query databases. (4) Poor maths — it predicts tokens, not computes; a calculator tool gives exact results.

Short AnswerQ12Agentic loop

Describe the agentic workflow loop for the request "What's the weather in Tokyo?".

Model answer

User request → "What's the weather in Tokyo?". LLM reasoning → the model recognises it needs real-time data it does not have. Tool selection → it generates a call to the Weather tool with location = Tokyo. Execution → the system runs the tool and returns raw data (e.g. "Sunny, 22°C"). Synthesis → the LLM turns that into a natural reply: "It is currently sunny and 22°C in Tokyo."

🎯 Lecture 18 — must-remember Agent = "digital intern": Brain (LLM) + Hands (Tools) + Instructions (Prompts). LLM alone fails: training cutoff, hallucinations, no actions, poor maths. Tools/Function Calling fix this. Loop: Request → Reason → Select tool → Execute → Synthesise. Memory: short-term (sticky note) vs long-term (filing cabinet/vector DB). Planning = goal decomposition. Closed: GPT/Gemini/Claude; Open: LLaMA/Gemma/Mistral. Ollama runs local models (pull, run, list, rm).

← Previous

Rapid Prototyping Tools

Agentic AI - Control Flow