Core Concepts of an AI Agent

1.1 What Is an AI Agent

An AI Agent is the “hands and feet” of an AI Model — a layer of code sitting between the user and the Model, responsible for executing the things the Model wants to do but cannot (read files, run commands, browse the web, edit code…).

AI Model (also called LLM, Large Language Model) refers to AI models like Claude, ChatGPT, or Gemini that you can talk to in natural language. At its core it is a pure text generator — give it input, it produces output, no hands or feet. Throughout the following chapters we use the word Model to mean AI Model / LLM.

Here is an example. Suppose you want Claude to do this for you: “Read README.md and summarize it for me”. What happens if you just throw it at the Model directly?

sequenceDiagram
    actor User
    participant Model
    User->>Model: "Read README.md and summarize it for me"
    Model-->>User: "I cannot directly access your file system,<br/>please paste the content and I can summarize it..."

The Model knows what you want, but it cannot do it: it can only output text. It cannot take “real actions” such as reading files, running commands, browsing the web, or editing code. Without “hands and feet” to act on its behalf, the Model can only stop at “I can’t do it, please paste it for me”.

Once you add an AI Agent, someone is there to execute what the Model wants to do:

sequenceDiagram
    actor User
    participant Agent as AI Agent
    participant Model
    User->>Agent: "Read README.md and summarize it for me"
    Agent->>Model: Send the conversation to the Model
    Model-->>Agent: "tool_use: read_file"
    Note over Agent: Execute read_file to read the file
    Agent->>Model: Send tool_result back so the Model can keep reasoning
    Model-->>Agent: "This project is..."
    Agent-->>User: "This project is..."

After we add an AI Agent, the Model still only outputs text — but this time what it writes is a tool_use marker that tells the AI Agent which tool to call. The AI Agent stands in the middle: when it sees tool_use, it actually reads the file, wraps the read result into a tool_result message and sends it back to the Model for further reasoning, and finally relays the Model’s summary to the User.

What are tool_use / tool_result? They are two content block types inside the Model API’s message structure. The Model never actually executes a tool; it just writes a tool_use block in its reply telling the AI Agent which tool to call and with which arguments. After the AI Agent runs the tool, it wraps the result into a tool_result block and sends it back to the Model in the next message. You will see these two terms over and over in 1.3 / 1.5.
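To make the two terms concrete, here is a minimal sketch of both blocks as Python dicts. It assumes the Anthropic Messages API content-block shape (a tool_use block carries type, id, name and input; a tool_result block carries tool_use_id and content). The id value and the make_tool_result helper are hypothetical, for illustration only.

```python
# Hypothetical tool_use block, as the Model might write it inside an assistant message.
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_example_01",  # illustrative id; real ids are generated by the API
    "name": "read_file",
    "input": {"path": "README.md"},
}


def make_tool_result(block: dict, output: str) -> dict:
    """Hypothetical helper: wrap a tool's output into a tool_result message.

    tool_use_id echoes the id of the tool_use block being answered,
    so the Model can tell which request this result belongs to.
    """
    return {
        "role": "user",  # tool results travel in a user-role message (see 1.3)
        "content": [
            {"type": "tool_result", "tool_use_id": block["id"], "content": output},
        ],
    }


reply = make_tool_result(tool_use_block, "# Hello...")
print(reply["content"][0]["tool_use_id"])  # toolu_example_01
```

The important detail is the id: because each tool_result echoes the id of its tool_use, the Model can match results to requests even when one reply asks for several tools.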

The “hands and feet” layer is internally composed of four things, which the next four sections unpack in order:

  • 1.2 Executor — the body of the AI Agent, the code that actually performs actions
  • 1.3 Message and Role — the format the Model and Executor use to communicate
  • 1.4 Memory — the agent’s memory = a messages list, the whole history is resent every turn
  • 1.5 Agent Loop — why a single task requires the Model and Executor to bounce back and forth

1.2 Executor

The diagram in 1.1 treats the AI Agent as a black box. If we open it up, the core component of the AI Agent is called the Executor — in later chapters we use “Executor” to refer to the layer of code inside the AI Agent that “actually does the work”. (Other names you will see in the industry: harness, scaffolding, agent runner — they all mean the same thing.)

The Model only does one thing: look at the conversation history → output the next piece of text — including a tool_use marker like “I want to call read_file”, which is also just text. The Executor’s job is to translate that marker into a real execution (read a file, run a shell command, call an API), wrap the result into a tool_result message, and send it back to the Model for continued reasoning.

sequenceDiagram
    actor User
    box AI Agent
        participant Executor
    end
    participant Model
    User->>Executor: "Read the README for me"
    Executor->>Model: Send the conversation to the Model
    Model-->>Executor: "tool_use: read_file"
    Note over Executor: Execute read_file to read the file
    Executor->>Model: Send tool_result back to the Model
    Model-->>Executor: "This project is..."
    Executor-->>User: "This project is..."

| Role | Location | Responsibility |
|---|---|---|
| User | External | Submits requests and reads results; the start and end of the whole flow |
| Executor | Inside the AI Agent | Receives the user’s message → bounces back and forth with the Model → performs real actions (file system, network, shell) → returns the result to the user |
| Model | External service (API) | Decides what to do: answer directly, or call a tool first? Which tool? With what arguments? |

The single most important sentence: the model executes no tools; it only writes tool_use markers in its reply text. What actually reads files, runs shell commands, and calls APIs is the executor. This separation of duties is the root of every design that follows.
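The split of duties can be sketched in a few lines of Python. This is a minimal, hypothetical Executor core, not the implementation built in later chapters: a registry maps the tool names the Model may write to ordinary Python functions, and execute() turns one tool_use block (shaped as in the Anthropic Messages API) into a real call. The read_file and run_shell functions here are simplified stand-ins.

```python
import subprocess
from pathlib import Path


# Hypothetical, simplified tool implementations (the real trio is built in CH02).
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def run_shell(command: str) -> str:
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


# Registry: tool name (as written by the Model) -> Python function that does the work.
TOOLS = {"read_file": read_file, "run_shell": run_shell}


def execute(block: dict) -> str:
    """Turn one tool_use block into a real action and return its output as text."""
    func = TOOLS.get(block["name"])
    if func is None:
        return f"Error: unknown tool {block['name']!r}"
    try:
        return func(**block["input"])  # the Model's arguments become keyword arguments
    except Exception as exc:  # report failures back to the Model instead of crashing
        return f"Error: {exc}"
```

Returning errors as text rather than raising is a deliberate choice in this sketch: the error goes back to the Model inside a tool_result, so it can adjust and try again.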

The Model does not know on its own which tools the executor has — it is the executor that proactively sends a “tool list” along with the conversation history to the Model. What the tool list looks like, how it is sent, and how to implement it are discussed in detail in CH02.
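As a preview only (CH02 covers the real thing), an entry in the tool list for the Anthropic Messages API has a name, a description, and an input_schema written as JSON Schema. The read_file entry below is a hypothetical example of that shape, and the commented-out call only indicates where the list would go.

```python
# Hypothetical tool list entry in the shape the Anthropic Messages API expects:
# a name, a description, and a JSON Schema describing the tool's input.
TOOL_LIST = [
    {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "File path to read"}},
            "required": ["path"],
        },
    },
]

# The Executor would send this list with every request, alongside the history, e.g.
# client.messages.create(model=..., max_tokens=..., tools=TOOL_LIST, messages=messages)
names = [tool["name"] for tool in TOOL_LIST]
print(names)  # ['read_file']
```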


1.3 Message and Role

The Executor and the Model converse through Messages.

Every Message carries at least two fields:

  • role: who is speaking (user or assistant; the system prompt is set separately)
  • content: what was said (text, a tool_use request, a tool_result…)
{"role": "user",      "content": "Read README"}
{"role": "assistant", "content": [<tool_use: read_file>]}
{"role": "user",      "content": [<tool_result: "# Hello...">]}

Only Three Roles

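In the Anthropic protocol there are only three roles. Two of them, user and assistant, appear in the messages list; the third, system, is not a message at all but a separate top-level system parameter sent with each request:

| Role | Written by | Content |
|---|---|---|
| user | The real user, or the executor | Questions, or tool execution results |
| assistant | The model | Text replies and/or tool call requests |
| system | You, the developer (the system prompt) | Behavior configuration for the model |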

Note: There is no tool role. Tool execution results also use the user role.

The benefit of this design is that the message protocol is extremely simple: it always alternates user / assistant, and the executor is just the “ghostwriter for user messages”.

user      "Read the README for me"   ← real user
assistant [I want to call read_file] ← model decision
user      [tool_result: "..."]       ← executor uses the user role to return tool_result
assistant "This project is..."       ← model summary
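A minimal sketch of how the Executor might assemble a request under these rules, assuming the Anthropic-style split where the system prompt travels as its own top-level system field and the messages list holds only user / assistant turns. The build_request helper is hypothetical, and its strict alternation check may be stricter than what the API itself requires.

```python
def build_request(system: str, messages: list[dict]) -> dict:
    """Hypothetical helper: check role alternation, then assemble request arguments."""
    for i, message in enumerate(messages):
        # Positions 0, 2, 4, ... must be user; positions 1, 3, 5, ... must be assistant.
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {i} has role {message['role']!r}, expected {expected!r}")
    # The system prompt is a separate field, not an entry in the messages list.
    return {"system": system, "messages": messages}


request = build_request(
    "You are a coding agent.",
    [
        {"role": "user", "content": "Read the README for me"},
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "README.md"}}]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "t1", "content": "# Hello..."}]},
        {"role": "assistant", "content": "This project is..."},
    ],
)
print([m["role"] for m in request["messages"]])
# ['user', 'assistant', 'user', 'assistant']
```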

1.4 Memory

An agent’s memory = Message history = a Messages list.
The model itself has no memory — because it does not store the Messages from any past conversation.

This is different from the ChatGPT web UI, which seems to “remember what you asked” only because its front end quietly resends every past Message to the model for you. The model behind Claude, GPT, or any other Model API keeps no memory between calls on its own.

To make the model “remember” things, the executor appends every Message in the conversation (the user’s questions, tool_use, tool_result, etc.) into its own recorded messages list, and every turn sends the entire messages list to the model — not just the latest message. Only when the model sees the complete Message history does it know what to do next.
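This whole basic memory layer fits in a few lines. The Memory class below is a hypothetical sketch, assuming messages use the same dict shape as the examples in this chapter; payload() returns a copy of the full list, which is what the Executor would send every turn.

```python
class Memory:
    """Hypothetical sketch: the agent's memory is just an append-only messages list."""

    def __init__(self) -> None:
        self.messages: list[dict] = []

    def add(self, role: str, content) -> None:
        self.messages.append({"role": role, "content": content})

    def payload(self) -> list[dict]:
        # Every turn sends the whole history, not just the newest message.
        return list(self.messages)


memory = Memory()
memory.add("user", "Read README")
print(len(memory.payload()))  # 1
memory.add("assistant", [{"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "README.md"}}])
memory.add("user", [{"type": "tool_result", "tool_use_id": "t1", "content": "# Hello..."}])
print(len(memory.payload()))  # 3
```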

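messages = [  # messages list kept by the executor
    {"role": "user", "content": "Read README"},  # Message from the first turn
    {"role": "assistant", "content": [  # Message from the second turn: the model asks for a tool
        {"type": "tool_use", "id": "toolu_01", "name": "read_file", "input": {"path": "README.md"}},
    ]},  # (the id is illustrative; the API generates real ones)
    {"role": "user", "content": [  # Message from the latest turn: the executor returns the result
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "# Hello..."},
    ]},
    # Every turn from the first to the latest is sent to the model in one shot
]
# client, MODEL, MAX_TOKENS and TOOLS are placeholders configured elsewhere
response = client.messages.create(model=MODEL, max_tokens=MAX_TOKENS, tools=TOOLS, messages=messages)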

The cost: the longer the conversation, the more tokens are sent each turn — more expensive and slower. Coming up, CH05 / CH06 / CH07 will expand this basic memory layer into a three-tier short / medium / long term memory system to address this problem.
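A rough worked example of that cost, under a simplifying assumption: every turn adds one message of a fixed size of t tokens, and the system prompt, tool list, output tokens and any prompt caching are ignored. Turn k resends k messages, so n turns send t·n(n+1)/2 input tokens in total, which grows quadratically with the number of turns.

```python
def total_input_tokens(tokens_per_message: int, turns: int) -> int:
    """Input tokens sent across `turns` requests, assuming each turn adds one
    message of `tokens_per_message` tokens and the whole history is resent.

    Turn k sends k messages, so the total is t * (1 + 2 + ... + n) = t * n(n+1)/2.
    """
    return tokens_per_message * turns * (turns + 1) // 2


print(total_input_tokens(500, 10))   # 27500
print(total_input_tokens(500, 100))  # 2525000
```

Under these assumptions, going from 10 turns to 100 turns multiplies the total input by roughly 90, not 10.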


1.5 Agent Loop

The diagram in 1.2 only demonstrated a single round of tool calling (read one file and then summarize). Real tasks usually require chaining several tool calls in a row:

  • “List the Python files under src/ and what each one does” → first ls src/ → then run read_file on every .py
  • “Find which piece of code handles auth” → first grep "auth" → then read_file on the matching files
  • “Fix this bug” → read_file to locate the issue → write_file to edit → run_shell to run the tests for verification

After each tool runs, the Model has to look at the result before deciding the next step. So the Executor must bounce back and forth between the Model and “real tool execution” until the Model returns end_turn:

%%{init: {'sequence': {'noteAlign': 'left'}}}%%
sequenceDiagram
    actor User
    box AI Agent
        participant Executor
    end
    participant Model
    User->>Executor: (1) Submit a request
    loop Agent loop
        Executor->>Model: (2) Send the messages list
        alt stop_reason = tool_use
            Model-->>Executor: (3) Wants to call some tool
            Note over Executor: (4) Execute the tool<br/>(5) Append tool_result to the messages list, go back to (2)
        else stop_reason = end_turn
            Model-->>Executor: (3) Final text answer
            Note over Executor: (4) Exit [Agent loop]
        end
    end
    Executor-->>User: (5) Relay the final answer to the user

The two branches in the diagram above come from the stop_reason field in the Model’s response: a “status flag” attached to every reply that tells the executor whether the Model still wants to call a tool or already has a final answer. The Executor looks at stop_reason to decide which path to take next:

  1. The Executor sends the entire accumulated conversation history to the Model (as 1.4 explained: every turn resends the whole messages list)
  2. stop_reason = tool_use (wants to call a tool) → the executor runs the tool the Model specified → wraps the result into a tool_result and appends it to the conversation → goes back to step 1
  3. stop_reason = end_turn (already has a final answer) → the executor extracts the Model’s text answer and returns it to the user → the loop ends
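The three steps above can be sketched as a small, runnable loop. To keep it self-contained it does not call a real API: Reply is a hypothetical stand-in for a Model response, call_model and execute are functions passed in, and scripted_model is a fake Model that asks for read_file once and then answers. The content-block dicts follow the Anthropic Messages API shape; everything else is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reply:
    """Hypothetical stand-in for a Model response (not an SDK type)."""
    stop_reason: str
    content: list[dict] = field(default_factory=list)


def agent_loop(call_model: Callable[[list[dict]], Reply], execute: Callable[[dict], str],
               user_text: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_turns):
        reply = call_model(messages)  # step 1: send the whole history
        messages.append({"role": "assistant", "content": reply.content})
        if reply.stop_reason == "end_turn":  # step 3: final answer, leave the loop
            return "".join(b["text"] for b in reply.content if b["type"] == "text")
        if reply.stop_reason != "tool_use":
            raise RuntimeError(f"unhandled stop_reason: {reply.stop_reason}")
        results = [  # step 2: run every requested tool, wrap each output as tool_result
            {"type": "tool_result", "tool_use_id": b["id"], "content": execute(b)}
            for b in reply.content if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})  # then back to step 1
    raise RuntimeError(f"no end_turn within {max_turns} turns")


def scripted_model(messages: list[dict]) -> Reply:
    # Hypothetical fake Model: asks for read_file on the first turn, then answers.
    if len(messages) == 1:
        return Reply("tool_use", [{"type": "tool_use", "id": "t1", "name": "read_file",
                                   "input": {"path": "README.md"}}])
    return Reply("end_turn", [{"type": "text", "text": "This project is..."}])


print(agent_loop(scripted_model, lambda block: "# Hello...", "Read README.md and summarize it"))
# This project is...
```

In a real agent, call_model would wrap the API call and convert its response into this shape, and execute would be a real tool dispatcher; the loop structure itself would stay the same.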

Without this loop, an AI Agent could only “call one tool and then stop” and could not handle multi-step tasks. There is only one normal exit condition: stop_reason = end_turn, meaning the Model believes it has enough information to answer the user. The other stop_reason values and the max_turns guard below cover the abnormal cases.

Common stop_reason Values

Besides the tool_use and end_turn used by the loop above, stop_reason can take other values. The common ones are:

| stop_reason | Meaning | What to do |
|---|---|---|
| end_turn | The model is done speaking | Extract the text and return it |
| tool_use | The model wants to call a tool | Execute it, append the result, continue the loop |
| max_tokens | The output was truncated at the max_tokens limit | Raise max_tokens and retry |
| stop_sequence | Generation hit one of your custom stop strings | Handle according to why you set that stop string |
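A sketch of how an Executor might branch on these values, mirroring the table above. The next_action function and its action names are hypothetical. The design choice worth noting is that an unrecognized stop_reason raises instead of being silently treated as end_turn, since the set of values may grow as the API evolves.

```python
def next_action(stop_reason: str) -> str:
    """Hypothetical mapping from stop_reason to the Executor's next step."""
    actions = {
        "end_turn": "return_text",
        "tool_use": "run_tools_and_continue",
        "max_tokens": "raise_max_tokens_and_retry",
        "stop_sequence": "handle_custom_stop",
    }
    if stop_reason not in actions:
        # Fail loudly rather than guessing what an unknown value means.
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}")
    return actions[stop_reason]


print(next_action("tool_use"))  # run_tools_and_continue
```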

max_turns: Avoiding an Infinite Conversation

Because the Executor keeps bouncing back and forth with the Model until end_turn, this loop will run forever if it never converges. Common runaway scenarios:

  • The Model gets stuck in a “call a tool → look at the result → call the same tool again” dead loop
  • The tool keeps returning errors, and the Model keeps trying new parameters but none succeed

To prevent infinite back-and-forth, the Agent Loop needs a cap on the number of turns:

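def run_agent(max_turns=10):
    for _ in range(max_turns):
        response = call_model()                 # hypothetical: send the messages list
        if response.stop_reason == "end_turn":
            return response                     # normal exit
        run_tools_and_append(response)          # hypothetical: tool_use branch, then loop again
    raise RuntimeError(f"no end_turn after {max_turns} turns")  # ← exceed this count and we raise, forcing termination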

Practical reference values:

  • Ordinary Q&A tasks: 10 is enough
  • Multi-step tasks: 30
  • Agentic coding (read many files, edit many places): 100+ is common

Recap

By this point you should understand:

  • AI Agent = Model + Executor + Loop — three parts make up the minimal architecture, each with its own role
  • The Executor is the hands and feet — everything the Model wants done relies on the Executor to execute, including tool calls and state management
  • Message is the protocol — the Model and Executor communicate through a messages list, with three roles user / assistant / system, each with its own purpose
  • The Model has no memory — “remembering” is the illusion created by accumulating the messages list and resending it every turn
  • The Agent Loop is flow-controlled by stop_reason: tool_use continues, end_turn wraps up; max_turns guards against infinite loops

The next chapter CH02 Providing Tools puts this chapter’s “Tool calling” architecture into practice — writing the tool definitions and implementations for the trio (run_shell / read_file / write_file) so the agent can really get to work.


References