REPL and Multi-Turn Conversation#
CH02 / CH03 finished the tooling story, but the agent is still stuck in one-shot mode — one conversation runs to completion and exits:
- The user poses a task
- The Executor and the Model trade a few rounds of
tool_use/tool_result - The Model returns
end_turnand a final answer, and the program exits
The user can’t ask a follow-up, the contents of this conversation aren’t kept around, and the next time the program starts everything begins from scratch.
This chapter introduces the REPL (Read-Eval-Print Loop): a “read in -> execute -> print result” loop that upgrades the agent from one-shot to continuous conversation — the same program keeps running, the user can ask questions back to back, and the agent remembers what was said earlier.
4.1 REPL#
A REPL is just a loop with four steps that repeat until the user exits:
| REPL step | Action |
|---|---|
| (1) READ | Read one line from the user |
| (2) EVAL | Run one Agent loop (CH01 1.5: tool_use <-> tool_result back and forth until end_turn) |
| (3) PRINT | Print the Model’s final text answer to the terminal |
| (4) LOOP | Go back to (1) and wait for the next line — this step is absent from one-shot, which exits after a single run |
In code it’s just a while True wrapping (1)(2)(3), with the loop naturally returning to (1):
| |
Slash Commands#
The REPL can be designed so that the user, in addition to asking the Model questions, can also type “commands” — usually prefixed with / to distinguish them from regular messages. The REPL acts on these commands directly (e.g. clearing the conversation, exiting the program) and does not forward them to the Model:
user input
|- "/exit", "/quit" -> break out of the loop
|- "/reset" -> clear messages
|- "" (empty string) -> ignore, ask again
`- anything else -> feed to agent.chat(), print the replyImplementation-wise, it’s just a command-dispatch branch placed before agent.chat() to intercept these inputs:
| |
Keeping the / command slot open makes it trivial to add /save, /load, /tokens, /compact later — just drop another if user_in == "/save": ... branch in before agent.chat().
But this REPL skeleton is still missing one piece — between turns, how does the agent “remember” what was said before? That’s the next section.
4.2 Memory Across Turns#
A REPL lets the agent “remember” earlier turns by continuously accumulating the same messages list across turns. The Model itself has no memory (CH01 1.4), but as long as every call to the Model resends the full accumulated messages, the Model “appears to remember.”
How Messages Accumulate Across Turns#
1st chat("My name is Alex"):
messages: [
user "My name is Alex"
assistant "Got it, I'll remember."
]
2nd chat("What's my name?"):
messages: [
user "My name is Alex" <- kept
assistant "Got it, I'll remember." <- kept
user "What's my name?" <- newly added
assistant "Your name is Alex" <- newly added
]On the second chat() call, messages isn’t cleared — every earlier turn is still in the list. The Model sees the full history, so it can answer “Your name is Alex.”
This isn’t real memory; it’s the illusion produced by resending the history every turn (covered in CH01 1.4). The longer the conversation, the larger the history, and the worse the token blow-up — CH05 Short-Term Memory tackles this later.
Implementation: Wrapping the Messages List in an Agent Class#
The question is — where does this messages list live so it survives across chat() calls? The crudest option is a global variable everyone can touch:
| |
It works, but has two problems: (1) you can’t run two independent agents at once (they’d share the same messages and get tangled), and (2) testing is awkward — every test case has to remember to reset the global by hand.
The clean fix is to wrap messages inside an Agent class, one copy per instance:
| |
The REPL side only needs agent = Agent() once, and every subsequent agent.chat(user_in) shares the same self.messages. Want multiple agents? Just agent_a = Agent(); agent_b = Agent() — they don’t interfere with each other.
4.3 Event Loop Considerations for Interactive UIs#
The REPL looks straightforward to write, but once MCP enters the picture the entire program goes async, and that collides with synchronous I/O like input().
Problem: MCP Forces the Whole Program to Be Async#
The MCP SDK is async-first. Once Agent.chat() becomes async (because it needs to await an MCP call), main has to become async too, and you spin it up with asyncio.run():
| |
Sub-problem: Blocking input() Hangs the Event Loop#
input("you> ") is blocking synchronous I/O. Calling it directly inside an async function freezes the entire event loop, which means MCP session background tasks freeze with it.
Solution: Move Blocking I/O to a Thread Pool#
| |
asyncio.to_thread hands the synchronous function off to a thread pool while the main event loop keeps running — the MCP session, other async tasks, and Ctrl+C all stay alive.
For the fundamentals of async / await, see CH17: Async Programming with async / await.
4.4 Try It#
Glue the three previous sections together and run it: the 4.1 REPL skeleton + command interception, the 4.2 Agent class wrapping messages, and the 4.3 async-safe input. End-to-end it looks like this:
| |
The last line proves the multi-turn conversation has memory — the Model can see the earlier turns.
Recap#
By now you should understand:
- What a REPL really is — a
whileloop running READ / EVAL / PRINT / LOOP, where LOOP is the step one-shot lacks - REPL dispatch design —
/commands vs. routing to the agent, with room to grow - Multi-turn memory — not real memory, but the illusion of a
messageslist that accumulates across the REPL session and gets resent every turn - Where the state lives — promote
messagesfrom function-local or global state to an attribute on anAgentinstance: clean, and lets you run multiple agents - Async contagion — once MCP enters, the whole program has to go async; blocking
inputgets pushed to a thread pool viaasyncio.to_thread
The next three chapters tackle three real-world problems: conversations that grow past the context window (CH05 Short-Term Memory), losing the conversation when the program shuts down (CH06 Medium-Term Memory), and failing to remember facts across sessions (CH07 Long-Term Memory).