REPL and Multi-Turn Conversation#

CH02 / CH03 finished the tooling story, but the agent is still stuck in one-shot mode — one conversation runs to completion and exits:

The user poses a task
The Executor and the Model trade a few rounds of tool_use / tool_result
The Model returns end_turn and a final answer, and the program exits

The user can’t ask a follow-up, the contents of this conversation aren’t kept around, and the next time the program starts everything begins from scratch.

This chapter introduces the REPL (Read-Eval-Print Loop): a “read in -> execute -> print result” loop that upgrades the agent from one-shot to continuous conversation — the same program keeps running, the user can ask questions back to back, and the agent remembers what was said earlier.

4.1 REPL#

A REPL is just a loop with four steps that repeat until the user exits:

REPL step	Action
(1) READ	Read one line from the user
(2) EVAL	Run one Agent loop (CH01 1.5: `tool_use` <-> `tool_result` back and forth until `end_turn`)
(3) PRINT	Print the Model’s final text answer to the terminal
(4) LOOP	Go back to (1) and wait for the next line — this step is absent from one-shot, which exits after a single run

In code it’s just a while True wrapping (1)(2)(3), with the loop naturally returning to (1):

while True:
    user_in = input("you> ")          # (1) READ: read user input
    reply = agent.chat(user_in)       # (2) EVAL: run one Agent loop
    print(reply)                      # (3) PRINT: print the Model's final answer
                                      # (4) LOOP: while True returns to (1) automatically

Slash Commands#

The REPL can be designed so that the user, in addition to asking the Model questions, can also type “commands” — usually prefixed with / to distinguish them from regular messages. The REPL acts on these commands directly (e.g. clearing the conversation, exiting the program) and does not forward them to the Model:

user input
   |- "/exit", "/quit"   -> break out of the loop
   |- "/reset"           -> clear messages
   |- "" (empty string)  -> ignore, ask again
   `- anything else      -> feed to agent.chat(), print the reply

Implementation-wise, it’s just a command-dispatch branch placed before agent.chat() to intercept these inputs:

while True:
    user_in = input("you> ").strip()

    # Command interception — never sent to Model
    if user_in in ("/exit", "/quit"):
        break
    if user_in == "/reset":
        agent.messages.clear()
        print("(messages cleared)")
        continue
    if not user_in:                       # ignore empty input
        continue

    # Regular message -> hand to agent
    reply = agent.chat(user_in)
    print(f"claude> {reply}")

Keeping the / command slot open makes it trivial to add /save, /load, /tokens, /compact later — just drop another if user_in == "/save": ... branch in before agent.chat().
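One way to keep that list of branches from sprawling is a small dispatch table. The sketch below is an illustration, not the repository's code: `StubAgent`, `COMMANDS`, `dispatch`, and the `/tokens` handler are hypothetical names, and the handler counts characters as a crude stand-in for real token counting. Unlike the loop above, it also rejects unrecognized / commands locally instead of forwarding them to the Model, which is a design choice of this sketch.

```python
from typing import Callable, Dict, List, Optional


class StubAgent:
    """Hypothetical stand-in for the chapter's Agent: stores messages, echoes input."""

    def __init__(self) -> None:
        self.messages: List[dict] = []

    def chat(self, user_msg: str) -> str:
        self.messages.append({"role": "user", "content": user_msg})
        reply = f"echo: {user_msg}"
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def cmd_reset(agent: StubAgent) -> str:
    agent.messages.clear()
    return "(messages cleared)"


def cmd_tokens(agent: StubAgent) -> str:
    # Crude placeholder: counts characters, not real tokens.
    chars = sum(len(m["content"]) for m in agent.messages)
    return f"({len(agent.messages)} messages, {chars} chars)"


# Hypothetical command table: adding a / command means adding one entry here.
COMMANDS: Dict[str, Callable[[StubAgent], str]] = {
    "/reset": cmd_reset,
    "/tokens": cmd_tokens,
}


def dispatch(agent: StubAgent, user_in: str) -> Optional[str]:
    """Return the text to print, or None when the REPL should exit."""
    user_in = user_in.strip()
    if user_in in ("/exit", "/quit"):
        return None
    if not user_in:
        return ""                            # empty input: nothing to print
    if user_in in COMMANDS:
        return COMMANDS[user_in](agent)      # handled locally, never sent to the Model
    if user_in.startswith("/"):
        return f"(unknown command: {user_in})"
    return f"claude> {agent.chat(user_in)}"
```

With this shape the REPL body reduces to reading a line, calling `dispatch`, breaking on `None`, and printing any non-empty result.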

But this REPL skeleton is still missing one piece — between turns, how does the agent “remember” what was said before? That’s the next section.

4.2 Memory Across Turns#

A REPL lets the agent “remember” earlier turns by appending to one messages list that persists across turns. The Model itself has no memory (CH01 1.4), but as long as every call to the Model resends the full accumulated messages, the Model “appears to remember.”

How Messages Accumulate Across Turns#

1st chat("My name is Alex"):
  messages: [
    user      "My name is Alex"
    assistant "Got it, I'll remember."
  ]

2nd chat("What's my name?"):
  messages: [
    user      "My name is Alex"          <- kept
    assistant "Got it, I'll remember."   <- kept
    user      "What's my name?"          <- newly added
    assistant "Your name is Alex"        <- newly added
  ]

On the second chat() call, messages isn’t cleared — every earlier turn is still in the list. The Model sees the full history, so it can answer “Your name is Alex.”

This isn’t real memory; it’s the illusion produced by resending the history every turn (covered in CH01 1.4). The cost is that every call carries the entire history, so the input tokens per call grow with the length of the conversation and the total sent over a session grows much faster than the number of turns. CH05 Short-Term Memory tackles this later.
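To put numbers on that growth, here is a small sketch under a simplifying assumption: every turn adds exactly one user message and one assistant reply, with no tool_use / tool_result messages in between. The function name `messages_sent_per_turn` is hypothetical, and it counts messages rather than tokens.

```python
from typing import List


def messages_sent_per_turn(turns: int) -> List[int]:
    """Size of the request for each turn, assuming each turn adds exactly
    one user message and one assistant reply (no tool calls)."""
    sent = []
    history = 0                  # messages accumulated so far
    for _ in range(turns):
        history += 1             # the new user message is appended
        sent.append(history)     # the whole list goes to the Model
        history += 1             # the assistant reply is appended afterwards
    return sent


per_turn = messages_sent_per_turn(5)
print(per_turn)        # [1, 3, 5, 7, 9]
print(sum(per_turn))   # 25
```

Under this assumption turn k sends 2k - 1 messages and N turns send N^2 in total, so the cumulative cost grows quadratically even though each turn adds only two messages. Tool calls add more messages per turn and make the growth steeper.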

Implementation: Wrapping the Messages List in an Agent Class#

The question is — where does this messages list live so it survives across chat() calls? The crudest option is a global variable everyone can touch:

messages = []                        # (!) global variable

def chat(user_msg):
    messages.append({"role": "user", "content": user_msg})
    ...

It works, but has two problems: (1) you can’t run two independent agents at once (they’d share the same messages and get tangled), and (2) testing is awkward — every test case has to remember to reset the global by hand.

The clean fix is to wrap messages inside an Agent class, one copy per instance:

class Agent:
    def __init__(self):
        self.messages = []          # <- preserved across chat() calls

    def chat(self, user_msg):
        self.messages.append({"role": "user", "content": user_msg})
        # ... run the Agent loop (internally append tool_use / tool_result / assistant messages) ...
        return final_text

The REPL side only needs agent = Agent() once, and every subsequent agent.chat(user_in) shares the same self.messages. Want multiple agents? Just agent_a = Agent(); agent_b = Agent() — they don’t interfere with each other.
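The class above leaves the Agent loop elided. A minimal runnable sketch of the same idea follows, assuming a hypothetical `fake_model` function in place of the real Model call; it is stateless and can only answer from the messages it is handed, which is the property this section relies on. None of this is the repository's implementation.

```python
from typing import Dict, List

Message = Dict[str, str]


def fake_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for the Model: stateless, sees only `messages`."""
    if messages[-1]["content"] != "What's my name?":
        return "Got it, I'll remember."
    for m in messages:
        if m["role"] == "user" and m["content"].startswith("My name is "):
            return "Your name is " + m["content"][len("My name is "):]
    return "I don't know your name."


class Agent:
    def __init__(self) -> None:
        self.messages: List[Message] = []    # one history per instance

    def chat(self, user_msg: str) -> str:
        self.messages.append({"role": "user", "content": user_msg})
        final_text = fake_model(self.messages)   # the real Agent loop would run here
        self.messages.append({"role": "assistant", "content": final_text})
        return final_text


agent_a = Agent()
agent_b = Agent()
agent_a.chat("My name is Alex")
print(agent_a.chat("What's my name?"))                # Your name is Alex
print(agent_b.chat("What's my name?"))                # I don't know your name.
print(len(agent_a.messages), len(agent_b.messages))   # 4 2
```

`agent_b` never saw the first message, so it cannot answer: the "memory" lives entirely in each instance's own list.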

4.3 Event Loop Considerations for Interactive UIs#

The REPL looks straightforward to write, but once MCP enters the picture the entire program goes async, and that collides with synchronous I/O like input().

Problem: MCP Forces the Whole Program to Be Async#

The MCP SDK is async-first. Once Agent.chat() becomes async (because it needs to await an MCP call), main has to become async too, and you spin it up with asyncio.run():

import asyncio

async def main():
    agent = Agent()
    while True:
        user_in = input("you> ")              # (!) problem is here
        reply = await agent.chat(user_in)
        print(reply)

asyncio.run(main())

Sub-problem: Blocking input() Hangs the Event Loop#

input("you> ") is blocking synchronous I/O. Calling it directly inside an async function freezes the entire event loop until the user presses Enter, which means MCP session background tasks freeze with it.

Solution: Move Blocking I/O to a Thread Pool#

user_in = await asyncio.to_thread(input, "you> ")

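asyncio.to_thread (available since Python 3.9) runs the synchronous function in a worker thread and returns an awaitable, so the main event loop keeps running while the user types: the MCP session and other async tasks stay alive. Ctrl+C is still delivered to the main thread, but a worker thread blocked inside input() cannot be cancelled, so shutdown may not complete until that call returns (for example, when the user presses Enter).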
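The difference can be observed without a terminal. In the sketch below, `slow_read` is a hypothetical stand-in for input() that blocks with time.sleep, and `heartbeat` stands in for background work such as MCP session housekeeping; both names, and `measure`, are assumptions made for illustration. The code counts how many times the heartbeat runs while the "read" is in progress.

```python
import asyncio
import time


def slow_read() -> str:
    """Stand-in for input(): blocks the calling thread for a moment."""
    time.sleep(0.2)
    return "hello"


async def heartbeat(ticks: list) -> None:
    """Background task standing in for MCP session housekeeping."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def measure(use_thread: bool) -> int:
    """Return how many heartbeat ticks happened during the blocking read."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)                    # let the heartbeat start
    before = len(ticks)
    if use_thread:
        await asyncio.to_thread(slow_read)    # loop stays free while we wait
    else:
        slow_read()                           # blocks the event loop itself
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


blocked = asyncio.run(measure(use_thread=False))
threaded = asyncio.run(measure(use_thread=True))
print("ticks while blocking the loop:", blocked)      # 0
print("ticks while blocking a thread:", threaded)     # greater than 0, varies by machine
```

The direct call yields zero ticks because no other coroutine can run until it returns; the to_thread version lets the heartbeat keep firing.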

For the fundamentals of async / await, see CH17: Async Programming with async / await.

4.4 Try It#

Glue the three previous sections together and run it: the 4.1 REPL skeleton + command interception, the 4.2 Agent class wrapping messages, and the 4.3 async-safe input. End-to-end it looks like this:

$ python minimal_agent.py
[init] native tools: ['run_shell', 'read_file', 'write_file']
[init] mcp tools:    ['fetch']

you> run echo hello
  [native] run_shell({'command': 'echo hello'})
claude> hello

you> fetch the contents of example.com
  [mcp] fetch({'url': 'https://example.com'})
claude> This page is the IANA-reserved example domain ...

you> what did I just ask you to do?
claude> You first asked me to run echo hello, then fetched example.com.

The last line proves the multi-turn conversation has memory — the Model can see the earlier turns.
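For readers who want to see the glue itself, the following is a minimal sketch of how the three pieces might fit together. It is not the contents of minimal_agent.py: `StubAgent` is a hypothetical async stand-in with no Model or MCP behind it, and the `read_line` / `write` parameters are assumptions added so the loop can be driven by scripted input as well as by a terminal.

```python
import asyncio
from typing import Awaitable, Callable, List


class StubAgent:
    """Hypothetical async stand-in for the chapter's Agent (no Model, no MCP)."""

    def __init__(self) -> None:
        self.messages: List[dict] = []

    async def chat(self, user_msg: str) -> str:
        self.messages.append({"role": "user", "content": user_msg})
        await asyncio.sleep(0)       # placeholder for awaiting the Model / MCP
        reply = f"you have sent {len(self.messages) // 2 + 1} message(s)"
        self.messages.append({"role": "assistant", "content": reply})
        return reply


async def read_from_terminal(prompt: str) -> str:
    # input() blocks, so run it in a worker thread (section 4.3).
    return await asyncio.to_thread(input, prompt)


async def repl(
    agent: StubAgent,
    read_line: Callable[[str], Awaitable[str]] = read_from_terminal,
    write: Callable[[str], None] = print,
) -> None:
    while True:
        try:
            user_in = (await read_line("you> ")).strip()   # (1) READ
        except EOFError:             # Ctrl+D / end of input: leave the loop
            break
        if user_in in ("/exit", "/quit"):
            break
        if user_in == "/reset":
            agent.messages.clear()
            write("(messages cleared)")
            continue
        if not user_in:              # ignore empty input
            continue
        reply = await agent.chat(user_in)                  # (2) EVAL
        write(f"claude> {reply}")                          # (3) PRINT
```

Running it interactively is a matter of calling asyncio.run(repl(StubAgent())). Swapping in a real agent only requires an object with a messages list and an async chat() method.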

Recap#

By now you should understand:

What a REPL really is — a while loop running READ / EVAL / PRINT / LOOP, where LOOP is the step one-shot lacks
REPL dispatch design — / commands vs. routing to the agent, with room to grow
Multi-turn memory — not real memory, but the illusion of a messages list that accumulates across the REPL session and gets resent every turn
Where the state lives — promote messages from function-local or global state to an attribute on an Agent instance: clean, and lets you run multiple agents
Async contagion — once MCP enters, the whole program has to go async; blocking input gets pushed to a thread pool via asyncio.to_thread

The next three chapters tackle three real-world problems: conversations that grow past the context window (CH05 Short-Term Memory), losing the conversation when the program shuts down (CH06 Medium-Term Memory), and failing to remember facts across sessions (CH07 Long-Term Memory).

References#

asyncio.to_thread documentation
Full source: github.com/codereindeer-dev/minimal-agent