REPL and Multi-Turn Conversations

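CH02 and CH03 filled out the toolset, but the agent is still stuck in one-shot mode, where a single conversation runs to completion and then the program exits:

The user submits a task
The Executor and the Model exchange a few rounds of tool_use / tool_result
The Model replies with end_turn and the final result, and the program ends

The user cannot follow up with another question, and nothing from the conversation is kept, so the next launch starts from zero.

This chapter introduces the REPL (Read-Eval-Print Loop): a loop that reads input, executes it, and prints the result. It upgrades the agent from one-shot to continuous conversation. The same process keeps running, the user can ask one question after another, and the agent remembers what was said earlier.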

4.1 REPL

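A REPL is just a loop that repeats four steps until the user quits:

REPL step	Action
① READ	Read one line of user input
② EVAL	Run the Agent loop once (CH01 1.5: `tool_use` ↔ `tool_result` go back and forth for a few rounds until `end_turn`)
③ PRINT	Print the Model's final text answer to the terminal
④ LOOP	Go back to ① and wait for the next line. One-shot mode lacks this step and exits after a single run

In code, this is a while True wrapped around ①②③, with the loop itself taking you back to ①: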

while True:
    user_in = input("you> ")          # ① READ: read the user's input
    reply = agent.chat(user_in)       # ② EVAL: run the Agent loop once
    print(reply)                      # ③ PRINT: print the Model's final answer
                                      # ④ LOOP: while True returns to ① automatically
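
The skeleton above relies on an `agent` built in earlier chapters and has no way to stop. As a minimal sketch only, the loop can be exercised without a Model by assuming a hypothetical `EchoAgent` stand-in and by injecting the read and write functions. Both names, and the choice to treat Ctrl+D / Ctrl+C as "quit", are assumptions for illustration and are not part of the original program.

```python
from typing import Callable, List


class EchoAgent:
    """Hypothetical stand-in for the real agent: echoes instead of calling a Model."""

    def chat(self, user_msg: str) -> str:
        return f"(echo) {user_msg}"


def repl(agent, read: Callable[[str], str] = input,
         write: Callable[[str], None] = print) -> int:
    """Run READ -> EVAL -> PRINT until input ends; return the number of turns."""
    turns = 0
    while True:
        try:
            user_in = read("you> ")       # 1. READ
        except (EOFError, KeyboardInterrupt):
            break                         # Ctrl+D / Ctrl+C ends the session
        reply = agent.chat(user_in)       # 2. EVAL
        write(reply)                      # 3. PRINT
        turns += 1                        # 4. LOOP: fall through to the next READ
    return turns


def scripted(lines: List[str]) -> Callable[[str], str]:
    """Build a fake reader that replays `lines`, then signals end of input."""
    it = iter(lines)

    def read(prompt: str) -> str:
        try:
            return next(it)
        except StopIteration:
            raise EOFError from None

    return read


# Scripted run: no terminal needed.
outputs: List[str] = []
turn_count = repl(EchoAgent(), read=scripted(["hello", "again"]), write=outputs.append)
print(turn_count, outputs)
```

Calling `repl(EchoAgent())` with the defaults gives the interactive version. Injecting `read` and `write` is only there so the loop can be run from a script or a test.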

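`/` Commands

A REPL can also let the user enter commands in the middle of a conversation, in addition to asking the Model questions. Commands usually start with / to distinguish them from ordinary messages. The REPL runs the matching action (for example, clearing the conversation or exiting the program) and never sends the command to the Model:

user input
   ├─ "/exit", "/quit"   → break out of the loop
   ├─ "/reset"           → clear messages
   ├─ "" (empty string)  → ignore and prompt again
   └─ anything else      → pass to agent.chat(), print the reply

In practice, put the command branches inside the REPL loop ahead of agent.chat() so they are intercepted first: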

while True:
    user_in = input("you> ").strip()

    # Intercept commands: these are never sent to the Model
    if user_in in ("/exit", "/quit"):
        break
    if user_in == "/reset":
        agent.messages.clear()
        print("(messages cleared)")
        continue
    if not user_in:                       # ignore empty input
        continue

    # Ordinary message: pass it to the agent
    reply = agent.chat(user_in)
    print(f"claude> {reply}")

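Keeping the / command design open makes it easy to add /save, /load, /tokens, or /compact later: just add another branch such as if user_in == "/save": ... ahead of agent.chat().

This REPL skeleton is still missing one piece, though: across multiple turns, how does the agent "remember" what was said earlier? The next section covers that.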
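
Once the list of commands grows, the chain of if branches can be swapped for a lookup table. The following is one possible sketch, not the tutorial's implementation: `COMMANDS`, `handle_command`, and the `/tokens` handler (a crude character count, not a real tokenizer) are hypothetical names. Note one deliberate difference from the flow shown earlier: unknown input starting with / is reported locally instead of being passed to the Model.

```python
from typing import Callable, Dict, List, Optional

Messages = List[dict]
# A handler receives the message list and returns False to stop the REPL.
Handler = Callable[[Messages], bool]


def cmd_exit(messages: Messages) -> bool:
    return False


def cmd_reset(messages: Messages) -> bool:
    messages.clear()
    print("(messages cleared)")
    return True


def cmd_tokens(messages: Messages) -> bool:
    # Hypothetical /tokens: counts characters, which only approximates token usage.
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    print(f"(~{chars} characters in history)")
    return True


COMMANDS: Dict[str, Handler] = {
    "/exit": cmd_exit,
    "/quit": cmd_exit,
    "/reset": cmd_reset,
    "/tokens": cmd_tokens,
}


def handle_command(user_in: str, messages: Messages) -> Optional[bool]:
    """Return None if user_in is not a command, else the keep-running flag."""
    if not user_in.startswith("/"):
        return None                       # ordinary message: caller sends it to the agent
    handler = COMMANDS.get(user_in)
    if handler is None:
        print(f"(unknown command: {user_in})")
        return True
    return handler(messages)


history = [{"role": "user", "content": "hi"}]
print(handle_command("hello", history))    # None: not a command
print(handle_command("/tokens", history))  # True: keep running
print(handle_command("/exit", history))    # False: stop the loop
```

In the REPL, the loop would call `handle_command(user_in, agent.messages)` first, `break` on `False`, `continue` on `True`, and fall through to `agent.chat()` on `None`.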

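4.2 Memory in Multi-Turn Conversations

The REPL lets the agent "remember" earlier turns because the same messages list keeps accumulating across turns. The Model itself has no memory (CH01 1.4), but as long as every call sends it the entire messages list accumulated so far, it behaves as if it remembers.

How Messages Accumulate Across Turns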

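Call 1, chat("My name is Alex"):
  messages: [
    user      "My name is Alex"
    assistant "Okay, noted"
  ]

Call 2, chat("What is my name?"):
  messages: [
    user      "My name is Alex"       ← kept
    assistant "Okay, noted"           ← kept
    user      "What is my name?"      ← new
    assistant "Your name is Alex"     ← new
  ]

On the second chat() call, messages has not been cleared, so every earlier exchange is still in the list. The Model receives the full history, which is why it can answer "Your name is Alex".

This is not real memory; it is an illusion created by resending the history on every turn (as covered in CH01 1.4). The longer the conversation, the larger the history, and token usage climbs quickly because every turn pays again for everything before it. CH05 (short-term memory) addresses this.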
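
To make the growth concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that every user message is 50 tokens and every reply is 150 tokens, and it ignores the system prompt and any tool_use / tool_result messages. Real numbers will differ.

```python
from typing import List


def input_tokens_per_turn(turns: int, user_tokens: int = 50,
                          reply_tokens: int = 150) -> List[int]:
    """Input tokens sent on each turn when the full history is resent every time."""
    sent = []
    history = 0
    for _ in range(turns):
        history += user_tokens      # the new user message joins the history
        sent.append(history)        # the whole history goes to the Model
        history += reply_tokens     # the reply joins the history for the next turn
    return sent


per_turn = input_tokens_per_turn(5)
print(per_turn)        # [50, 250, 450, 650, 850]
print(sum(per_turn))   # 2250
```

Per-turn input grows linearly with the number of turns, so the cumulative total grows roughly quadratically: under these assumptions, five turns already cost 2250 input tokens even though only 250 tokens of user text were typed.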

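Implementation: An Agent Class That Encapsulates the messages List

The question is where this messages list should live so that it survives across chat() calls. The crudest option is a global variable that everything can access: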

messages = []                        # ⚠️ global variable

def chat(user_msg):
    messages.append({"role": "user", "content": user_msg})
    ...

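It works, but has two problems: (1) you cannot run two independent agents at the same time (they would share one messages list and their histories would mix), and (2) testing is painful, because every test case has to remember to reset the global manually.

The cleaner approach is to encapsulate messages in an Agent class, so each instance has its own copy: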

class Agent:
    def __init__(self):
        self.messages = []          # ← persists across chat() calls

    def chat(self, user_msg):
        self.messages.append({"role": "user", "content": user_msg})
        # ⋯ run the Agent loop (appends tool_use / tool_result / assistant messages internally) ⋯
        return final_text

The REPL only needs to call agent = Agent() once; every later agent.chat(user_in) shares the same self.messages. To run several agents, write agent_a = Agent(); agent_b = Agent() and they will not interfere with each other.
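
The isolation between instances can be checked without a real Model. The sketch below swaps the Agent loop for a hypothetical `fake_model` function that only reports how many messages it was given; the `model` constructor parameter is likewise an assumption added for illustration and is not part of the class shown above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def fake_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for a Model call: it only sees what is in `messages`."""
    return f"I can see {len(messages)} message(s)"


class Agent:
    def __init__(self, model: Callable[[List[Message]], str] = fake_model):
        self.model = model
        self.messages: List[Message] = []   # one history per instance

    def chat(self, user_msg: str) -> str:
        self.messages.append({"role": "user", "content": user_msg})
        final_text = self.model(self.messages)   # the full history on every call
        self.messages.append({"role": "assistant", "content": final_text})
        return final_text


agent_a, agent_b = Agent(), Agent()
agent_a.chat("My name is Alex")
reply_a = agent_a.chat("What is my name?")
reply_b = agent_b.chat("What is my name?")
print(reply_a)   # I can see 3 message(s)
print(reply_b)   # I can see 1 message(s)
```

`agent_a` sees three messages on its second call (two from the first turn plus the new question), while `agent_b` sees only its own single message. Passing the model in as a parameter also addresses the testing problem mentioned earlier: each test builds a fresh `Agent`, so there is no global state to reset.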

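4.3 Event Loop Considerations for an Interactive Interface

The REPL looks simple to write, but once MCP enters the picture the whole program becomes async, and that collides with synchronous I/O such as input().

Problem: MCP Forces the Whole Program to Become Async

The MCP SDK is async-first. Once Agent.chat() becomes async because it has to await MCP calls, main must be async too, and it is started with asyncio.run():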

import asyncio

async def main():
    agent = Agent()
    while True:
        user_in = input("you> ")              # ⚠️ the problem is here
        reply = await agent.chat(user_in)
        print(reply)

asyncio.run(main())

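Sub-problem: Interactive input Blocks Inside Async Code

input("you> ") is blocking, synchronous I/O. Calling it directly inside an async function freezes the entire event loop, so the MCP session's background tasks stall along with it.

Solution: Push Blocking I/O to a Thread Pool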

user_in = await asyncio.to_thread(input, "you> ")

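asyncio.to_thread (Python 3.9+) hands the synchronous function to a thread pool, so the main event loop keeps running: the MCP session and other async tasks stay alive, and Ctrl+C still reaches the main thread. One caveat: the worker thread blocked in input() cannot itself be cancelled, so shutdown may wait until that call returns.

For the basics of async / await, see CH17: Asynchronous Programming with async / await.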
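
The difference is easy to observe without a terminal or an MCP server. In this sketch, `slow_read` is a hypothetical stand-in for input() that just sleeps, and `heartbeat` stands in for whatever background work an MCP session does; both are assumptions for illustration. The code counts how many heartbeat ticks happen while the "read" is in progress.

```python
import asyncio
import time
from typing import List


def slow_read(prompt: str) -> str:
    """Stand-in for input(): blocks the calling thread for a moment."""
    time.sleep(0.3)
    return "hello"


async def heartbeat(ticks: List[float]) -> None:
    """Background task standing in for an MCP session's housekeeping."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def count_ticks(use_thread: bool) -> int:
    ticks: List[float] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)                  # let the heartbeat record its first tick
    before = len(ticks)
    if use_thread:
        await asyncio.to_thread(slow_read, "you> ")   # event loop stays free
    else:
        slow_read("you> ")                  # blocks the event loop itself
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


blocking_ticks = asyncio.run(count_ticks(use_thread=False))
threaded_ticks = asyncio.run(count_ticks(use_thread=True))
print(blocking_ticks, threaded_ticks)
```

With the direct call the heartbeat never gets a chance to run, so `blocking_ticks` is 0. With `asyncio.to_thread` the heartbeat keeps ticking while the read is pending, so `threaded_ticks` is greater than zero (the exact count depends on timing).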

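4.4 Try It Out

Put the previous three sections together: the REPL skeleton and command interception from 4.1, the Agent class encapsulating messages from 4.2, and the async-safe input from 4.3. A full run looks like this: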

$ python minimal_agent.py
[init] native tools: ['run_shell', 'read_file', 'write_file']
[init] mcp tools:    ['fetch']

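you> run echo hello
  [native] run_shell({'command': 'echo hello'})
claude> hello

you> fetch the content of example.com
  [mcp] fetch({'url': 'https://example.com'})
claude> This page is an example domain reserved by IANA ⋯

you> What did I just ask you to do?
claude> You first had me run echo hello, and then fetch example.com.

The last exchange shows that multi-turn memory works: the Model can see the earlier turns.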

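Checkpoint

By this point you should understand:

The essence of a REPL: a while loop running the four steps READ / EVAL / PRINT / LOOP, where LOOP is the key step that one-shot mode lacks
Input routing in the REPL: / commands versus messages passed to the agent, with room left for extension
Memory in multi-turn conversations: not real memory, but an illusion produced by the messages list accumulating for the life of the REPL and the history being resent every turn
Where state lives: move messages from a function-local or global variable to an attribute of an Agent class instance, which is clean and allows multiple agents
Async is contagious: once MCP is involved the whole program has to be async; push blocking input to a thread pool with asyncio.to_thread

The next three chapters tackle three practical problems: conversations that outgrow the context window (CH05, short-term memory), conversations that vanish when the program closes (CH06, mid-term memory), and facts that are not remembered across sessions (CH07, long-term memory).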

References

asyncio.to_thread documentation
Full code: github.com/codereindeer-dev/minimal-agent