シナリオ一覧に戻る
シナリオ 7 / 8

Conversational AI Architecture Patterns

The setup

You are architecting a multi-turn conversational assistant for a SaaS product. Users ask questions, follow up, change topics, and occasionally trigger actions ("upgrade my plan", "summarize this thread"). The assistant runs on the Claude Messages API, must remain coherent across long conversations, sometimes calls tools, and must hand off to a human when it cannot proceed safely.

The team's first prototype "remembered" things inconsistently, sometimes lost the thread after ten turns, occasionally invented an answer rather than asking a clarifying question, and produced free-text where the UI expected structured data. Your job is to choose an architecture that is reliable, debuggable, and aligned to how Claude actually works.

The core insight: the API is stateless

The single most important fact for this scenario is that the Messages API is stateless. Claude has no server-side memory of prior turns. You maintain conversation state by sending the full messages array — alternating user and assistant turns — on every request. "Conversation memory" is an application concern, not a model feature.

This reframes the whole design. A turn looks like:

json

Note how a tool round-trip is just more messages: the assistant emits a tool_use block (with stop_reason: "tool_use"), your code executes the tool, and you append a user message containing the tool_result before calling the API again. The agentic loop continues until stop_reason is end_turn.

The right architecture

1. A single, stable system prompt. Put durable behavior — persona, escalation rules, "ask before guessing", output expectations — in the system parameter, not buried in user turns. The system prompt is your highest-leverage control surface and is identical every turn, which makes it a perfect prompt-caching boundary (cache the system block + tool definitions to cut cost and latency on long chats).

2. Explicit context management, not infinite history. As conversations grow, raw history blows the context window and dilutes attention. Apply a windowing/summarization strategy: keep the last N turns verbatim, and replace older turns with a running summary you generate and store. This is your job — Claude won't compact for you over the Messages API.

3. Structured output where the UI consumes it. When a turn must produce machine-readable data (an action payload, a routing decision), don't parse free text. Force tool_use with a JSON Schema (e.g. a route_intent or create_upgrade_request tool) so the model returns validated structure. Keep natural language for what the user reads.

4. Clarify-or-escalate, not hallucinate. Encode in the system prompt that the assistant asks one clarifying question on ambiguity and calls an escalate_to_human tool when it lacks data or authority. Human-in-the-loop is an orchestration decision, placed deliberately, not an afterthought.

Common traps

  • Stuffing memory into the system prompt. It is not a scratchpad; rebuild the messages array instead.
  • Trusting Claude to remember. No state is carried between calls — omit history and coherence collapses.
  • Parsing prose as data. Use tool_use + schema; regex over chat output is fragile.
  • Unbounded history. Without summarization you hit context limits and rising cost/latency.
  • Implicit escalation. If "when to hand off" isn't specified, the model improvises.

Domain mapping

  • Domain 1 (Agentic Architecture & Orchestration): the stateless agentic loop, stop_reason handling, tool round-trips, session/context management, and deliberate human-in-the-loop placement.
  • Domain 4 (Prompt Engineering & Structured Output): a stable system prompt as the control surface, clarify-before-guess instructions, and forcing reliable structured output via tool_use + JSON Schema.