Scenario 3: Multi-Agent Research System

The problem

A product team is building an internal "research assistant." A user asks an open-ended question — "Compare the go-to-market strategies of our three top competitors and cite sources" — and expects a thorough, cited report. A single Claude call with web search works for narrow lookups, but on broad questions it serializes the work: it searches one angle, reads, searches the next, and slowly fills its context window with raw pages. Quality plateaus, latency climbs, and the single shared context becomes a bottleneck. The team needs an architecture that can explore many angles in parallel and still produce one coherent, sourced answer.

The right architecture: orchestrator-worker

The fit-for-purpose pattern is orchestrator-worker (lead agent + subagents). A lead agent receives the query, thinks through a plan, decomposes it into independent sub-tasks, and spawns subagents — each with its own context window, its own tools, and a specific objective. Subagents investigate in parallel, return condensed findings, and the lead agent synthesizes them, then hands off to a dedicated citation agent to attach sources.
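The control flow described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: every function here (`plan`, `run_subagent`, `synthesize`, `attach_citations`) is a hypothetical stub standing in for a model call, the research angles are hard-coded, and the source URL is a placeholder.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    objective: str
    summary: str
    source: str


def plan(query: str) -> list[str]:
    # Stand-in for the lead's planning call. A real lead would ask a model
    # to decompose the query; here the angles are hard-coded.
    angles = ["pricing", "sales channels", "partnerships"]
    return [f"{angle} ({query})" for angle in angles]


async def run_subagent(objective: str) -> Finding:
    # Stand-in for a subagent with its own context window and tools.
    await asyncio.sleep(0.01)  # simulates search and reading latency
    return Finding(objective, f"condensed notes on {objective}",
                   "https://example.com/placeholder")


def synthesize(query: str, findings: list[Finding]) -> str:
    # Stand-in for the lead's synthesis call.
    body = "\n".join(f"- {f.summary}" for f in findings)
    return f"Report: {query}\n{body}"


def attach_citations(draft: str, findings: list[Finding]) -> str:
    # Stand-in for the dedicated citation agent.
    sources = "\n".join(f"[{i}] {f.source}" for i, f in enumerate(findings, 1))
    return f"{draft}\nSources:\n{sources}"


async def research(query: str) -> str:
    objectives = plan(query)
    # Fan out: subagents run concurrently and the lead waits for the batch.
    findings = list(await asyncio.gather(*(run_subagent(o) for o in objectives)))
    return attach_citations(synthesize(query, findings), findings)


if __name__ == "__main__":
    print(asyncio.run(research("competitor go-to-market strategies")))
```

The point of the shape is that only `research` knows about all the subagents; each `run_subagent` call sees a single objective and nothing else.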

This is the architecture Anthropic published for its own research system. The key numbers worth knowing: a multi-agent system (Opus lead + Sonnet subagents) outperformed a single Opus agent by ~90% on internal research evals, and in Anthropic's analysis of the BrowseComp evaluation, token usage alone explained ~80% of the performance variance. The flip side is cost: agents use roughly 4x the tokens of a chat interaction, and multi-agent systems roughly 15x. So the pattern earns its keep only when the task value is high and the work is parallelizable.
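The cost tradeoff can be made concrete with a little arithmetic. The sketch below assumes a baseline of 2,000 tokens per chat interaction, which is an illustrative number and not a figure from the published results; only the 4x and 15x multipliers come from the text above.

```python
# Assumed baseline for one ordinary chat interaction (illustrative only).
CHAT_TOKENS = 2_000

# Approximate multipliers quoted in the text above.
SINGLE_AGENT_MULTIPLIER = 4
MULTI_AGENT_MULTIPLIER = 15


def estimated_tokens(multiplier: int, baseline: int = CHAT_TOKENS) -> int:
    """Scale the chat baseline by a usage multiplier."""
    return multiplier * baseline


single_agent = estimated_tokens(SINGLE_AGENT_MULTIPLIER)
multi_agent = estimated_tokens(MULTI_AGENT_MULTIPLIER)

print(single_agent)                # 8000
print(multi_agent)                 # 30000
print(multi_agent / single_agent)  # 3.75
```

Whatever baseline is chosen, the ratio stays at 3.75: a multi-agent run costs a bit under four times a single-agent run, so the quality gain has to justify that multiple.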

Key decisions

Decompose, don't duplicate. The lead agent must give each subagent an explicit objective, output format, tool guidance, and clear boundaries. Vague delegation ("research the competitors") makes subagents redo each other's work. Precise delegation ("subagent A: pricing pages only; subagent B: 2024 earnings calls only") yields broad, non-overlapping coverage.
Scale effort to complexity. Bake heuristics into the lead prompt: simple fact-finding uses 1 subagent and 3-10 tool calls; a direct comparison might use 2-4 subagents with 10-15 calls each; complex, broad research may warrant 10+ subagents with clearly divided responsibilities. This prevents the failure of spawning 50 subagents for a trivial question.
Pass references, not raw context (Domain 5). Subagents that dump full web pages back to the lead agent blow the lead's window and lose information across hops. Instead, have subagents write findings to external storage (an artifact/scratchpad) and return lightweight references. The lead pulls only what it needs.
Right-size the models. Use a stronger model (Opus) for the lead's planning and synthesis; use cheaper, faster models (Sonnet/Haiku) for parallel subagent retrieval. This is where most cost optimization lives.
Separate citation from research. A focused citation agent that maps claims to sources is more reliable than asking the synthesis step to cite while it writes.
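Several of the decisions above (explicit briefs, effort scaling, reference passing, per-role models) can be expressed as plain data structures. The following is a sketch under stated assumptions: `TaskBrief`, `subagent_count`, and `ArtifactStore` are hypothetical names, the complexity labels and counts are illustrative, the model string is a placeholder rather than a real model ID, and the store is an in-memory dictionary standing in for real external storage.

```python
from dataclasses import dataclass


@dataclass
class TaskBrief:
    """What the lead hands to one subagent (hypothetical structure)."""
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str
    model: str = "cheaper-subagent-model"  # placeholder, not a real model ID


def subagent_count(complexity: str) -> int:
    """Map a coarse complexity label to a subagent count.

    The labels and the exact counts are assumptions for illustration.
    """
    tiers = {"simple": 1, "comparison": 3, "broad": 10}
    return tiers[complexity]


class ArtifactStore:
    """In-memory stand-in for external storage (file system, database)."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, content: str) -> str:
        ref = f"artifact-{len(self._items) + 1}"
        self._items[ref] = content
        return ref

    def get(self, ref: str) -> str:
        return self._items[ref]


def run_subagent(brief: TaskBrief, store: ArtifactStore) -> dict[str, str]:
    # Stand-in for real research: pretend this is a long page of raw notes.
    raw_notes = f"Full notes for '{brief.objective}'. " + "detail " * 200
    ref = store.put(raw_notes)
    # Only a short summary and a reference travel back to the lead.
    return {"objective": brief.objective, "ref": ref,
            "summary": f"Findings stored for '{brief.objective}'"}


store = ArtifactStore()
briefs = [
    TaskBrief("pricing pages only", "bullet list with URLs",
              "web search, official sites first", "ignore earnings calls"),
    TaskBrief("2024 earnings calls only", "bullet list with quotes",
              "transcript search", "ignore pricing pages"),
]
results = [run_subagent(b, store) for b in briefs]
print([r["ref"] for r in results])        # ['artifact-1', 'artifact-2']
print(len(store.get(results[0]["ref"])))  # the lead fetches full notes on demand
```

The lead's context only ever holds the short `summary` and `ref` strings; the bulky notes stay in the store until the lead decides a particular artifact is worth reading.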

Common traps

Choosing multi-agent for the wrong task. It is poor for work needing shared, tightly coupled context (e.g., most coding tasks) or real-time coordination between agents. If subagents must constantly see each other's intermediate state, the orchestrator-worker split fights you.
Synchronous bottlenecks. Subagents run in parallel with one another, but the lead typically executes each batch synchronously: it waits for the whole batch to finish, cannot steer subagents mid-flight, and one slow subagent blocks the rest. Plan around this; do not assume mid-execution course-correction.
Context overflow on long runs (Domain 5). Across hundreds of turns the lead must summarize completed phases into external memory before continuing, or it will exhaust its window.
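Lost-in-the-middle synthesis. Models tend to recall material in the middle of a long prompt less reliably than material at the edges. When the lead assembles many subagent reports, the most important constraints belong at the start and end of its synthesis prompt, not buried between reports.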
Unreliable failure handling. State spread across many tool calls needs durable execution and checkpoint resumption, not a full restart on every error.
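Checkpoint resumption, the last item above, can be illustrated with a small loop that persists results after every completed task. This is a simplified sketch: `run_with_checkpoints` and `flaky_worker` are hypothetical names, the checkpoint is a plain JSON file, and a production system would more likely use atomic writes or a database.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoints(tasks: list[str],
                         worker: Callable[[str], str],
                         checkpoint: Path) -> dict[str, str]:
    # Load results from an earlier, interrupted run if a checkpoint exists.
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for task in tasks:
        if task in done:
            continue  # finished before the failure, so skip it
        done[task] = worker(task)
        checkpoint.write_text(json.dumps(done))  # persist after every task
    return done


calls: list[str] = []


def flaky_worker(task: str) -> str:
    # Simulated subagent that fails the first time it sees task "b".
    calls.append(task)
    if task == "b" and calls.count("b") == 1:
        raise RuntimeError("transient failure")
    return f"result for {task}"


path = Path(tempfile.mkdtemp()) / "checkpoint.json"
tasks = ["a", "b", "c"]

try:
    run_with_checkpoints(tasks, flaky_worker, path)
except RuntimeError:
    pass  # the first run dies partway through

results = run_with_checkpoints(tasks, flaky_worker, path)
print(calls)  # ['a', 'b', 'b', 'c']
```

Task "a" runs only once: the second invocation reads it back from the checkpoint and resumes at "b" instead of restarting the whole batch.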

How it maps to the exam

Domain 1 (Agentic Architecture & Orchestration) owns the structural choices: recognizing orchestrator-worker as the right control structure, deciding that application code spawns the subagents while the model does the planning, sizing parallelism to task complexity, and selecting the right model per role. Domain 5 (Context Management & Reliability) owns making it survive production: per-subagent context isolation, artifact/reference passing instead of raw dumps, progressive summarization into external memory, citation/provenance, and durable error recovery. Expect questions that hand you a broad research goal and ask you to justify (or reject) the multi-agent pattern, choose how subagents return results, or fix a context-overflow failure. The cost-vs-value tradeoff is the through-line.

Exam focus

Multi-agent wins when work is broad, parallelizable, and high-value, and it costs ~15x the tokens of a chat to get there. The lead plans and synthesizes; subagents explore in isolated context and return references, not raw pages. Reject the pattern for tightly coupled, shared-context, or low-value tasks.