Anti-patterns cheatsheet

The wrong answers the exam loves to test — and why they're wrong.

Domain 1 · Domain 1 Study Guide: Agentic Architecture & Orchestration

✗Assuming the model runs the tool itself. Claude returns a tool_use block and stops; the application must execute the tool and feed back a tool_result. Treating the API as if it calls tools server-side is wrong for client-defined tools.
✗Making everything an agent. Wrapping a deterministic, known-step task in a model-driven loop adds cost, latency, and unpredictability for no benefit — a workflow is the correct choice there.
✗Treating a single retrieval-augmented LLM call as an agent. One call with context is a workflow; an agent requires the gather-act-verify loop with tool use and a termination condition.
✗Running an unbounded loop with no max_turns or budget cap. Without a guardrail the agent can spin indefinitely or burn unbounded tokens; termination must be enforced by your code.
✗Granting every agent the full tool set. Broad tool access violates least privilege and lets a research/read-only agent perform destructive writes — scope tools per agent instead.
✗Using a multi-agent system for a narrow, sequential task. Multi-agent topologies multiply token usage several-fold; they are only justified when the work is broad, parallelizable, or too large for one context window.
✗Enforcing a hard rule (such as a destructive-action block) through the system prompt instead of a hook. Prompts are guidance and can be ignored; deterministic guarantees require code-level hooks.
✗Letting an autonomous agent perform irreversible high-stakes actions with no human approval step. Lacking a HITL checkpoint for payments, deletions, or external communications is a reliability and safety failure.

Domain 2 · Domain 2: Tool Design & MCP Integration

✗Writing a one-line tool description like "gets weather" — too vague for reliable selection. The exam treats terse descriptions as the root cause of wrong-tool errors; detailed descriptions are the #1 performance lever.
✗Exposing many near-duplicate tools (create_pr, review_pr, merge_pr) instead of one consolidated tool with an action enum. Overlapping descriptions force Claude to guess non-deterministically.
✗Throwing an HTTP 500 (or returning an empty result) when a tool fails. This strands the model with no recoverable signal; the correct pattern is a tool_result with is_error true and a fix-it message.
✗Choosing "fine-tune the model" to fix tool selection. Tool behavior is controlled by descriptions, schemas, and tool_choice — fine-tuning is the wrong, expensive answer the exam plants as a distractor.
✗Calling an MCP resource an "action" or treating tools as read-only data. Resources are read-only context (app-controlled); tools are executable actions (model-controlled). Swapping these is the classic MCP distractor.
✗Believing MCP requires a proprietary protocol or a specific transport. MCP is an open standard over JSON-RPC 2.0, transport-agnostic across stdio and Streamable HTTP; the data layer is identical regardless of channel.
✗Using tool_choice "auto" when you need guaranteed structured output, or forcing a specific tool when you want the model to decide freely. Mismatching tool_choice to the task is a common wrong answer.
✗Reading entire files or editing code before reproducing the issue, instead of the Grep → Read → trace → act loop. The exam rewards targeted, evidence-first investigation over blind broad reads.

Domain 3 · Domain 3 Study Guide: Claude Code Configuration & Workflows

✗Believing splitting CLAUDE.md into many @path imports reduces context usage. Wrong: imported files load at launch and consume the same tokens; imports help structure, not budget.
✗Putting personal preferences or secrets in the committed project CLAUDE.md. Wrong: that file is shared with the whole team via git; personal preferences belong in ~/.claude/CLAUDE.md and machine-local notes in CLAUDE.local.md (gitignored).
✗Treating slash commands and Skills as interchangeable. Wrong: slash commands are explicit, user-invoked prompt templates, while Skills are model-invoked and progressively disclosed — choosing the wrong one breaks the intended trigger behavior.
✗Expecting /commands or the interactive Skill menu to work in headless -p mode. Wrong: interactive-only affordances are unavailable non-interactively; in -p mode you must describe the task in the prompt itself.
✗Using --dangerously-skip-permissions on a developer workstation to "save time." Wrong: it removes all confirmation prompts and is only appropriate inside an already-isolated, sandboxed CI runner — never as a casual local default.
✗Running large automation with --output-format text and trying to scrape it. Wrong: use --output-format json (or stream-json) so a parser like jq can read structured fields (result, session_id, cost) reliably.
✗Disabling plan mode and letting Claude edit immediately on a risky, architectural change to move faster. Wrong: plan mode exists precisely to review-before-execute and is the safe default for non-trivial work.
✗Cramming path-conditional guidance ("for files under apps/api use the repository pattern") into the global CLAUDE.md. Wrong: that bloats always-on context — path/glob-scoped guidance belongs in .claude/rules/.

Domain 4 · Domain 4 Study Guide: Prompt Engineering & Structured Output

✗Asking for "JSON only, no other text" in the prose prompt and parsing the text block. Wrong: the model can still add fences, a preamble, or trailing commentary; tool_use + forced tool_choice is the API-enforced way to get clean structure.
✗Believing a strict JSON Schema guarantees the values are correct. Wrong: a schema only enforces types/presence/enums; a string field can still hold the wrong string. Correctness needs validation, not a tighter schema.
✗Reading structured output from a text content block instead of the tool_use block’s input field. Wrong: when stop_reason is tool_use, the parsed data is in the tool_use input, not in text.
✗Re-running the exact same prompt on validation failure without telling the model what went wrong. Wrong: blind retries waste tokens; passing the concrete error back lets the model correct the specific field.
✗Using the Message Batches API for interactive, user-facing requests to "save money." Wrong: batches are asynchronous with up-to-24h turnaround — fine for offline jobs, unacceptable for real-time latency.
✗Having the same Claude call generate output AND grade its own output in one pass. Wrong: self-grading in-context rubber-stamps errors; high-stakes review needs an independent reviewer instance (LLM-as-judge).
✗Piling on more natural-language instructions to fix inconsistent formatting instead of adding few-shot examples. Wrong: for format/edge-case control, demonstrating with examples is more reliable than describing.
✗Setting tool_choice to "auto" when you require structured output, then being surprised the model replied with prose. Wrong: auto lets the model skip the tool; force the specific tool when output must be structured.

Domain 5 · Domain 5 Study Guide: Context Management & Reliability

✗"Just use the 200K context window so the model remembers everything." Wrong — recall is U-shaped (lost-in-the-middle); a large window is capacity, not a recall guarantee, and middle content is the least reliable.
✗Swallowing a tool failure and returning a default value or empty result. Wrong — it erases the distinction between "could not access" and "genuinely empty," causing the agent to act confidently on a false premise.
✗Summarizing aggressively to save tokens as the first move. Wrong — summarization is lossy and errors compound across passes; lossless trimming and subagent delegation should come first, with durable facts kept on disk.
✗Putting a timestamp, request ID, or the latest user turn near the top of the prompt to "give the model fresh context." Wrong — volatile content high in the tools->system->messages prefix invalidates the entire prompt cache.
✗Assuming caching is active without checking. Wrong — below the minimum cacheable length it silently no-ops with no error; you must confirm via cache_read_input_tokens. A changing prefix keeps reads at 0.
✗Random-sampling outputs for quality review in a high-volume pipeline. Wrong — random sampling under-represents rare, high-risk cases; use stratified sampling that over-samples risky strata.
✗Using the Message Batches API as a blocking approval gate before a high-stakes action. Wrong — batches are asynchronous and not real-time; high-stakes/irreversible actions need a synchronous human-in-the-loop gate.
✗Storing long-running agent state in the conversation history and expecting it to survive. Wrong — the window is compacted/reset and isolated per subagent; resumable state must be persisted to files (JSON snapshot) with idempotent resume.