Back to study guides
Domain 4 · 20% of the exam

Domain 4 Study Guide: Prompt Engineering & Structured Output

Domain 4: Prompt Engineering & Structured Output

Domain 4 is 20% of the CCA-Foundations exam. It tests whether you can write prompts Claude follows reliably and shape its output so downstream code can consume it without brittle string parsing. The exam frames this as an architect's responsibility: a prompt is an interface contract, and structured output is the schema for that contract.

Blueprint overview

You should be able to:

  • Apply the core prompting techniques — explicit success criteria, few-shot examples, chain-of-thought (CoT), and system/role prompts — and pick the right one for a task.
  • Guarantee machine-parseable output using tool_use + JSON Schema with forced tool_choice, and explain why this beats "return JSON only" prose prompting.
  • Design good schemas (clear field names, enums, required, nesting) and understand what a schema can and cannot guarantee.
  • Build validation-retry loops that catch bad values (not just bad shape) and feed errors back to the model.
  • Scale review and grading with multi-pass / multi-instance patterns and the Message Batches API.

Core prompting techniques

Be explicit about success criteria. Claude follows instructions literally. Vague asks ("summarize this well") produce vague output. State the format, length, audience, what to include, and what to exclude. Negative constraints ("do not include pricing") are honored when stated directly. Putting the most important instruction last, near the end of the prompt, increases adherence.

Few-shot (multishot) prompting. Showing 2-5 input → output examples is one of the most reliable ways to control format and edge-case handling. Examples teach by demonstration what instructions struggle to describe. Keep examples consistent in structure and cover the tricky cases you care about. Wrapping examples in XML-style tags (e.g. <example>...</example>) helps Claude separate examples from the live input.

Chain-of-thought (CoT). For multi-step reasoning, math, or analysis, ask the model to think step by step before answering. Reasoning improves accuracy, but the reasoning text is not structured. The standard pattern is: let the model reason in a <thinking> section, then emit the final answer separately (or via a forced output tool). CoT trades latency/tokens for accuracy — use it when correctness matters more than speed.

System prompts and roles. The system parameter sets persona, rules, and tone that persist across the conversation. Put durable instructions (role, policies, output rules) in system; put the task-specific input in the user turn. This separation also makes prompts cacheable.

Structured output: tool_use + JSON Schema

When code must parse the output, do not ask for "JSON only" in prose — the model may add a preamble, code fences, or trailing commentary. Instead define a tool whose input_schema is a JSON Schema, then force it:

json

Key facts the exam tests:

  • tool_choice modes: auto (model decides — the default when tools are present), any (must call some tool), and { "type": "tool", "name": "..." } (force one specific tool). The forced form is the structured-output pattern.
  • When a tool is used, stop_reason is tool_use, and you read the payload from the tool_use content block's input field — not from a text block.
  • If you only use the tool to shape output (no real side effect), you do not need a tool_result round-trip; read input and stop.

Schema design

A good schema is part of the prompt. Use descriptive field names, constrain choices with enum, mark non-optional fields required, and add a one-line description to the tool and to ambiguous fields. Prefer explicit units (total_cents over total) to remove ambiguity. Remember the boundary: a schema guarantees shape (types, presence, enum membership), not semantics. A vendor field is guaranteed to be a string — it is not guaranteed to be the correct vendor.

Validation-retry loops

Because schemas cannot guarantee correct values, validate after parsing and retry on failure. The robust loop is:

  1. Call the model with the forced output tool.
  2. Parse tool_use.input and run your own validation (business rules, cross-field checks, value ranges).
  3. On failure, send the specific error message back to the model and ask it to fix only what is wrong.
  4. Cap retries (e.g. 2-3) and have a fallback (escalate, default, or flag for human review).

Feeding the concrete validation error back ("total_cents was negative; it must be ≥ 0") is far more effective than blindly re-running the same prompt.

Review and batch at scale

For high-stakes output, use multi-pass review: a second, independent Claude call grades or critiques the first call's output against a rubric. Keeping the reviewer instance separate (fresh context, its own rubric prompt) avoids the model rubber-stamping its own work, and is the LLM-as-judge pattern.

When you have many independent prompts (grading a dataset, classifying a backlog, bulk extraction) and do not need answers in real time, use the Message Batches API. You submit a batch of requests; it processes them asynchronously (results typically within 24 hours) at roughly 50% of standard token cost. Use batches for throughput-oriented, latency-tolerant offline jobs — not for interactive requests.

How it fits the architect role

The exam rewards answers that treat the prompt + schema as a contract: explicit criteria up front, structured output for anything code consumes, validation that checks values (not just shape), and the right scaling primitive (sync call vs. batch) for the workload.

Exam tips

  • For reliable, code-parseable output the correct answer is almost always "define a tool with a JSON Schema input_schema and force it with tool_choice: {type:"tool", name:...}" — not "ask the model to return JSON only."
  • Know the three tool_choice modes cold: auto (model decides, default when tools present), any (must call some tool), and {type:"tool", name} (force one specific tool). The forced mode is the structured-output pattern.
  • When a tool is invoked, stop_reason is "tool_use" and the structured data lives in the tool_use block’s input field — not in a text block. Questions that say "parse the text response as JSON" are testing whether you read from the wrong place.
  • A JSON Schema guarantees SHAPE (types, required fields, enum membership), not SEMANTICS. If a question asks how to ensure values are correct, the answer is a validation-retry loop, not a stricter schema.
  • Few-shot examples are the most reliable lever for controlling output format and edge cases. If a prompt produces inconsistent formatting, "add 2-5 examples" usually beats "add more instructions."
  • Chain-of-thought improves accuracy on multi-step reasoning but produces unstructured text and costs latency/tokens. The clean pattern is: reason in a <thinking> section, then emit the final answer via a forced output tool.
  • Use the Message Batches API for many independent, latency-tolerant requests (grading, bulk classification/extraction): asynchronous, ~24h turnaround, ~50% cheaper. It is the wrong choice for interactive/real-time requests.
  • In validation-retry loops, feed the SPECIFIC error back to the model and ask it to fix only what failed, and always cap retries with a fallback (escalate/default/human review) — an unbounded retry loop is a wrong answer.

Anti-patterns

  • Asking for "JSON only, no other text" in the prose prompt and parsing the text block. Wrong: the model can still add fences, a preamble, or trailing commentary; tool_use + forced tool_choice is the API-enforced way to get clean structure.
  • Believing a strict JSON Schema guarantees the values are correct. Wrong: a schema only enforces types/presence/enums; a string field can still hold the wrong string. Correctness needs validation, not a tighter schema.
  • Reading structured output from a text content block instead of the tool_use block’s input field. Wrong: when stop_reason is tool_use, the parsed data is in the tool_use input, not in text.
  • Re-running the exact same prompt on validation failure without telling the model what went wrong. Wrong: blind retries waste tokens; passing the concrete error back lets the model correct the specific field.
  • Using the Message Batches API for interactive, user-facing requests to "save money." Wrong: batches are asynchronous with up-to-24h turnaround — fine for offline jobs, unacceptable for real-time latency.
  • Having the same Claude call generate output AND grade its own output in one pass. Wrong: self-grading in-context rubber-stamps errors; high-stakes review needs an independent reviewer instance (LLM-as-judge).
  • Piling on more natural-language instructions to fix inconsistent formatting instead of adding few-shot examples. Wrong: for format/edge-case control, demonstrating with examples is more reliable than describing.
  • Setting tool_choice to "auto" when you require structured output, then being surprised the model replied with prose. Wrong: auto lets the model skip the tool; force the specific tool when output must be structured.