Tool Design & MCP Integration

Structured Tool Errors

예상 소요 시간 11분

Structured Tool Errors

A tool that throws and crashes the loop is a design bug. A well-designed tool returns its failures to Claude as a tool_result so the model can recover — retry, fall back, ask the user, or escalate. The mechanism is the is_error flag on the tool result block.

How an error is returned

When a tool fails, send back a tool_result referencing the tool_use_id, with is_error: true and a content string describing the problem:

json

The tool_use_id must match the id from the assistant's tool_use block. The content should be actionable: tell Claude why it failed and what to do next. A bare "Error" wastes the recovery opportunity.

Classify errors so the model recovers correctly

Not all failures want the same reaction. Encode the class of error in the message so Claude (or your wrapper logic) chooses the right response:

  • Transient — network timeout, rate limit (429), upstream 503. The right reaction is retry, ideally with backoff. Signal it: "TransientError: upstream timed out. Safe to retry."
  • Validation — bad or missing arguments, schema mismatch. The right reaction is fix the input and retry once. Signal exactly which field is wrong and the expected format.
  • Permission — auth failure, 403, scope missing. Retrying is pointless; the right reaction is escalate to a human or stop. Signal: "PermissionError: token lacks repo:write. Do not retry; ask the user to re-authorize."

This taxonomy maps directly onto reliability behavior: transient → retry, validation → repair-and-retry, permission → halt/escalate.

Distinguish failure from a valid empty result

A subtle but heavily tested point: "access failed" is not the same as "found nothing." If a search runs successfully and returns zero rows, that is a valid result (is_error is false, content "No matching orders found."). If the database was unreachable, that is an error (is_error: true). Collapsing the two makes the model hallucinate — it may treat an outage as "confirmed: no data."

Keep error payloads high-signal

Return only what Claude needs to act. A 500-line stack trace burns context and buries the actionable line. Trim to: error class, the one-sentence cause, and the recommended next step. This is the same "high-signal responses" principle that governs successful tool output.

Exam focus

Know that errors flow back through tool_result with is_error: true (not by throwing) and must reference the tool_use_id. Memorize the three classes and their reactions: transient → retry, validation → repair input, permission → escalate (don't retry). And remember the trap: a successful search returning empty is a valid result, not an error — never report an access failure as "no data."