智能体 AI 工具
Scenario 8: Agentic AI Tools
The problem
A fintech team wants an internal "ops agent" that engineers invoke in chat to investigate and remediate incidents. A typical request: "Payments are failing in eu-west-1 — find out why and, if it's the rate limiter, raise the limit by 20%." The agent must read logs, query a metrics service, inspect a feature-flag store, and — only with approval — push a config change. The first prototype was unreliable: Claude sometimes guessed instead of querying, sometimes called the wrong tool, and once changed a production flag without asking. The architecture, not the model, was the problem.
This scenario is about turning a pile of capabilities into a trustworthy agentic system — the intersection of how the agent loops (Domain 1) and how each tool is contracted (Domain 2).
The right architecture
Model this as one agentic loop, not a chain of prompts. You give Claude the goal plus a set of tools and let it run the standard cycle: gather context → take action → verify → repeat. Each turn is one Messages API round-trip. Claude emits tool_use blocks and stops with stop_reason: "tool_use"; your application code runs the tool and appends a tool_result block in the next user message; you call the API again until Claude finishes with stop_reason: "end_turn". The single most-tested fact: the model requests tools, it never executes them — execution, the loop, and the termination check are yours.
Tools split into two trust tiers, and the split drives the design:
- Read tools (
get_logs,query_metrics,read_flag) — safe, idempotent, run withtool_choice: "auto"so the model decides when it has enough evidence. - Write tools (
set_flag,scale_service) — irreversible side effects. These get human-in-the-loop (HITL) approval before execution.
The decisive insight: HITL is enforced in your code, not in the prompt. When Claude requests set_flag, the loop pauses before execution and surfaces the proposed call (name + arguments) to the engineer. "Please ask before changing prod" in the system prompt is a probabilistic hope; a code gate is a deterministic guarantee. This is the hooks-vs-prompts distinction from Domain 1: use deterministic control for anything that must always happen.
Tool contracts that the model can use (Domain 2)
The tool description is the selection mechanism — the model picks tools by reading their descriptions, so write them for an LLM, not a human:
Note the explicit "does NOT… use X instead" lines: overlapping or vague descriptions are the #1 cause of wrong tool selection. Enums constrain the model to valid inputs and let it self-correct.
Equally important: return structured errors, not prose. When metrics are unavailable, return a tool_result with is_error: true and a typed reason (transient → retry, validation → fix arguments, permission → escalate). Distinguish an access failure from a valid empty result: "no errors found" and "I couldn't reach the metrics API" must look different, or Claude will conclude the system is healthy when it is actually blind.
Common traps
- Hard-coding the workflow as a fixed chain. Incidents are open-ended; a rigid pipeline can't adapt. Let the model decide the next read — but keep writes gated.
- Trusting the prompt for safety. "Don't touch prod without asking" is not a control. Gate writes in code.
- One mega-tool (
do_ops_thing) with a free-text command. Ambiguous to select and impossible to validate. Prefer narrow, single-purpose tools. - Letting verbose log output flood the context. Trim tool results and delegate noisy discovery to a subagent so the coordinator's context stays clean (Domain 1 + Domain 5 overlap).
- Forcing tools.
tool_choice: "any"would stop the agent from ever giving a final answer; reserve forced choice for when a tool call is genuinely mandatory.
How it maps to the domains
- Domain 1 (Agentic Architecture & Orchestration): the agentic loop,
stop_reasonhandling, model-driven reads vs hard-coded write gates, deterministic hooks vs probabilistic prompts, and HITL placement. - Domain 2 (Tool Design & MCP Integration): descriptions as the selection mechanism, schema/enum design,
tool_choicemodes, and structuredis_errorresults. If these tools live behind MCP servers, the same contracts apply — plus transport choice (stdio vs HTTP) and secrets via env vars.
Exam focus: when a question gives you a goal, a tool set, and a safety constraint, ask two things — who decides the next step? (model for reads, your loop for the termination check) and who guarantees the safety rule? (a code/HITL gate, never the prompt). Pick the answer that lets the model choose reads freely while making irreversible writes deterministic.