章节 — Claude 认证架构师备考

When you have a large volume of requests and don't need answers right now, the Message Batches API processes them asynchronously at 50% of standard token prices. It's the right tool for bulk classification, offline summarization, dataset labeling, and evals — and the wrong tool for anything a user is waiting on.

How it works

You submit a batch of independent Messages API requests, poll for completion, then retrieve results.

json

Each request carries a custom_id you assign. Results come back unordered, so the custom_id is how you map each result to its input row. Make them unique.

Key limits and timing

Up to 100,000 requests or 256 MB per batch.
Most batches finish well within an hour; the maximum window is 24 hours.
Results are retrievable for 29 days after creation.
Each request inside the batch is a full Messages API request — vision, tools, prompt caching, structured output all work.

The lifecycle

Poll GET /v1/messages/batches/{id} until processing_status === "ended", then stream the results. Each result has a result.type:

succeeded — the message is present.
errored — inspect the error; invalid_request means fix and resubmit, server errors are safe to retry.
expired — the 24h window passed; resubmit.
canceled — you canceled the batch.

json

When NOT to use batches

The exam loves this distinction. Batches are asynchronous — never use them for:

Interactive chat or anything latency-sensitive.
A blocking validation or guardrail check inside a request path (e.g. an LLM-judge that gates whether to return a response). Up to 24 hours of latency makes that a non-starter; use a synchronous Messages API call instead.

Use batches precisely when latency is irrelevant and the 50% discount and high throughput are the win.

Exam focus

Know batches are asynchronous, ~50% cheaper, up to 100K requests / 256 MB, with a 24-hour completion window and 29-day result retention. The custom_id maps unordered results back to inputs. Distinguish result types (succeeded / errored / expired / canceled). Most importantly: do not use batches for interactive or blocking-check workloads — those need synchronous Messages API calls.