Message Batches API
When you have a large volume of requests and don't need answers right now, the Message Batches API processes them asynchronously at 50% of standard token prices. It's the right tool for bulk classification, offline summarization, dataset labeling, and evals — and the wrong tool for anything a user is waiting on.
How it works
You submit a batch of independent Messages API requests, poll for completion, then retrieve results.
Each request carries a custom_id you assign. Results come back unordered, so the custom_id is how you map each result to its input row. Make them unique.
Key limits and timing
- Up to 100,000 requests or 256 MB per batch.
- Most batches finish well within an hour; the maximum window is 24 hours.
- Results are retrievable for 29 days after creation.
- Each request inside the batch is a full Messages API request — vision, tools, prompt caching, structured output all work.
The lifecycle
Poll GET /v1/messages/batches/{id} until processing_status === "ended", then stream the results. Each result has a result.type:
succeeded— themessageis present.errored— inspect the error;invalid_requestmeans fix and resubmit, server errors are safe to retry.expired— the 24h window passed; resubmit.canceled— you canceled the batch.
When NOT to use batches
The exam loves this distinction. Batches are asynchronous — never use them for:
- Interactive chat or anything latency-sensitive.
- A blocking validation or guardrail check inside a request path (e.g. an LLM-judge that gates whether to return a response). Up to 24 hours of latency makes that a non-starter; use a synchronous Messages API call instead.
Use batches precisely when latency is irrelevant and the 50% discount and high throughput are the win.
Exam focus
Know batches are asynchronous, ~50% cheaper, up to 100K requests / 256 MB, with a 24-hour completion window and 29-day result retention. The custom_id maps unordered results back to inputs. Distinguish result types (succeeded / errored / expired / canceled). Most importantly: do not use batches for interactive or blocking-check workloads — those need synchronous Messages API calls.