Claude Platform Foundations

The Message Batches API


When you have a large volume of requests and don't need answers right now, the Message Batches API processes them asynchronously at 50% of standard token prices. It's the right tool for bulk classification, offline summarization, dataset labeling, and evals — and the wrong tool for anything a user is waiting on.

How it works

You submit a batch of independent Messages API requests, poll for completion, then retrieve results.

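The sketch below builds the request body in TypeScript. It assumes the documented shape: a `requests` array where each entry pairs a `custom_id` with a `params` object holding an ordinary Messages API request. `buildBatchBody`, `InputRow`, the prompt text, and the model name are hypothetical placeholders. You would POST the serialized body to `/v1/messages/batches` with your usual authentication headers.

```typescript
// Subset of a Messages API request body; add system, tools, etc. as needed.
interface MessageParams {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

// One batch entry: your own ID plus the request it stands for.
interface BatchRequest {
  custom_id: string;
  params: MessageParams;
}

// Hypothetical input row from your dataset, carrying a stable primary key.
interface InputRow {
  id: string;
  text: string;
}

// Hypothetical helper: wrap each row in its own request and reuse the row's key
// as custom_id, rejecting duplicates so every result maps back to exactly one row.
function buildBatchBody(rows: InputRow[], model: string): { requests: BatchRequest[] } {
  const seen = new Set<string>();
  const requests = rows.map((row): BatchRequest => {
    if (seen.has(row.id)) throw new Error(`duplicate custom_id: ${row.id}`);
    seen.add(row.id);
    return {
      custom_id: row.id,
      params: {
        model,
        max_tokens: 256,
        messages: [{ role: "user", content: `Classify the sentiment of: ${row.text}` }],
      },
    };
  });
  return { requests };
}

// Usage: the serialized object is what you would POST to /v1/messages/batches.
const body = buildBatchBody(
  [
    { id: "review-1", text: "great product" },
    { id: "review-2", text: "arrived broken" },
  ],
  "MODEL_NAME_PLACEHOLDER",
);
console.log(JSON.stringify(body, null, 2));
```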

Each request carries a custom_id you assign. Results come back unordered, so the custom_id is how you map each result to its input row. Make them unique.
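Because results are unordered, you join them back to inputs by looking up `custom_id`, not by position. This sketch assumes the results arrive as JSON Lines with one `{ custom_id, result }` object per line. `indexResults` and `joinToInputs` are hypothetical helpers, and the sample lines are trimmed illustrations rather than real API output.

```typescript
// Fields of one results-file line that this sketch relies on.
interface BatchResultLine {
  custom_id: string;
  result: { type: "succeeded" | "errored" | "expired" | "canceled"; [key: string]: unknown };
}

// Index a JSONL results file by custom_id; line order carries no meaning.
function indexResults(jsonl: string): Map<string, BatchResultLine> {
  const byId = new Map<string, BatchResultLine>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank or trailing lines
    const parsed = JSON.parse(line) as BatchResultLine;
    if (byId.has(parsed.custom_id)) {
      throw new Error(`duplicate custom_id in results: ${parsed.custom_id}`);
    }
    byId.set(parsed.custom_id, parsed);
  }
  return byId;
}

// Put results back next to their inputs, in input order. An undefined result
// means a submitted ID never showed up, which usually points at an ID mismatch.
function joinToInputs<T extends { id: string }>(
  inputs: T[],
  results: Map<string, BatchResultLine>,
): { input: T; result: BatchResultLine | undefined }[] {
  return inputs.map((input) => ({ input, result: results.get(input.id) }));
}

// Usage with two trimmed, out-of-order sample lines (real lines carry more fields).
const jsonl = [
  '{"custom_id":"row-b","result":{"type":"errored"}}',
  '{"custom_id":"row-a","result":{"type":"succeeded"}}',
].join("\n");
const joined = joinToInputs([{ id: "row-a" }, { id: "row-b" }, { id: "row-c" }], indexResults(jsonl));
for (const { input, result } of joined) {
  console.log(input.id, result ? result.result.type : "missing");
}
// row-a succeeded
// row-b errored
// row-c missing
```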

Key limits and timing

  • Up to 100,000 requests or 256 MB per batch, whichever limit is reached first.
  • Most batches finish well within an hour; the maximum window is 24 hours.
  • Results are retrievable for 29 days after creation.
  • Each request inside the batch is a full Messages API request, so vision, tool use, prompt caching, and structured outputs all work.
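You can split a job larger than one batch on the client side before submission. The greedy sketch below enforces both limits at once. It assumes the 256 MB cap applies to the serialized JSON body and reads MB as 1024 × 1024 bytes. Both are assumptions, so keep some headroom. `chunkRequests` and `utf8ByteLength` are hypothetical helpers.

```typescript
// Limits from the list above. Treating "256 MB" as 256 * 1024 * 1024 bytes of
// serialized JSON is an assumption; leave headroom if you are near the edge.
const MAX_REQUESTS_PER_BATCH = 100_000;
const MAX_BATCH_BYTES = 256 * 1024 * 1024;

interface BatchRequest {
  custom_id: string;
  params: Record<string, unknown>;
}

// UTF-8 byte length of a string, counting a surrogate pair as one 4-byte code point.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { bytes += 4; i++; } // high surrogate: skip its partner
    else bytes += 3;
  }
  return bytes;
}

// Greedily pack requests into consecutive chunks that respect both limits,
// so each chunk can be submitted as its own batch.
function chunkRequests(
  requests: BatchRequest[],
  maxCount: number = MAX_REQUESTS_PER_BATCH,
  maxBytes: number = MAX_BATCH_BYTES,
): BatchRequest[][] {
  const envelope = utf8ByteLength('{"requests":[]}'); // wrapper bytes around the array
  const chunks: BatchRequest[][] = [];
  let current: BatchRequest[] = [];
  let currentBytes = envelope;
  for (const req of requests) {
    // +1 for a separating comma; this slightly overestimates the real size.
    const size = utf8ByteLength(JSON.stringify(req)) + 1;
    if (envelope + size > maxBytes) {
      throw new Error(`request ${req.custom_id} is too large for any batch`);
    }
    if (current.length >= maxCount || currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = envelope;
    }
    current.push(req);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Usage with a tiny count limit to make the splitting visible.
const reqs: BatchRequest[] = ["r0", "r1", "r2", "r3", "r4"].map((id) => ({ custom_id: id, params: {} }));
console.log(chunkRequests(reqs, 2).map((c) => c.length).join(", ")); // 2, 2, 1
```

Packing greedily keeps every chunk in input order. That makes it easy to record which `custom_id` range went into which batch.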

The lifecycle

Poll GET /v1/messages/batches/{id} until processing_status === "ended", then stream the results. Each result has a result.type:

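  • succeeded: result.message holds the response.
  • errored: inspect the error. An invalid_request error means you must fix the request before resubmitting it; server errors are safe to retry unchanged.
  • expired: the 24-hour window closed before the request was processed; resubmit it.
  • canceled: you canceled the batch before this request was processed.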
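The polling step might look like the sketch below. It uses capped exponential backoff so a long-running batch is not checked every few seconds for hours. `getStatus` is a hypothetical callback you would implement by calling `GET /v1/messages/batches/{id}` and returning its `processing_status`. The status values other than `"ended"` are assumptions, and so are the default delays; they are not documented recommendations.

```typescript
// Values of processing_status as we understand them; only "ended" matters here.
type ProcessingStatus = "in_progress" | "canceling" | "ended";

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the batch reports "ended", doubling the wait after each check up to
// maxDelayMs. Returns how many status checks were made. The default delays are
// arbitrary choices for illustration.
async function waitForBatch(
  getStatus: () => Promise<ProcessingStatus>,
  sleep: (ms: number) => Promise<void> = realSleep,
  initialDelayMs = 10_000,
  maxDelayMs = 300_000,
): Promise<number> {
  let delay = initialDelayMs;
  for (let checks = 1; ; checks++) {
    if ((await getStatus()) === "ended") return checks;
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}

// Usage with a fake status source and a no-op sleep that records each delay.
const fakeStatuses: ProcessingStatus[] = ["in_progress", "in_progress", "ended"];
const delays: number[] = [];
waitForBatch(
  async () => fakeStatuses.shift() ?? "ended",
  async (ms) => {
    delays.push(ms);
  },
  1_000,
  1_500,
).then((checks) => console.log(checks, delays.join(","))); // 3 1000,1500
```

Injecting `sleep` keeps the loop testable without real waiting. In production the default timer-based sleep applies.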
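Once the batch has ended, you can route each result by `result.type`, following the rules in the list above. In this sketch the `Action` names are hypothetical. Reading the error class from `error.error.type` is an assumption about the errored-result shape; check it against a real results file.

```typescript
// Per-request result shapes, trimmed to what triage needs. The nested
// error.error.type location of the error class is an assumption.
type BatchResult =
  | { type: "succeeded"; message: unknown }
  | { type: "errored"; error: { error?: { type?: string } } }
  | { type: "expired" }
  | { type: "canceled" };

// Hypothetical follow-up actions for each outcome.
type Action = "keep" | "fix_then_resubmit" | "retry" | "resubmit" | "skip";

function triage(result: BatchResult): Action {
  switch (result.type) {
    case "succeeded":
      return "keep"; // result.message holds the Messages API response
    case "errored": {
      const errorType = result.error.error?.type ?? "";
      // An invalid request fails the same way until you fix it; any other error
      // is treated as a transient server-side failure (a simplification).
      return errorType.startsWith("invalid_request") ? "fix_then_resubmit" : "retry";
    }
    case "expired":
      return "resubmit"; // the 24-hour window closed before it was processed
    case "canceled":
      return "skip"; // you canceled the batch; resubmit only if you still need it
  }
}

// Usage: "invalid_request_error" and "api_error" are example error type strings.
const sample: BatchResult[] = [
  { type: "succeeded", message: {} },
  { type: "errored", error: { error: { type: "invalid_request_error" } } },
  { type: "errored", error: { error: { type: "api_error" } } },
  { type: "expired" },
];
console.log(sample.map(triage).join(", ")); // keep, fix_then_resubmit, retry, resubmit
```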

When NOT to use batches

The exam loves this distinction. Batches are asynchronous — never use them for:

  • Interactive chat or anything latency-sensitive.
  • A blocking validation or guardrail check inside a request path (e.g. an LLM-judge that gates whether to return a response). Up to 24 hours of latency makes that a non-starter; use a synchronous Messages API call instead.

Use batches precisely when latency is irrelevant and the 50% discount and high throughput are the win.
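The decision rule and the discount fit in a few lines. In this sketch `Workload`, `Prices`, and both functions are hypothetical, and the prices are caller-supplied placeholders, not real rates. Only two facts come from the text: the routing rule (anything a caller is blocked on stays synchronous) and the 50% batch price.

```typescript
// Hypothetical workload descriptor; none of these fields are API parameters.
interface Workload {
  userIsWaiting: boolean; // interactive chat or any latency-sensitive path
  gatesAResponse: boolean; // e.g. an LLM-judge that decides whether to return a reply
  inputTokens: number;
  outputTokens: number;
}

// Standard (non-batch) prices in USD per million tokens, supplied by the caller.
interface Prices {
  inputPerMTok: number;
  outputPerMTok: number;
}

const BATCH_PRICE_FACTOR = 0.5; // batches bill at 50% of standard token prices

// Anything someone is blocked on goes to the synchronous Messages API.
function chooseEndpoint(w: Workload): "messages" | "batches" {
  return w.userIsWaiting || w.gatesAResponse ? "messages" : "batches";
}

// Estimated cost: standard token pricing, halved only when the work is batched.
function estimateCostUsd(w: Workload, p: Prices): number {
  const standard = (w.inputTokens * p.inputPerMTok + w.outputTokens * p.outputPerMTok) / 1_000_000;
  return chooseEndpoint(w) === "batches" ? standard * BATCH_PRICE_FACTOR : standard;
}

// Usage with made-up rates chosen only to keep the arithmetic readable.
const prices: Prices = { inputPerMTok: 2, outputPerMTok: 10 };
const nightlyLabeling: Workload = {
  userIsWaiting: false,
  gatesAResponse: false,
  inputTokens: 2_000_000,
  outputTokens: 1_000_000,
};
const guardrail: Workload = { ...nightlyLabeling, gatesAResponse: true };
console.log(chooseEndpoint(nightlyLabeling), estimateCostUsd(nightlyLabeling, prices)); // batches 7
console.log(chooseEndpoint(guardrail), estimateCostUsd(guardrail, prices)); // messages 14
```

The guardrail check costs twice as much here for the same token volume. It still belongs on the synchronous path, because a response is waiting on its verdict.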

Exam focus

Know batches are asynchronous, ~50% cheaper, up to 100K requests / 256 MB, with a 24-hour completion window and 29-day result retention. The custom_id maps unordered results back to inputs. Distinguish result types (succeeded / errored / expired / canceled). Most importantly: do not use batches for interactive or blocking-check workloads — those need synchronous Messages API calls.