Requests
Each batch request mirrors the AI SDK generateText input, plus a customId to correlate its result.
Request shape
A request accepts the same fields as generateText, except model, which is set once for the whole batch. Provide either prompt or messages, and set customId if you want to choose the key that correlates the result.
await batch({
  model: anthropic("claude-opus-4-8"),
  requests: [
    {
      customId: "doc-1",
      prompt: "Summarize…",
      temperature: 0,
    },
    {
      customId: "doc-2",
      system: "You are terse.",
      messages: [{ role: "user", content: "Translate…" }],
      maxOutputTokens: 256,
    },
  ],
});
Supported fields
| Field | Notes |
|---|---|
| customId | Correlates the result. Auto-generated as request-<index> if omitted; must be unique within a batch. |
| prompt / messages | Provide one. messages is the AI SDK ModelMessage[]. |
| system | System prompt. |
| tools / toolChoice | Tool definitions, exactly as in generateText. |
| maxOutputTokens | Maximum tokens to generate. Required by Anthropic. |
| temperature, topP, topK | Sampling controls. |
| presencePenalty, frequencyPenalty | Penalties. |
| stopSequences, seed | Stop sequences and seed. |
| providerOptions | Provider-specific options (e.g. reasoning / thinking config). |
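The customId rules in the table (auto-generated as request-<index> when omitted, unique within a batch) can be sketched as follows. This is an illustration only, not the library's implementation: `BatchRequest` and `assignCustomIds` are hypothetical names, and the sketch assumes the index is the request's zero-based position in the array.

```typescript
// Hypothetical request type: only the fields this sketch needs.
interface BatchRequest {
  customId?: string;
  prompt?: string;
}

// Fill in missing customIds as `request-<index>` and reject duplicates.
// Assumption: <index> is the zero-based position of the request in the array.
function assignCustomIds<T extends BatchRequest>(
  requests: T[],
): (T & { customId: string })[] {
  const seen = new Set<string>();
  return requests.map((request, index) => {
    const customId = request.customId ?? `request-${index}`;
    if (seen.has(customId)) {
      throw new Error(`Duplicate customId "${customId}" at index ${index}`);
    }
    seen.add(customId);
    return { ...request, customId };
  });
}

const withIds = assignCustomIds([
  { customId: "doc-1", prompt: "Summarize…" },
  { prompt: "Translate…" },
]);
console.log(withIds.map((r) => r.customId).join(", ")); // doc-1, request-1
```

Setting explicit IDs is the safer choice when results are joined back to your own records, since generated IDs depend on array order.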
Batch-wide defaults
defaults are merged into every request; request-level values win:
await batch({
  model: anthropic("claude-opus-4-8"),
  defaults: {
    system: "You are terse.",
    maxOutputTokens: 256,
  },
  requests: [
    { customId: "a", prompt: "…" }, // inherits system + maxOutputTokens
    { customId: "b", prompt: "…", maxOutputTokens: 1024 }, // overrides maxOutputTokens
  ],
});
Metadata
metadata attaches free-form key/value pairs to the batch. It's forwarded to
OpenAI, Groq, Together AI, and Mistral, and ignored by Anthropic, Google Gemini,
and xAI (whose batch APIs don't accept batch-level metadata):
await batch({ model, requests, metadata: { description: "nightly eval" } });
Limits
Batchwork validates every batch before it reaches a provider. The default guardrails are 50,000 requests per batch, 20 MiB per captured request body, a 200 MiB provider upload payload, and 16 concurrent request captures:
await batch({
  model,
  requests,
  limits: {
    captureConcurrency: 8,
    maxRequests: 10_000,
    maxRequestBytes: 4 * 1024 * 1024,
    maxUploadBytes: 100 * 1024 * 1024,
  },
});
Provider-side caps still apply, so keep custom limits at or below the target provider's maximums.
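To make the guardrails concrete, here is a minimal sketch of a pre-flight check against the three size-related limits. It is not how Batchwork validates internally: `BatchLimits`, `DEFAULT_LIMITS`, and `checkLimits` are hypothetical names, and the sketch assumes each body is measured as UTF-8 encoded JSON and that the upload payload is the sum of all bodies.

```typescript
// Hypothetical limits shape, reusing the field names of the `limits` option.
interface BatchLimits {
  maxRequests: number;
  maxRequestBytes: number;
  maxUploadBytes: number;
}

// Default guardrails as stated in this section.
const DEFAULT_LIMITS: BatchLimits = {
  maxRequests: 50_000,
  maxRequestBytes: 20 * 1024 * 1024, // 20 MiB per request body
  maxUploadBytes: 200 * 1024 * 1024, // 200 MiB upload payload
};

// Returns human-readable violations; an empty array means the batch passes.
// Assumption: size is the UTF-8 byte length of the JSON-serialized body.
function checkLimits(
  bodies: unknown[],
  overrides: Partial<BatchLimits> = {},
): string[] {
  const limits: BatchLimits = { ...DEFAULT_LIMITS, ...overrides };
  const encoder = new TextEncoder();
  const problems: string[] = [];

  if (bodies.length > limits.maxRequests) {
    problems.push(`${bodies.length} requests exceeds maxRequests (${limits.maxRequests})`);
  }

  let totalBytes = 0;
  bodies.forEach((body, index) => {
    const bytes = encoder.encode(JSON.stringify(body)).length;
    totalBytes += bytes;
    if (bytes > limits.maxRequestBytes) {
      problems.push(`request ${index} is ${bytes} bytes, over maxRequestBytes (${limits.maxRequestBytes})`);
    }
  });

  if (totalBytes > limits.maxUploadBytes) {
    problems.push(`payload is ${totalBytes} bytes, over maxUploadBytes (${limits.maxUploadBytes})`);
  }
  return problems;
}

const problems = checkLimits(
  [{ prompt: "Summarize…" }, { prompt: "Translate…" }],
  { maxRequests: 1 },
);
console.log(problems.length); // 1 (too many requests)
```

Collecting every violation, rather than throwing on the first, lets a caller report all problems with a batch in one pass.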
Per-batch limits vary by provider:
| Provider | Max requests per batch | Max input size |
|---|---|---|
| OpenAI | 50,000 | 200 MB |
| Anthropic | 100,000 | 256 MB |
| Google Gemini | No fixed count | 2 GB file · 20 MB inline |
| Groq | 50,000 | 200 MB |
| Mistral | 100,000 | 512 MB |
| Together AI | 50,000 | 100 MB |
| xAI | 50,000 | 200 MB |
Notes:
- Google Gemini doesn't cap request count directly. Batches are bounded by the 2 GB input-file size and a per-model enqueued-token quota, and inline requests must stay under 20 MB total.
- Together AI also limits enqueued tokens to 30B per model at any time.
- xAI's 50,000 cap applies to file-based batches; inline batches are theoretically unbounded but throttled above ~1,000,000 requests.
For very large workloads, split into multiple batches.
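One way to split a workload is to cut it greedily by request count and total bytes. The helper below is a hedged sketch: `splitIntoBatches` is a hypothetical name, not part of the library, and the caps you pass in should come from the provider table above. The size callback is an assumption about how a request is serialized.

```typescript
// Greedily split items into chunks that respect a count cap and a byte cap.
// Assumption: sizeOf returns the serialized size of one request in bytes.
function splitIntoBatches<T>(
  items: T[],
  maxRequests: number,
  maxBytes: number,
  sizeOf: (item: T) => number,
): T[][] {
  const batches: T[][] = [];
  let current: T[] = [];
  let currentBytes = 0;

  for (const item of items) {
    const bytes = sizeOf(item);
    if (bytes > maxBytes) {
      throw new Error(`A single request (${bytes} bytes) exceeds the ${maxBytes}-byte cap`);
    }
    // Start a new chunk when adding this item would break either cap.
    const wouldOverflow =
      current.length >= maxRequests || currentBytes + bytes > maxBytes;
    if (wouldOverflow && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(item);
    currentBytes += bytes;
  }

  if (current.length > 0) batches.push(current);
  return batches;
}

const encoder = new TextEncoder();
const workload = Array.from({ length: 5 }, (_, i) => ({
  customId: `doc-${i}`,
  prompt: "Summarize…",
}));
const chunks = splitIntoBatches(workload, 2, 1024, (r) =>
  encoder.encode(JSON.stringify(r)).length,
);
console.log(chunks.map((c) => c.length).join(", ")); // 2, 2, 1
```

Each chunk would then be submitted as its own batch. Because customId only has to be unique within a batch, explicit IDs like the ones above are worth setting so that results from different chunks don't collide when merged.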