Arclight Batch — Job Reference

Reference for Arclight Batch, a managed parallel-job runner used for ETL, model evaluation, and one-off bulk transforms.

Job model

An Arclight Batch job is a directed acyclic graph (DAG) of tasks; each task runs as a single OCI container instance. Jobs are submitted via the `arclight batch submit` CLI or through the JSON-over-HTTPS REST API at `/v2/jobs`. Each job is assigned a UUIDv7 job ID at submission. A single job DAG may contain at most 5,000 tasks; jobs that exceed this limit are rejected at submission with HTTP 422.
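Because oversized or cyclic graphs are rejected, it can help to check a job locally before submitting it. The following is a minimal sketch, not part of the Arclight tooling: the `dependencies` mapping and the `validate_job_dag` helper are hypothetical names chosen for illustration, and only the 5,000-task limit comes from this reference.

```python
from graphlib import CycleError, TopologicalSorter

MAX_TASKS_PER_JOB = 5_000  # limit stated in this reference


def validate_job_dag(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid task execution order, or raise ValueError.

    `dependencies` maps each task name to the set of task names it depends
    on. This mapping is a hypothetical local representation (an assumption),
    not the schema accepted by the REST API.
    """
    # Collect every task mentioned either as a key or as a dependency.
    all_tasks = set(dependencies)
    for deps in dependencies.values():
        all_tasks |= deps

    if len(all_tasks) > MAX_TASKS_PER_JOB:
        raise ValueError(
            f"job has {len(all_tasks)} tasks; the limit is {MAX_TASKS_PER_JOB}"
        )

    try:
        # static_order() raises CycleError when the graph is not acyclic.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"job graph contains a cycle: {exc.args[1]}") from exc
```

For example, `validate_job_dag({"extract": set(), "transform": {"extract"}, "load": {"transform"}})` returns the tasks in dependency order, while a graph in which two tasks depend on each other raises `ValueError`.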

Task scheduling

Tasks are scheduled onto a pool of pre-provisioned worker nodes selected by the job's `compute_class` (one of: `cpu_small`, `cpu_large`, `gpu_t4`, `gpu_l4`). Within a job, task fan-out is limited to 256 concurrently running tasks per compute class; queued tasks wait until a slot frees. There is no per-account global concurrency cap on production-tier accounts; sandbox accounts are capped at 8 concurrent tasks total.
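The concurrency limits above translate into a simple lower bound on how many sequential "waves" a wide fan-out needs. The sketch below is illustrative only: the function names are hypothetical, and it assumes a single job running alone on one compute class with independent tasks of roughly equal duration.

```python
import math

JOB_FANOUT_CAP = 256  # concurrent tasks per compute class within a job
SANDBOX_CAP = 8       # concurrent tasks in total for a sandbox account
COMPUTE_CLASSES = {"cpu_small", "cpu_large", "gpu_t4", "gpu_l4"}


def max_concurrent(compute_class: str, sandbox: bool = False) -> int:
    """Upper bound on tasks of one job running at once on one compute class."""
    if compute_class not in COMPUTE_CLASSES:
        raise ValueError(f"unknown compute_class: {compute_class!r}")
    # ASSUMPTION: no other job on the sandbox account is using its 8 slots.
    return min(JOB_FANOUT_CAP, SANDBOX_CAP) if sandbox else JOB_FANOUT_CAP


def min_waves(task_count: int, compute_class: str, sandbox: bool = False) -> int:
    """Fewest sequential waves needed to run `task_count` independent tasks."""
    if task_count < 0:
        raise ValueError("task_count must be non-negative")
    return math.ceil(task_count / max_concurrent(compute_class, sandbox))
```

Under these assumptions, 1,000 independent `cpu_large` tasks need at least 4 waves on a production-tier account and at least 125 waves on a sandbox account.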

Retries and timeouts

Each task may be configured with a per-task timeout (default 30 minutes, maximum 12 hours) and a retry policy (default: 2 retries on a non-zero exit code, with exponential backoff starting at 30 seconds). Retries are not attempted in two cases: when a task exits with code 75 (`EX_TEMPFAIL`) more than three times consecutively, or when a task runs for longer than twice its configured timeout. Both cases result in a permanent task failure that aborts dependent tasks.
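To make the defaults concrete, the sketch below computes the delays before each retry and checks a timeout against the documented bounds. It is a hedged illustration: `RetryPolicy` and the helper functions are hypothetical names, and the backoff multiplier of 2 is an assumption, since this reference states only that backoff is exponential and starts at 30 seconds. The `EX_TEMPFAIL` rule is deliberately left out.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT_S = 30 * 60    # default per-task timeout: 30 minutes
MAX_TIMEOUT_S = 12 * 60 * 60   # maximum per-task timeout: 12 hours


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 2             # documented default
    initial_backoff_s: float = 30.0  # documented default
    multiplier: float = 2.0          # ASSUMPTION: growth factor is not documented


def backoff_delays(policy: RetryPolicy) -> list[float]:
    """Delay in seconds before each retry, in order."""
    return [
        policy.initial_backoff_s * policy.multiplier ** i
        for i in range(policy.max_retries)
    ]


def validate_timeout(timeout_s: float = DEFAULT_TIMEOUT_S) -> float:
    """Return the timeout unchanged if it is within the documented bounds."""
    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be in (0, {MAX_TIMEOUT_S}] seconds")
    return timeout_s


def should_retry(exit_code: int, retries_used: int, policy: RetryPolicy) -> bool:
    """Retry on a non-zero exit code while retries remain."""
    return exit_code != 0 and retries_used < policy.max_retries


def permanent_failure_cutoff_s(timeout_s: float) -> float:
    """Runtime beyond which a task is permanently failed (twice the timeout)."""
    return 2 * validate_timeout(timeout_s)
```

With the default policy and the assumed multiplier, `backoff_delays(RetryPolicy())` yields delays of 30 and 60 seconds, and a task with the default timeout is permanently failed once it has run for more than 3,600 seconds.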

Logging and artifacts

Stdout and stderr of every task are streamed to the Arclight log store and retained for 30 days by default. Artifacts written to `/arclight/output` inside the container are uploaded after task completion to a job-scoped object-storage prefix and are retained for 7 days unless an explicit retention policy is set. Individual artifact files larger than 5 GB are not supported; they fail the upload step with a non-retryable error.
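Since an oversized file fails the upload with a non-retryable error, a task can check its output directory before it exits. The helper below is a minimal sketch with a hypothetical name. It assumes "5 GB" means 5 × 10^9 bytes, which is the more conservative reading; this reference does not say whether the limit is decimal or binary.

```python
from pathlib import Path

# ASSUMPTION: "5 GB" is read as 5 * 10**9 bytes (decimal), the stricter reading.
MAX_ARTIFACT_BYTES = 5 * 10**9


def oversized_artifacts(
    output_dir: str = "/arclight/output",
    limit_bytes: int = MAX_ARTIFACT_BYTES,
) -> list[Path]:
    """Return files under `output_dir` whose size exceeds `limit_bytes`."""
    root = Path(output_dir)
    return sorted(
        path
        for path in root.rglob("*")  # walk the directory tree recursively
        if path.is_file() and path.stat().st_size > limit_bytes
    )
```

A task could call `oversized_artifacts()` as its last step and split or compress any file it reports, rather than discovering the problem at upload time.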

Key facts

  • An Arclight Batch job is a DAG of tasks where each task is one OCI container instance.
  • Jobs are submitted via the `arclight batch submit` CLI or `POST /v2/jobs`.
  • Maximum tasks per job DAG is 5,000.
  • Task fan-out is capped at 256 concurrent tasks per compute class within a job.
  • Default per-task timeout is 30 minutes with a 12-hour maximum.
  • Default retry policy is 2 retries on non-zero exit with exponential backoff starting at 30 seconds.
  • Sandbox accounts are capped at 8 concurrent tasks total.
  • A task that exits with code 75 (EX_TEMPFAIL) more than three times in a row is permanently failed.
  • Artifacts larger than 5 GB per file are unsupported and fail the upload step non-retryably.

Details

product: Arclight Batch
doc_type: reference
version: 2.4
