Providers

AgentLoom ships with four providers. The gateway routes requests based on model name and falls back automatically when a provider is unavailable.

Capability matrix

Capability OpenAI Anthropic Google Ollama
Models gpt-*, o3*, o4* claude* gemini* Any local model
Streaming SSE SSE SSE NDJSON
Image input
PDF input
Audio input
Reasoning token count (o-series, implicit) (rolled into output_tokens) (Gemini 2.5+, opt-in) (no eval_count split)
Reasoning content (trace) (server-side only) (type="thinking" blocks) (includeThoughts opt-in) (Ollama 0.9+ message.thinking)
Cost tracking Free (local)

Configuration

Switch provider in any workflow:

config:
  provider: google
  model: gemini-2.5-flash

Or override at runtime via CLI:

agentloom run workflow.yaml --provider anthropic --model claude-sonnet-4-20250514

Environment variables

Variable Provider
OPENAI_API_KEY OpenAI
ANTHROPIC_API_KEY Anthropic
GOOGLE_API_KEY Google
OLLAMA_BASE_URL Ollama (default: http://localhost:11434)
AGENTLOOM_OLLAMA_FALLBACK Opt-in: any truthy value (1, true, yes) registers Ollama as a global fallback during provider auto-discovery.

Breaking change in 0.5.0 — Ollama is now opt-in

Pre-0.5.0 AgentLoom auto-registered Ollama as a catch-all fallback whenever no providers: block was explicitly configured. Every primary-provider failure — a 404 from a wrong model id, a transient 5xx — triggered a secondary call to http://localhost:11434, adding 250 ms – 1 s of latency per failure and a confusing error chain ("Provider 'anthropic' failed: 404 / Provider 'ollama' failed: model not found") for users who didn't run Ollama at all.

Starting in 0.5.0 you opt in via AGENTLOOM_OLLAMA_FALLBACK=1, or by listing ollama in the top-level providers: block of an agentloom.yaml config file. Workflows that set provider: ollama directly (no fallback) keep working unchanged — the opt-in only affects the implicit fallback path.

Opt-in via the config file

The providers: block is a top-level key of the agentloom.yaml config file (the one passed to load_config / the CLI's --config), not a field of a workflow's config:. Declaring it disables auto-discovery, so list every provider you want — API keys are still read from the environment when omitted here:

# agentloom.yaml
providers:
  - name: openai
    models: ["gpt-4o-mini"]
  - name: ollama
    base_url: http://localhost:11434
    is_fallback: true

Pin models to date-stamped IDs rather than tracking aliases. Date-pinned IDs reproduce the exact behaviour of the article validations; tracking aliases (-latest, -flash-latest) may stop accepting console keys without notice.

Provider | Recommended ID | Notes
Anthropic | claude-haiku-4-5-20251001 | Pre-0.5.0 docs referenced claude-3-5-haiku-latest, which current console keys reject with 404 not_found_error.
Google | gemini-2.5-flash | Pre-0.5.0 docs referenced gemini-2.0-flash, for which Google's API now returns 404 NOT_FOUND to new users.
OpenAI | gpt-4o-mini / gpt-4o | No change.

Circuit breaker

The gateway wraps each provider with a circuit breaker:

State | Behavior | Transition
Closed | Requests pass through normally | Open after 5 consecutive failures
Open | Requests rejected immediately, fallback provider used | Half-open after 60 s
Half-open | One test request allowed | Closed on success, Open on failure

RateLimitError (HTTP 429) and stream cancellations (GeneratorExit / anyio.CancelledError) are excluded from the failure count — being throttled or aborted is not a provider outage. Only genuine errors count toward the 5-failure threshold.
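The state table and the exclusion rule above can be sketched as a small state machine. This is an illustration only, not AgentLoom's implementation: the class, its method names, and the injectable clock are hypothetical, and the sketch does not limit the half-open state to a single in-flight probe.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the HTTP 429 exception (hypothetical local class)."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def allow_request(self):
        # Open rejects immediately; closed and half-open let the call through.
        return self.state != "open"

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self, exc):
        if isinstance(exc, RateLimitError):
            return  # being throttled is not an outage, so it is not counted
        if self.state == "half_open":
            self._opened_at = self._clock()  # the probe failed: reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Passing a fake `clock` makes the 60-second transition testable without sleeping.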

Rate limiter

Dual token-bucket rate limiting per provider:

  • Requests per minute: default 60 RPM
  • Tokens per minute: default 100,000 TPM

Override the defaults per provider at registration:

gateway.register(
    provider,
    max_rpm=120,          # requests/minute
    max_tpm=200_000,      # tokens/minute
)

max_rpm and max_tpm must be >= 1; the limiter rejects zero/negative bounds at registration with ValueError. A request whose estimated token_count exceeds max_tpm also raises ValueError instead of blocking forever on a bucket that can never refill that high — this is a local precondition violation, not a RateLimitError (which is reserved for HTTP 429 responses from the provider).
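The two preconditions above can be illustrated with a minimal dual token bucket. The class names and the non-blocking `try_acquire` method are hypothetical and chosen for this sketch; the real limiter's interface may differ.

```python
import time


class TokenBucket:
    """Bucket that refills linearly to `capacity` over 60 seconds."""

    def __init__(self, capacity, clock=time.monotonic):
        if capacity < 1:
            raise ValueError("bound must be >= 1")
        self.capacity = capacity
        self._clock = clock
        self._level = float(capacity)
        self._updated = clock()

    def wait_time(self, amount):
        """Seconds until `amount` is available (0.0 if available now)."""
        if amount > self.capacity:
            # The bucket can never hold this much, so waiting would never end.
            raise ValueError("request exceeds the per-minute bound")
        now = self._clock()
        refill = (now - self._updated) * self.capacity / 60.0
        self._level = min(self.capacity, self._level + refill)
        self._updated = now
        return max(0.0, (amount - self._level) * 60.0 / self.capacity)

    def take(self, amount):
        self._level -= amount


class DualRateLimiter:
    """One bucket for requests per minute, one for tokens per minute."""

    def __init__(self, max_rpm=60, max_tpm=100_000, clock=time.monotonic):
        self.requests = TokenBucket(max_rpm, clock)
        self.tokens = TokenBucket(max_tpm, clock)

    def try_acquire(self, token_count):
        """Consume budget and return 0.0, or return the seconds to wait."""
        wait = max(self.requests.wait_time(1), self.tokens.wait_time(token_count))
        if wait == 0.0:
            self.requests.take(1)
            self.tokens.take(token_count)
        return wait
```

A request is admitted only when both buckets can cover it, which is why the longer of the two wait times is the one reported.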

HTTP errors

All provider adapters normalize remote errors to a common taxonomy:

HTTP status | Exception | Notes
429 Too Many Requests | RateLimitError | Numeric Retry-After (seconds) is parsed and exposed on the exception. HTTP-date form is not supported; the providers we talk to use integer seconds.
5xx | ProviderError | Counts toward the circuit breaker
network / timeout | ProviderError | Counts toward the circuit breaker
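As a rough illustration of the mapping in the table, the sketch below parses an integer Retry-After value and picks an exception by status code. The exception classes are local stand-ins, and `parse_retry_after`, `error_for_status`, and the `retry_after` attribute are hypothetical names, not the adapters' actual API.

```python
class ProviderError(Exception):
    """Stand-in for the adapter-level error type."""


class RateLimitError(Exception):
    """Stand-in for the HTTP 429 error; carries the parsed delay."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


def parse_retry_after(value):
    """Return the delay in seconds for an integer header value, else None."""
    if value is None:
        return None
    text = value.strip()
    # Only the integer-seconds form is handled; an HTTP-date yields None.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


def error_for_status(status, headers):
    """Map an HTTP status to an exception instance, or None if not an error here."""
    if status == 429:
        return RateLimitError("rate limited", parse_retry_after(headers.get("Retry-After")))
    if 500 <= status <= 599:
        return ProviderError(f"server error {status}")
    return None
```

For simplicity the header lookup uses a plain dict with an exact-case key; real HTTP clients usually provide case-insensitive header mappings.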

Provider adapters declare an explicit kwargs allowlist for extra parameters; unknown kwargs raise a TypeError at call time rather than silently reaching the vendor's API. Each adapter exposes its allowlist via a constant (_OPENAI_EXTRA_PAYLOAD_KEYS, _ANTHROPIC_EXTRA_PAYLOAD_KEYS, _GOOGLE_GEN_CONFIG_KEYS + _GOOGLE_TOPLEVEL_KEYS, _OLLAMA_OPTION_KEYS + _OLLAMA_TOPLEVEL_KEYS).
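The allowlist idea can be shown in a few lines. The constant `_EXAMPLE_EXTRA_KEYS`, its contents, and `build_payload` are invented for this sketch; the real allowlists live in the constants named above and their members are not listed here.

```python
# Hypothetical allowlist; the real adapters define their own constants.
_EXAMPLE_EXTRA_KEYS = frozenset({"seed", "top_p"})


def build_payload(model, prompt, **extra):
    """Build a request payload, refusing kwargs that are not allowlisted."""
    unknown = sorted(set(extra) - _EXAMPLE_EXTRA_KEYS)
    if unknown:
        # Fail at call time instead of forwarding a typo to the vendor API.
        raise TypeError(f"unexpected keyword argument(s): {', '.join(unknown)}")
    return {"model": model, "prompt": prompt, **extra}
```

With this shape, a misspelled parameter such as `temprature=0.2` surfaces as a local TypeError rather than a vendor-side 400 or, worse, a silently ignored setting.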

OpenAI base_url normalization

The OpenAI adapter normalizes base_url so workflows that point at the bare host (https://api.openai.com) get the /v1 suffix automatically. Custom enterprise gateways that already include a path are preserved verbatim:

Input Normalized
https://api.openai.com https://api.openai.com/v1
https://api.openai.com/ https://api.openai.com/v1
https://api.openai.com/v1 https://api.openai.com/v1
https://gw.example.com/v2 https://gw.example.com/v2 (preserved)
https://gw.example.com/api/v1/foo https://gw.example.com/api/v1/foo (preserved)

Pre-0.5.0 the rule was "append /v1 unless the URL ends literally in /v1", which silently mangled /v2 into /v2/v1 and any deeper path into broken request URLs.
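One rule that reproduces every row of the table is "append /v1 only when the URL has no path at all". The function below is a sketch under that assumption; `normalize_openai_base_url` is a hypothetical name and the adapter's actual logic may differ in edge cases not shown in the table.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_openai_base_url(base_url):
    """Add /v1 to a bare host; return any URL that already has a path unchanged."""
    parts = urlsplit(base_url)
    if parts.path in ("", "/"):
        return urlunsplit(parts._replace(path="/v1"))
    return base_url
```

Because the decision depends only on whether a path is present, /v2 and deeper gateway paths pass through verbatim instead of gaining a stray /v1 suffix.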

Non-retryable errors

The resilience layer short-circuits the retry loop when an exception carries is_retryable = False. Pre-0.5.0 these errors burned the full retry budget (up to 127 s of backoff per workflow) before surfacing the same failure the first attempt returned.

Exception Reason
SandboxViolationError Sandbox policy is deterministic; the next attempt is refused identically.
ToolNotFoundError (subclass of KeyError) A typo in tool_name never resolves itself.
AttachmentResolutionError (subclass of ValueError) Deterministic resolution failure — size limit, empty source, unsupported type, or a missing local file.
TemplateError A typo in {state.foo} never resolves itself.
ValidationError Workflow / step definition refused by Pydantic — fix the YAML, don't retry.
SecurityError Expression rejected by router AST policy — semantics, not flakiness.
BudgetExceededError Spend doesn't decrease between attempts.
Pydantic ValidationError Provider-side schema rejection. Special-cased because it's outside the AgentLoom hierarchy.

Every failed StepResult carries an error_classification field — "permanent" for the errors above and "transient" for failures that exhausted the retry budget. It is part of the result model (visible in agentloom run --json and on result.step_results), so callers and post-run analysis can distinguish "we wasted 30 s retrying nothing" from "we actually retried a transient one".
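The short-circuit can be sketched as a retry loop that inspects `is_retryable` before sleeping. The function name, its return shape, and the exponential backoff schedule are assumptions made for illustration; only the `is_retryable` attribute and the two classification strings come from the text above.

```python
import time


def run_with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Return (value, None) on success or (None, classification) on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), None
        except Exception as exc:
            if getattr(exc, "is_retryable", True) is False:
                # Deterministic failure: retrying would fail identically.
                return None, "permanent"
            if attempt == max_attempts:
                # Retry budget exhausted on an error that might have cleared.
                return None, "transient"
            sleep(base_delay * 2 ** (attempt - 1))
```

Exceptions without the attribute are treated as retryable here, so only errors that explicitly opt out skip the backoff.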

Fallback chain

Providers are tried in priority order. Register multiple providers for automatic fallback:

gateway.register(openai_provider, priority=0)
gateway.register(anthropic_provider, priority=1, is_fallback=True)
gateway.register(ollama_provider, priority=2, is_fallback=True)

If OpenAI fails or its circuit breaker trips, the gateway automatically routes to Anthropic. If Anthropic also fails, it falls back to Ollama.
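Conceptually, the chain is a loop over providers sorted by priority. The sketch below assumes each provider is a `(priority, name, callable)` tuple, which is a simplification invented for this example; it also omits the circuit-breaker check that the gateway performs.

```python
class ProviderError(Exception):
    """Stand-in for the adapter-level error type."""


def complete_with_fallback(providers, prompt):
    """Try providers in ascending priority; return (name, result) from the first success."""
    errors = []
    for _priority, name, call in sorted(providers, key=lambda p: p[0]):
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"Provider {name!r} failed: {exc}")
    # Every provider failed: surface the whole chain in one error.
    raise ProviderError(" / ".join(errors))
```

Lower numbers are tried first, matching the `priority=0` primary in the registration example.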

Multi-modal attachments

LLM steps support image, PDF, and audio attachments:

steps:
  - id: analyze
    type: llm_call
    prompt: "Describe what you see in this image."
    attachments:
      - type: image
        source: "{state.image_url}"
        fetch: local
    output: description

Each attachment accepts the following fields:

Field Description
type image, pdf, or audio
source HTTP(S) URL, data: URL, local file path, or raw base64 data
media_type Optional; inferred from type if omitted
fetch local (engine downloads) or provider (provider fetches URL directly)

Provider support varies

Check the capability matrix above. Sending a PDF to OpenAI or audio to Anthropic will raise a ProviderError.

data: URL attachments

A source may be an RFC 2397 data: URL — data:image/png;base64,iVBORw0KGgo… — to inline binary content directly in the workflow, a common idiom when an earlier step composed the image in-process. Both base64 and percent-encoded payloads are decoded. A malformed URL or invalid base64 raises AttachmentResolutionError (a non-retryable failure) rather than a misleading FileNotFoundError.

data: URLs make no network call and open no file, so they are allowed unconditionally even when sandbox.enabled: true — the threat model for inline content is the workflow author, not a remote host. The 20 MB size limit still applies to the decoded payload.
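Decoding a data: URL needs only the standard library. The following is a minimal sketch, not the engine's resolver: `decode_data_url` is a hypothetical helper, the error class is a local stand-in, and the 20 MB limit is assumed here to mean 20 × 1024 × 1024 bytes.

```python
import base64
import binascii
from urllib.parse import unquote_to_bytes

MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as MiB


class AttachmentResolutionError(ValueError):
    """Stand-in for the non-retryable resolution error."""

    is_retryable = False


def decode_data_url(url):
    """Return (media_type, payload_bytes) for an RFC 2397 data: URL."""
    if not url.startswith("data:") or "," not in url:
        raise AttachmentResolutionError("malformed data: URL")
    header, payload = url[len("data:"):].split(",", 1)
    params = header.split(";")
    media_type = params[0] or "text/plain"  # RFC 2397 default type
    if "base64" in params[1:]:
        try:
            data = base64.b64decode(payload, validate=True)
        except binascii.Error as exc:
            raise AttachmentResolutionError("invalid base64 payload") from exc
    else:
        data = unquote_to_bytes(payload)  # percent-encoded form
    if len(data) > MAX_ATTACHMENT_BYTES:
        raise AttachmentResolutionError("attachment exceeds the size limit")
    return media_type, data
```

The size check runs on the decoded bytes, in line with the note that the limit applies to the decoded payload.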

Reasoning models

OpenAI o-series (o1, o3, o4-mini) and Anthropic Claude with extended thinking produce internal reasoning tokens before the final answer. Providers bill these at the output rate, so cost accounting must include them.

TokenUsage exposes the count alongside the usual fields:

usage.prompt_tokens          # input
usage.completion_tokens      # visible output
usage.reasoning_tokens       # provider-side chain-of-thought
usage.billable_completion_tokens  # completion + reasoning

calculate_cost() charges (prompt × input_rate) + ((completion + reasoning) × output_rate) automatically, so workflow budgets and Prometheus cost metrics reflect the true spend.
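The formula is easy to check by hand. The dataclass and function below are simplified stand-ins that reuse the field names from this section; the signature of `calculate_cost` and the per-token rates are assumptions for the example, not the library's API or real prices.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Simplified stand-in with the fields described above."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    reasoning_tokens: int = 0

    @property
    def billable_completion_tokens(self):
        return self.completion_tokens + self.reasoning_tokens


def calculate_cost(usage, input_rate, output_rate):
    """(prompt x input_rate) + ((completion + reasoning) x output_rate)."""
    return (
        usage.prompt_tokens * input_rate
        + usage.billable_completion_tokens * output_rate
    )
```

With 1,000 prompt tokens, 200 visible completion tokens, 300 reasoning tokens, and illustrative rates of 1 and 4 units per token, the cost is 1,000 × 1 + 500 × 4 = 3,000 units. Ignoring the reasoning tokens would report 1,800 and undercount the spend.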

OpenAI — reasoning is implicit when an o-series model is selected. The adapter parses completion_tokens_details.reasoning_tokens from the response. The chain-of-thought trace is kept server-side and is never returned, so ProviderResponse.reasoning_content stays None.

Anthropic — extended thinking is opt-in via the step-level thinking block (see workflow YAML). ThinkingConfig translates to the thinking: {type: "enabled", budget_tokens} request payload, and type="thinking" content blocks are concatenated into ProviderResponse.reasoning_content. The Anthropic API does not surface a separate thinking-token count — extended-thinking volume is rolled into usage.output_tokens per the Anthropic docs — so reasoning_tokens stays 0 for this provider. Cost is automatically correct because the output rate is applied to output_tokens which already includes the thinking volume.

Google Gemini 2.5+ — opt-in via the same thinking block. ThinkingConfig translates to generationConfig.thinkingConfig with thinkingBudget (from budget_tokens), thinkingLevel (from level), and includeThoughts (from capture_reasoning). The adapter parses usageMetadata.thoughtsTokenCount (defaulting to 0 when the field is absent — Gemini omits it for non-thinking models and intermittently on gemini-3-flash-preview). When includeThoughts=true, parts marked thought=true are split into reasoning_content so the visible content stays clean.

Ollama 0.9+ — opt-in via thinking. ThinkingConfig translates to the top-level think request parameter (<level> when level is set, else true). The adapter surfaces message.thinking on reasoning_content. As a fallback for older models or calls without think=true, the adapter strips inline <think>...</think> tags from content and surfaces the captured trace the same way.
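The inline-tag fallback can be approximated with a regular expression. This is a sketch of the idea only; `split_inline_thinking` is a hypothetical helper, and the real adapter may handle unterminated or nested tags differently.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_inline_thinking(content):
    """Return (visible_content, reasoning_or_None) for text with inline think tags."""
    traces = [match.strip() for match in _THINK_RE.findall(content)]
    visible = _THINK_RE.sub("", content).strip()
    reasoning = "\n".join(trace for trace in traces if trace) or None
    return visible, reasoning
```

Content without tags comes back unchanged (apart from whitespace trimming) with `None` as the reasoning part.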

Ollama caveat — no token split

Ollama exposes a single eval_count for all output tokens regardless of whether thinking is active, so reasoning_tokens always reports 0 for this provider. Cost is unaffected (local models are free), but billable_completion_tokens will not reflect the true thinking volume.

Security

SSRF protection

URL-based attachments (fetch: local) are protected against Server-Side Request Forgery. The engine blocks requests to private and reserved IP ranges (RFC 1918, loopback, link-local) before any network call is made.

Webhook destination gate

Approval-gate webhooks (approval_gate.notify.url) pass through two independent gates:

  1. Scheme gate — always on. Non-http/https schemes (file://, data:, javascript:, gopher://, ...) are refused regardless of opt-in flags.
  2. Internal-host gate — always on unless the workflow explicitly opts out. Blocks loopback (127.0.0.0/8, ::1, localhost, *.localhost), link-local (169.254.0.0/16 — covers AWS / GCP / Azure metadata service at 169.254.169.254; fe80::/10), RFC 1918 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10), ULA (fc00::/7), the unspecified addresses (0.0.0.0, ::), multicast, and reserved ranges. IPv4-mapped IPv6 forms (::ffff:127.0.0.1) are normalised first so they hit the same flags.

Hostname classification uses getaddrinfo so both A and AAAA records are inspected — an attacker can't smuggle a loopback target through an AAAA-only DNS response. Percent-encoded and IDN hostnames are decoded before the literal-string check.
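The internal-host classification can be approximated with the standard `ipaddress` module. The sketch below is not the engine's code: the function names are hypothetical, the resolver is injectable so the example runs offline, and percent-encoded or IDN hostnames are not handled.

```python
import ipaddress
import socket

_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_internal_address(text):
    """True for loopback, link-local, private, CGNAT, multicast, and similar ranges."""
    addr = ipaddress.ip_address(text)
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped  # classify ::ffff:127.0.0.1 as 127.0.0.1
    return (
        addr.is_loopback
        or addr.is_link_local
        or addr.is_private
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
        or (addr.version == 4 and addr in _CGNAT)  # not covered by is_private
    )


def host_is_internal(hostname, resolver=socket.getaddrinfo):
    """Resolve every A/AAAA record and flag the host if any address is internal."""
    name = hostname.rstrip(".").lower()
    if name == "localhost" or name.endswith(".localhost"):
        return True
    infos = resolver(name, None)
    return any(is_internal_address(info[4][0]) for info in infos)
```

Checking every returned record, rather than only the first, is what prevents an AAAA-only response from slipping a loopback target past the gate.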

When the sandbox is enabled, the URL must additionally satisfy allow_network, allowed_schemes, and allowed_domains. Workflows that genuinely need to notify an in-cluster service can waive only the internal-host gate via:

config:
  sandbox:
    allow_internal_webhook_targets: true

The opt-in does NOT widen the scheme gate — file:// and friends stay refused. A blocked webhook is logged and emitted as a status="sandbox_blocked" observer breadcrumb; the approval gate itself still pauses normally because pause and notify are independent.

Router expression boundary

Router conditions are AST-validated against an allowlist (==, and/or, safe builtins like len). The validator enforces these rules:

  • Dunder and underscored attributes are rejected on both state.foo and state['foo'], so a workflow author who seeds state with _secret cannot accidentally surface it through a router predicate.
  • Subscript slices must be literal int/str. Variables (state[lookup]), arithmetic (state['_' + 'secret']), conditionals, and calls are refused because the validator cannot determine the resulting key at parse time.
  • Numeric slicing with optional unary ± bounds works (state['items'][::-1], state['items'][-2:]).
  • A single method call on a state value (state.severity.strip(), state.label.lower()) is permitted. Chained calls (state.severity.strip().lower()) are refused today because the validator requires call receivers to be a name / attribute / subscript chain, not another call.
  • eval, __import__, comprehensions, lambdas, and starred unpacking remain blocked.

State is reachable from a router predicate only through the state.X (or state['X']) surface — top-level state keys are not exposed as bare names, so a state key called len cannot shadow the safe builtin.
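A small part of such a validator can be sketched with the standard `ast` module. The example covers only two of the rules described in this section (underscored names and literal subscript keys); it does not implement the operator allowlist, the method-call rules, or builtin handling, and `check_state_access` is a hypothetical name. It assumes Python 3.9 or later, where a subscript's `slice` is the key expression itself.

```python
import ast


class SecurityError(Exception):
    """Stand-in for the non-retryable policy error."""

    is_retryable = False


def _is_int_literal(node):
    # Accept 3, -3, and +3 so that negative bounds are allowed.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        node = node.operand
    return isinstance(node, ast.Constant) and type(node.value) is int


def check_state_access(expression):
    """Raise SecurityError for underscored names or non-literal subscript keys."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise SecurityError(f"underscored attribute: {node.attr}")
        if not isinstance(node, ast.Subscript):
            continue
        key = node.slice
        if isinstance(key, ast.Slice):
            bounds = [b for b in (key.lower, key.upper, key.step) if b is not None]
            if not all(_is_int_literal(b) for b in bounds):
                raise SecurityError("slice bounds must be integer literals")
        elif isinstance(key, ast.Constant) and type(key.value) is str:
            if key.value.startswith("_"):
                raise SecurityError(f"underscored key: {key.value}")
        elif not _is_int_literal(key):
            # Variables, arithmetic, and calls cannot be resolved at parse time.
            raise SecurityError("subscript key must be a literal int or str")
```

Because the check works on the parsed tree, `state['_' + 'secret']` is refused as arithmetic before anything is evaluated.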

Allowed paths

sandbox.allowed_paths grants both read and write access to a directory tree; readable_paths and writable_paths narrow it down per direction. Resolved paths must live inside an allowed prefix, and the resolution itself is wrapped — null bytes, oversized components, symlink loops, OS-level rejections, and non-string callers all surface as SandboxViolationError (not the raw ValueError / OSError / RuntimeError / TypeError).

Command argument validation also covers flag-embedded paths: tee --output=/etc/passwd, dd of=/dev/sda, and similar --key=value / key=value forms have their value side validated against the allowlist, not just bare positional tokens.
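Both ideas, wrapped resolution and flag-embedded values, can be shown briefly. The helper names are hypothetical and the error class is a local stand-in; the sketch assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


class SandboxViolationError(Exception):
    """Stand-in for the non-retryable sandbox error."""

    is_retryable = False


def resolve_inside(path, allowed_prefixes):
    """Resolve `path` and require it to live under one of the allowed prefixes."""
    try:
        resolved = Path(path).resolve()
    except (ValueError, OSError, RuntimeError, TypeError) as exc:
        # Surface every resolution failure as a single sandbox error type.
        raise SandboxViolationError(f"cannot resolve {path!r}") from exc
    for prefix in allowed_prefixes:
        if resolved.is_relative_to(Path(prefix).resolve()):
            return resolved
    raise SandboxViolationError(f"{resolved} is outside the allowed paths")


def flag_value(token):
    """Return the value side of --key=value or key=value, else the token itself."""
    _key, sep, value = token.partition("=")
    return value if sep else token
```

A command validator along these lines would pass `flag_value(token)` to `resolve_inside` for each argument, so that `--output=/etc/passwd` is checked the same way as a bare `/etc/passwd`.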

Avoid mounting /dev

allowed_paths: ["/dev"] grants access to every device node — /dev/null, /dev/console, /dev/mem on Linux — and a tool that opens a file descriptor against an unexpected device can hang the workflow or leak data. Pick the tightest sub-directory you actually need (/dev/null if you only want to discard output) instead of the whole tree.

State redaction

Sensitive state values (API keys, passwords, tokens) can be flagged so they never land in a persisted artefact:

state:
  api_key: "..."
  password: "..."
  user_id: 42
state_schema:
  api_key: { redact: true }
  password: { redact: true }
  "*token*": { redact: true }

Or, for a deployment-wide baseline, set AGENTLOOM_REDACT_STATE_KEYS=api_key,password,*token* — the env-var policy is merged with the YAML one.

Redaction is applied at every persistence boundary:

  • Checkpoint files: the runtime state snapshot, the literal state: block in workflow_definition, every step_results[id].output value (an LLM call that returns a structured payload), and any step-level config field whose key matches the policy (notify.headers.api_key, tool_args.api_key, ...).
  • WorkflowResult.final_state: what agentloom run --json echoes to stdout and what result.model_dump_json() returns to Python callers.
  • Webhook body_template rendering.
  • Opt-in capture_prompts span event: re-rendered against the redacted state so the trace backend sees the sentinel.
  • Subworkflows: the parent's redaction patterns are merged into the child's state_schema at dispatch, so a parent's redact: true survives the parent/child boundary. The parent's sandbox config is also forwarded (a child cannot loosen what the parent locked).

The in-memory state stays plaintext so a step that legitimately interpolates {state.api_key} against the provider keeps working — only the persistence layer sees the stable <REDACTED:sha256=...> sentinel. Redaction is idempotent: a second pass over an already-redacted value preserves it byte-for-byte (so diffing across resume cycles is stable).
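The sentinel and its idempotence can be sketched as follows. The helper names are hypothetical, glob matching via `fnmatch` is an assumption about how patterns such as `*token*` are interpreted, and the sketch uses the full hex digest because the exact sentinel payload is not specified here.

```python
import fnmatch
import hashlib

_PREFIX = "<REDACTED:sha256="


def redact_value(value):
    """Replace a value with a stable sentinel; already-redacted values pass through."""
    text = str(value)
    if text.startswith(_PREFIX) and text.endswith(">"):
        return text  # idempotent: a second pass changes nothing
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{_PREFIX}{digest}>"


def redact_state(state, patterns):
    """Return a copy of `state` with matching keys redacted; the input is untouched."""
    return {
        key: redact_value(val)
        if any(fnmatch.fnmatchcase(key, pattern) for pattern in patterns)
        else val
        for key, val in state.items()
    }
```

Returning a new dict leaves the in-memory state in plaintext, which mirrors the split between the runtime state and the persistence layer.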

WorkflowDefinition uses extra="forbid" so a typo in state_schema: (e.g. stat_schema:) fails at parse time instead of silently shipping the secret to disk.

Resume contract

A redacted checkpoint cannot be resumed with the original secret value. If a workflow pauses on approval_gate before consuming the secret, plan to re-inject it on resume (CLI --state api_key=...) or do not flag the key as redact: true. WorkflowEngine.from_checkpoint logs a warning that lists every state key whose loaded value is a sentinel.

Attachment size limit

All attachments are limited to 20 MB per file. Larger files are rejected before being sent to the provider.