Workflow YAML Reference¶
Complete reference for workflow definition files.
Top-level structure¶
name: my-workflow # required
version: "1.0" # optional, default "1.0"
description: "What this workflow does" # optional
config:
  provider: openai # default provider
  model: gpt-4o-mini # default model
  max_retries: 3 # retry attempts per step
  budget_usd: 0.50 # spending limit (null = unlimited)
  timeout: 300.0 # workflow timeout in seconds (null = unlimited)
  max_concurrent_steps: 10 # parallel step limit
  stream: false # streaming default
state:
  key: "value" # initial state variables
  nested:
    key: "value"
steps: # at least one step required
  - id: step_id
    type: llm_call # llm_call | tool | router | subworkflow
    # ... step-specific fields
Config options¶
| Option | Type | Default | Description |
|---|---|---|---|
| provider | string | openai | Default LLM provider |
| model | string | gpt-4o-mini | Default model for all LLM steps |
| max_retries | int | 3 | Retry attempts on failure |
| budget_usd | float | null | Maximum spend in USD |
| timeout | float | null | Workflow timeout in seconds |
| max_concurrent_steps | int | 10 | Max parallel steps per layer. Bounded 1 ≤ N ≤ 1024 at parse time: values outside that range raise a Pydantic error instead of deadlocking the limiter or surfacing a cryptic total_tokens must be >= 0. |
| stream | bool | false | Enable streaming by default |
| sandbox | object | disabled | Security sandbox config |
| on_step_failure | string | skip_downstream | Behaviour when a step (or router) ends in FAILED. skip_downstream (default) marks every transitive dependent as SKIPPED with an error field naming the closest failed ancestor; continue keeps the pre-0.5.0 best-effort behaviour where dependents still run against partial state. |
| strict_outputs | bool | false | Promote the parallel-output collision warning to a parse error. Two parallel-eligible steps writing the same output: key normally trigger a UserWarning listing both step ids; set strict_outputs: true to refuse the workflow at parse time. Sequential overwrite via depends_on is exempt because it is an intentional pattern. |
| responses_file | string | null | Mock provider recording path (when provider: mock) |
| latency_model | string | constant | Mock latency mode: constant / normal / replay |
| latency_ms | float | 0 | Mock provider simulated latency per call |
| capture_prompts | bool | false | When true, llm_call spans emit an agentloom.prompt.captured event with the rendered prompt + system prompt. Off by default; opt in for debugging or trusted environments only |
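The skip_downstream behaviour described for on_step_failure amounts to a reachability walk over the depends_on graph. The following is a minimal sketch of that idea, not AgentLoom's implementation; the function name `transitive_dependents` and the four-step graph are hypothetical.

```python
from collections import deque


def transitive_dependents(depends_on: dict[str, list[str]], failed: str) -> set[str]:
    """Return every step that depends, directly or indirectly, on `failed`."""
    # Invert the depends_on edges: parent -> list of direct children.
    children: dict[str, list[str]] = {step: [] for step in depends_on}
    for step, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(step)

    # Breadth-first walk from the failed step collects all descendants.
    skipped: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in skipped:
                skipped.add(child)
                queue.append(child)
    return skipped


# Hypothetical workflow: fetch -> parse -> summarize, plus an independent audit step.
graph = {"fetch": [], "parse": ["fetch"], "summarize": ["parse"], "audit": []}
skipped = transitive_dependents(graph, "fetch")
```

Here `skipped` contains parse and summarize, while audit is untouched because it has no path back to the failed step.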
Checkpointing¶
Persist workflow execution state so failed or paused runs can be resumed without re-executing completed steps.
CLI usage¶
# Run with checkpointing enabled
agentloom run workflow.yaml --checkpoint
# Custom checkpoint directory
agentloom run workflow.yaml --checkpoint --checkpoint-dir /data/checkpoints
# List all checkpointed runs
agentloom runs
agentloom runs --json
# Resume a previous run
agentloom resume <run_id>
agentloom resume <run_id> --lite --json
How it works¶
When --checkpoint is enabled, the engine:
- Generates a unique run ID (printed at startup).
- Saves a checkpoint file after the workflow completes (success or failure).
- Writes the full workflow definition, state, and step results into that checkpoint.
On agentloom resume <run_id>:
- Loads the checkpoint from disk.
- Reconstructs the workflow engine with the saved state.
- Skips already-completed steps and continues from where it left off.
Checkpoint files are stored as JSON in .agentloom/checkpoints/ by default
(configurable via --checkpoint-dir).
State¶
State variables are initialized in the state block and accessible in templates:
Template syntax:
| Expression | Result |
|---|---|
| {state.question} | "What is Python?" |
| {question} | "What is Python?" (flat access) |
| {state.items[0].name} | "Item A" |
| {state.count} | 42 |
Steps with output: key update state[key] after execution.
Templates render tool step args through the same renderer as prompt fields, so {state.url} substitutions work uniformly across step types. By default a missing key ({state.does_not_exist}) is logged and rendered as an empty string; to raise TemplateError on missing keys instead, build the namespace with build_template_vars(state, strict=True) (so nested {state.*} lookups also raise) and render with SafeFormatDict(template_vars, strict=True).
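To make the lookup rules concrete, here is a small sketch of how a dotted path with list indices could be resolved against state, with the non-strict and strict behaviours described above. It is an illustration only: `resolve`, the local `TemplateError` class, and the sample state (inferred from the expression table) are assumptions, and AgentLoom's own renderer may work differently.

```python
import re

# One path segment is either an identifier or a numeric index such as [0].
_SEGMENT = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


class TemplateError(Exception):
    """Local stand-in for the TemplateError named in the text."""


def resolve(path: str, state: dict, strict: bool = False):
    """Resolve a path such as 'state.items[0].name' against a state dict."""
    if path.startswith("state."):
        path = path[len("state."):]  # flat access: 'question' == 'state.question'
    current = state
    for name, index in _SEGMENT.findall(path):
        try:
            current = current[name] if name else current[int(index)]
        except (KeyError, IndexError, TypeError):
            if strict:
                raise TemplateError(f"missing template key: {path}") from None
            return ""  # non-strict: a missing reference renders as empty
    return current


# Sample state inferred from the expression table (an assumption).
state = {"question": "What is Python?", "items": [{"name": "Item A"}], "count": 42}
item_name = resolve("state.items[0].name", state)
missing = resolve("state.does_not_exist", state)
```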
state_schema — per-key redaction¶
Sensitive state values can be flagged so they never land in a checkpoint, webhook body, or trace span. The plaintext stays in memory so the active workflow can still use it.
state:
  api_key: "sk-..."
  password: "hunter2"
  user_id: 42
state_schema:
  api_key: { redact: true }
  password: { redact: true }
  "*token*": { redact: true }
Glob patterns match against the key name; for nested dicts they match against the dotted path (credentials.access_token). The same policy can be applied deployment-wide via AGENTLOOM_REDACT_STATE_KEYS=api_key,password,*token* (env-var and YAML policies are merged). See Security → State redaction for the full surface and the resume contract.
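The matching rule can be sketched with the standard library's `fnmatch`. This is a rough illustration under several assumptions: that the globs follow fnmatch-style semantics, that the environment value is a plain comma-separated list, and that redacted values are replaced by a mask string. The names `merge_patterns`, `redacted_copy`, and `MASK` are hypothetical.

```python
from fnmatch import fnmatchcase

MASK = "[REDACTED]"  # hypothetical placeholder; the real mask may differ


def merge_patterns(yaml_patterns: list[str], env_value: str = "") -> list[str]:
    """Merge YAML patterns with a comma-separated env-var value, dropping duplicates."""
    env_patterns = [p.strip() for p in env_value.split(",") if p.strip()]
    return list(dict.fromkeys([*yaml_patterns, *env_patterns]))


def redacted_copy(state: dict, patterns: list[str], prefix: str = "") -> dict:
    """Return a masked copy of state; the original dict keeps its plaintext."""
    out = {}
    for key, value in state.items():
        path = f"{prefix}{key}"  # dotted path, e.g. credentials.access_token
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            out[key] = MASK
        elif isinstance(value, dict):
            out[key] = redacted_copy(value, patterns, prefix=f"{path}.")
        else:
            out[key] = value
    return out


state = {
    "api_key": "sk-...",
    "user_id": 42,
    "credentials": {"access_token": "abc", "region": "eu"},
}
patterns = merge_patterns(["api_key", "password"], "password,*token*")
safe = redacted_copy(state, patterns)
```

Only the copy is masked, which mirrors the statement that the plaintext stays in memory for the running workflow.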
Step types¶
llm_call¶
Sends a prompt to an LLM and stores the response.
- id: answer
  type: llm_call
  prompt: "Answer: {state.question}" # required
  system_prompt: "You are helpful." # optional
  model: gpt-4o # optional, overrides config
  temperature: 0.7 # optional (0-2)
  max_tokens: 1000 # optional
  stream: true # optional, overrides config
  output: answer # state key for result
  timeout: 30.0 # per-step timeout
  depends_on: [previous_step] # dependencies
  attachments: # multi-modal input
    - type: image
      source: "{state.image_url}"
      fetch: local
  retry:
    max_retries: 3
    backoff_base: 2.0
    backoff_max: 60.0
LLM step fields:
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | - | Required. Template string with {state.*} interpolation |
| system_prompt | string | null | Optional system message |
| model | string | null | Override workflow-level model |
| temperature | float | null | Sampling temperature (0-2), provider default if null |
| max_tokens | int | null | Output token limit |
| stream | bool | null | Override workflow-level streaming setting |
| attachments | list[Attachment] | [] | Multi-modal inputs (see Providers) |
| thinking | ThinkingConfig | null | Extended-thinking / reasoning config (see Reasoning models) |
| output | string | null | State key to store result |
| timeout | float | null | Per-step timeout in seconds |
| depends_on | list[string] | [] | Step IDs that must complete first |
Thinking config:
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Activate provider-side reasoning |
| budget_tokens | int | null | Anthropic budget_tokens / Gemini thinkingBudget cap. OpenAI infers from model tier and ignores this field |
| level | "low" \| "medium" \| "high" | null | Gemini thinkingLevel / Ollama think value |
| capture_reasoning | bool | true | Expose the chain-of-thought trace via ProviderResponse.reasoning_content (Anthropic / Gemini / Ollama). OpenAI o-series keeps the trace server-side regardless |
Per-provider translation:
| Provider | Translation |
|---|---|
| OpenAI o-series | Reasoning is implicit in the model name; the config is accepted for YAML uniformity but not forwarded to the wire |
| Anthropic | thinking: {type: "enabled", budget_tokens: <budget_tokens>} |
| Google Gemini 2.5+ | generationConfig.thinkingConfig: {thinkingBudget, thinkingLevel, includeThoughts} |
| Ollama 0.9+ | top-level think: <level> if level is set, else think: true |
- id: complex_reasoning
  type: llm_call
  model: claude-opus-4
  prompt: "Solve: {state.problem}"
  thinking:
    enabled: true
    budget_tokens: 5000
    level: high
    capture_reasoning: true
  output: answer
Reasoning tokens are billed at the output rate. TokenUsage.reasoning_tokens and billable_completion_tokens track the spend; calculate_cost() includes them automatically. See Reasoning models for per-provider details, including the Ollama caveat that eval_count is not split.
Tool calling:
The model can pick tools at runtime. Declare them on the step; the engine dispatches via the workflow's ToolRegistry, feeds results back, and re-prompts until the model stops asking for tools.
- id: ask
  type: llm_call
  prompt: "What is the user's account balance?"
  tools:
    - name: lookup_account
      description: "Retrieve account info by ID."
      parameters:
        type: object
        properties:
          account_id: { type: string }
        required: [account_id]
  tool_choice: auto # auto | required | none | {name: lookup_account}
  max_tool_iterations: 5 # bound the loop; default 5
  output: answer
| Field | Type | Default | Description |
|---|---|---|---|
| tools | list[ToolDefinition] | [] | Tool declarations the model can pick. parameters is JSON Schema. Names resolve against the registered ToolRegistry; an unknown name is reported back as a tool failure rather than aborting the loop. |
| tool_choice | string \| dict | "auto" | "auto" lets the model decide; "required" forces a call; "none" disables tools for this turn; {"name": "..."} pins to a specific tool. Anthropic has no native "none" mode, so when "none" is set the adapter drops tools from the wire entirely, which gives the same observable behavior as the other providers. Ollama ignores tool_choice at the wire level (model-side support decides whether a call fires). |
| max_tool_iterations | int | 5 | Cap on call→result→re-prompt loops. When hit, finish_reason becomes "max_tool_iterations" so callers can detect runaway behavior. |
The dispatched tool runs through the existing sandbox (#105), so http_request, shell_command, file_read, file_write honor the workflow's sandbox: config. Multiple tool calls in one response are dispatched concurrently (anyio task group); results preserve order in the conversation. Cost and tokens accumulate across iterations on the surfaced StepResult.
The legacy tool step (static DAG node, author chooses the tool) keeps working unchanged — tools= on llm_call is the new dynamic, model-driven path.
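The call, result, re-prompt cycle with its iteration cap can be sketched as a plain loop. This is a simplified illustration, not the engine's code: the message and reply shapes, the `"stop"` finish reason, and the names `run_tool_loop` and `fake_model` are all assumptions, the registry is modelled as a plain dict, and tools are dispatched sequentially here rather than concurrently.

```python
def run_tool_loop(model, registry: dict, max_tool_iterations: int = 5) -> dict:
    """Re-prompt `model` until it stops requesting tools or the cap is hit."""
    messages: list[dict] = []
    for _ in range(max_tool_iterations):
        reply = model(messages)
        calls = reply.get("tool_calls", [])
        if not calls:
            return {"content": reply["content"], "finish_reason": "stop"}
        for call in calls:
            tool = registry.get(call["name"])
            if tool is None:
                # Unknown names are reported back as a failure, not raised.
                outcome = {"error": f"unknown tool: {call['name']}"}
            else:
                outcome = {"result": tool(**call["args"])}
            messages.append({"role": "tool", "name": call["name"], **outcome})
    return {"content": None, "finish_reason": "max_tool_iterations"}


def fake_model(messages: list[dict]) -> dict:
    """Hypothetical model: asks for one lookup, then answers from the result."""
    if not messages:
        return {"tool_calls": [{"name": "lookup_account", "args": {"account_id": "42"}}]}
    return {"content": f"Balance: {messages[-1]['result']['balance']}"}


registry = {"lookup_account": lambda account_id: {"account_id": account_id, "balance": 100}}
answer = run_tool_loop(fake_model, registry)

# A model that never stops asking for an unregistered tool hits the cap.
runaway = run_tool_loop(lambda m: {"tool_calls": [{"name": "missing", "args": {}}]}, registry)
```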
Retry config:
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 3 | Number of retry attempts |
| backoff_base | float | 2.0 | Exponential backoff base (wait = base^attempt) |
| backoff_max | float | 60.0 | Maximum wait between retries in seconds |
| jitter | bool | true | Apply ±25% jitter to each backoff so concurrent retries don't cluster |
| retryable_status_codes | list[int] | [429, 500, 502, 503, 504] | Provider status codes that trigger a retry. Other 4xx (e.g. 400/401/403/404) bail out immediately so the retry budget isn't burned on permanent failures. Status-less exceptions (network errors, generic provider failures) are always treated as transient and retried. |
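As a rough worked example of how these retry fields could combine, the sketch below computes the wait for a given attempt and decides whether a status code is retryable. It assumes the cap is applied before jitter and that attempts are counted from 1; neither detail is confirmed by the text, and `backoff_delay` and `should_retry` are hypothetical names.

```python
import random

DEFAULT_RETRYABLE = (429, 500, 502, 503, 504)


def backoff_delay(attempt, backoff_base=2.0, backoff_max=60.0, jitter=True, rng=random):
    """Wait in seconds before retry number `attempt` (assumed to start at 1)."""
    delay = min(backoff_base ** attempt, backoff_max)  # wait = base^attempt, capped
    if jitter:
        delay *= rng.uniform(0.75, 1.25)  # +/-25% spread (assumed applied after the cap)
    return delay


def should_retry(status_code, retryable=DEFAULT_RETRYABLE):
    """Status-less errors (None) count as transient; other codes must be listed."""
    return status_code is None or status_code in retryable


# Without jitter and with the defaults, attempts 1..7 wait 2, 4, 8, 16, 32, 60, 60 seconds.
schedule = [backoff_delay(n, jitter=False) for n in range(1, 8)]
```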
router¶
Evaluates conditions against state and activates a target step. Steps not activated are skipped.
- id: route
  type: router
  depends_on: [classify]
  conditions:
    - expression: "state.classification == 'question'"
      target: answer_question
    - expression: "state.score > 80"
      target: handle_high
  default: handle_general # fallback if no condition matches
Allowed in expressions: comparisons, boolean operators (and, or, not), builtins (len, str, int, float, bool, abs, min, max).
Safety
Router expressions are validated via AST and run inside a strict sandbox. The validator rejects:
- imports, exec, attribute assignment;
- any _-prefixed name (__class__, _private): blocks dunder traversal and access to private attributes;
- **kwargs and starred arguments in calls: closes format_map / **vars() exfiltration;
- the type builtin: was usable as type(x).__mro__[1].__subclasses__().
Violations raise SecurityError. Only a small audited subset of Python is allowed.
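The kind of AST check described above can be sketched with the standard `ast` module. This is a deliberately small illustration, not AgentLoom's validator: it covers only the rejections listed here, a production validator would also allowlist node types, and the local `SecurityError` class and the function names are stand-ins.

```python
import ast

# Builtins the text lists as allowed in router expressions.
ALLOWED_CALLS = {"len", "str", "int", "float", "bool", "abs", "min", "max"}


class SecurityError(Exception):
    """Local stand-in for the SecurityError named in the text."""


def validate_expression(source: str) -> None:
    """Raise SecurityError if the expression uses a rejected construct."""
    try:
        # mode="eval" accepts a single expression, so statements such as
        # imports and assignments fail to parse.
        tree = ast.parse(source, mode="eval")
    except SyntaxError as exc:
        raise SecurityError(f"not a plain expression: {exc.msg}") from exc

    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise SecurityError(f"private name: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise SecurityError(f"private attribute: {node.attr}")
        if isinstance(node, ast.Call):
            func = node.func
            # Calling exec, type, or any method is rejected by this allowlist.
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_CALLS):
                raise SecurityError("only allowlisted builtins may be called")
            if node.keywords or any(isinstance(a, ast.Starred) for a in node.args):
                raise SecurityError("keyword and starred arguments are rejected")


def is_rejected(source: str) -> bool:
    """Convenience wrapper: True when validation raises SecurityError."""
    try:
        validate_expression(source)
    except SecurityError:
        return True
    return False
```

With this sketch, `state.score > 80 and len(state.items) > 0` passes, while `type(state)`, `state.__class__`, and `import os` are all rejected.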
tool¶
Executes a registered tool with author-chosen arguments — the workflow author decides which tool to call, not the model. For model-driven tool selection, use the tools= field on an llm_call step (see tool calling above).
- id: fetch
  type: tool
  tool_name: http_request # registered tool name
  tool_args:
    url: "state.api_url" # "state." prefix resolves from state
    method: "GET"
    headers:
      Authorization: "Bearer token"
  output: response
  depends_on: [previous_step]
Argument resolution
String values starting with state. are resolved from workflow state. Other values are passed as literals.
Placeholder grammar in tool_args¶
tool_args values may also embed {state.foo} / {state[items][0]} / {name} placeholders that get rendered against state via the shared template engine. AgentLoom recognises a placeholder only when the opening brace is followed by state., state[, or an identifier that is itself followed by }, a conversion flag (!r, !s, or !a), or a : with a non-whitespace character immediately after it:
tool_args:
  greeting: "hello {state.user.name}" # rendered
  formatted: "cost: {total:.2f}" # rendered (format spec)
  raw_inline: '{"k": [1,2,3], "v": true}' # passed through unchanged
  raw_html: "<style>.x { color: red; }</style>" # passed through unchanged
  raw_js_obj: '{foo: true, bar: false}' # passed through (`: ` whitespace)
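One way to read the grammar above is as a single regular expression. The pattern below is an approximation written for illustration; it is not taken from AgentLoom's source, and `PLACEHOLDER` and `has_placeholder` are hypothetical names.

```python
import re

PLACEHOLDER = re.compile(
    r"\{(?:"
    r"state[.\[]"                        # {state. or {state[
    r"|[A-Za-z_]\w*(?:\}|![rsa]|:\S)"    # {name}, {name!r}, or {name:spec}
    r")"
)


def has_placeholder(value: str) -> bool:
    """True when the string contains something this grammar treats as a placeholder."""
    return PLACEHOLDER.search(value) is not None


# The five values from the tool_args example, plus one compact CSS-like shape.
checks = {
    "hello {state.user.name}": has_placeholder("hello {state.user.name}"),
    "cost: {total:.2f}": has_placeholder("cost: {total:.2f}"),
    '{"k": [1,2,3], "v": true}': has_placeholder('{"k": [1,2,3], "v": true}'),
    "<style>.x { color: red; }</style>": has_placeholder("<style>.x { color: red; }</style>"),
    "{foo: true, bar: false}": has_placeholder("{foo: true, bar: false}"),
    "{color:red}": has_placeholder("{color:red}"),
}
```

The first two strings match and the next three do not. The last one, `{color:red}`, does match, because a colon followed directly by a non-whitespace character looks the same as a format spec.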
If a value happens to look like a placeholder but you need it passed through verbatim, use the per-key escape hatch:
Workflows written before 0.5.0 that relied on { triggering template expansion on raw JSON / HTML strings would have failed at runtime (Max string recursion exceeded or Invalid format specifier) — the narrowed grammar removes that footgun.
Mixed content with embedded placeholders
A string that mixes a real placeholder with raw braces — e.g. '{"user": "{state.user}"}' — still hits the underlying Python str.format_map parser, which interprets the outer { as a format field and raises Invalid format specifier. Two supported workarounds: (1) compose the JSON in two steps and use template: false on the literal half, or (2) escape every literal brace as {{ / }}. AgentLoom cannot disambiguate "intended placeholder" from "intended literal brace" inside the same string. Compact CSS / JS-object shapes like {color:red} (no whitespace after :) are also inherently ambiguous with {name:spec} placeholders — use template: false or escape the braces.
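The brace-escaping workaround can be checked directly against Python's `str.format_map`. The snippet uses a plain dict and a flat `{user}` placeholder to stay self-contained; with a plain dict the unescaped template fails on the quoted field name, and the exact exception type under AgentLoom's own mapping may differ.

```python
values = {"user": "alice"}

unescaped = '{"user": "{user}"}'    # outer braces are parsed as a format field
escaped = '{{"user": "{user}"}}'    # literal braces doubled, placeholder left as-is

try:
    unescaped.format_map(values)
    failure = None
except (KeyError, ValueError) as exc:
    # The parser treats '"user"' as a field name and the rest as a format spec.
    failure = type(exc).__name__

rendered = escaped.format_map(values)
```

After this runs, `rendered` holds the JSON text `{"user": "alice"}` and `failure` names the exception raised by the unescaped form.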
Missing keys in templates¶
The template engine renders missing state references as the empty string in non-strict mode (default), including when the reference chains through several segments, and including chained dunders / conversion flags so a stray {state.missing.__class__} or {state.missing!r} cannot leak object internals:
| Template | State | Renders as |
|---|---|---|
| {state.missing} | {} | "" |
| {state.x.y.z} | {} | "" |
| {state.missing:.20} | {} | "" |
| {state.missing!r} | {} | "" |
| {state.missing.__class__} | {} | "" |
| {state.user:.20} | {user: {name: alice}} | {'name': 'alice'} (format spec ignored on non-scalar, warning logged) |
| {{state.x}} | any | {state.x} (escaped braces) |
The template engine also supports an opt-in strict mode at the programmatic API surface (SafeFormatDict(strict=True) / DotAccessDict(strict=True)) that raises TemplateError at the first missing segment. Strict mode is currently not exposed as a per-step YAML toggle; the runtime defaults to non-strict so missing references render gracefully.
Unicode normalisation
State dict lookup is byte-exact. A workflow that stores a key as NFD-form unicode (café) and references it as NFC (café) renders empty — there is no implicit normalisation step. If your workflow accepts user-supplied keys, normalise to NFC at the boundary (unicodedata.normalize("NFC", key)) before writing them to state.
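The mismatch is easy to reproduce with the standard library. In this short sketch, the helper `put` is a hypothetical boundary function that normalises keys before they reach state.

```python
import unicodedata

nfd_key = unicodedata.normalize("NFD", "caf\u00e9")   # 'e' + combining acute, 5 code points
nfc_key = unicodedata.normalize("NFC", "cafe\u0301")  # single precomposed 'é', 4 code points

# Stored under the NFD form, the NFC spelling does not find the entry.
raw_state = {nfd_key: "espresso"}
found_without_normalising = nfc_key in raw_state


def put(state: dict, key: str, value) -> None:
    """Normalise user-supplied keys to NFC before writing them to state."""
    state[unicodedata.normalize("NFC", key)] = value


clean_state: dict = {}
put(clean_state, nfd_key, "espresso")
found_after_normalising = nfc_key in clean_state
```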
subworkflow¶
Nests a workflow inside another. By default the child inherits parent state both ways — convenient for trivial helper subworkflows, leaky for anything resembling encapsulation. Set isolated_state: true to opt into a fresh state boundary.
- id: nested
  type: subworkflow
  isolated_state: true # child cannot read parent state
  input: # explicit seed for the child
    topic: "{state.user_topic}"
  return_keys: [classification, score] # only these surface back via `output:`
  workflow_inline:
    name: classifier
    state: { default_threshold: 0.75 } # child's own state
    steps:
      - id: classify
        type: llm_call
        prompt: "Classify: {state.topic}"
        output: classification
  output: child_result
State contract¶
| Setting | Child sees | Surfaces back |
|---|---|---|
| Default (isolated_state: false) | Full parent state + child's own state: block | The entire child final state under the parent's output: key |
| isolated_state: true, no return_keys | Child's own state: block + input: mapping | The entire child final state under the parent's output: key |
| isolated_state: true + return_keys: [a, b] | Child's own state: block + input: mapping | Only a and b from the child final state |
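The state contract can be expressed as two small functions. This is a sketch of the table only, with assumptions labelled in the comments: the merge precedence between parent and child keys, and the handling of a return key the child never wrote, are not specified in the text. The function names are hypothetical.

```python
def child_initial_state(parent_state, child_state, inputs, isolated_state=False):
    """State the child starts with, following the 'Child sees' column."""
    if isolated_state:
        # Fresh boundary: only the child's own block plus the explicit input mapping.
        return {**child_state, **inputs}
    # Assumption: on a key clash the child's own block wins over the parent.
    return {**parent_state, **child_state}


def surfaced_result(child_final_state, return_keys=None):
    """Value stored under the parent's output key, following 'Surfaces back'."""
    if return_keys is None:
        return dict(child_final_state)
    # Assumption: keys the child never produced are silently omitted.
    return {k: child_final_state[k] for k in return_keys if k in child_final_state}


parent = {"user_topic": "billing", "secret": "do-not-share"}
child_block = {"default_threshold": 0.75}

seen = child_initial_state(parent, child_block, {"topic": "billing"}, isolated_state=True)
final = {**seen, "classification": "question"}
child_result = surfaced_result(final, return_keys=["classification", "score"])
```

In this run the isolated child never sees `secret`, and only `classification` surfaces back because `score` was never written.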
Pause / resume through nested approval gates¶
A subworkflow containing an approval_gate pauses the parent at a fully-qualified path like sub.gate. The parent workflow status becomes paused (not failed), the checkpoint records paused_step_id: sub.gate, and agentloom resume <parent_run_id> --approve continues through to the next layer after the gate clears — no separate child resume command needed.
Step-id namespace across subworkflows¶
Step ids inside workflow_inline.steps (or in a workflow referenced via workflow_path) live in the child's own namespace — a parent can have id: classify and the child can also have id: classify without collision. Duplicate-id validation is therefore lazy: the parent parse only checks its own top-level steps, and duplicates inside the child are caught when SubworkflowStep executes and re-parses the inline definition (raising Invalid inline subworkflow: ... Duplicate step ids). For workflows where you want eager validation of the entire nested tree, run agentloom validate on the child file separately before referencing it.
Streaming¶
Enable streaming at the workflow level, per-step, or via CLI:
Token usage, cost, and time-to-first-token are tracked during streaming.
Streaming + tools. stream: true is compatible with tools: [...] — the request wire carries the tool spec and the final ProviderResponse returned by StreamResponse.to_provider_response() exposes any tool_calls the model emitted. Per-chunk ToolCallDelta / ToolCallComplete events are not yet surfaced by every adapter (follow-up work); read tool_calls after the stream is exhausted for now.
Sandbox¶
Restrict tool execution with an allowlist-based sandbox:
config:
  sandbox:
    enabled: true
    allowed_commands: [echo, cat, curl]
    allowed_paths: [/tmp/work]
    readable_paths: [/data]
    writable_paths: [/tmp/output]
    allow_network: true
    allowed_domains: [api.example.com]
    allowed_schemes: [https] # restrict URL schemes (default: http, https)
    max_write_bytes: 1000000
    danger_opt_in: [bash] # opt-in per meta-executable (empty by default)
    allow_internal_webhook_targets: false # let approval_gate.notify reach loopback/RFC 1918
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable sandbox restrictions |
| allowed_commands | list[str] | [] | Shell command whitelist |
| allowed_paths | list[str] | [] | General file access paths |
| readable_paths | list[str] | [] | Read-only paths |
| writable_paths | list[str] | [] | Write-allowed paths |
| allow_network | bool | true | Allow HTTP/network calls |
| allowed_domains | list[str] | [] | Domain whitelist |
| allowed_schemes | list[str] | ["http", "https"] | URL scheme whitelist (rejects file://, gopher://, etc.) |
| max_write_bytes | int \| null | null (unlimited) | Maximum file write size |
| danger_opt_in | list[str] | [] | Per-binary opt-in for meta-executables (bash, python, env, xargs, ...). Empty by default, because meta-executables defeat the command allowlist by re-launching arbitrary binaries. Add only the names you actually need. |
| allow_internal_webhook_targets | bool | false | Permit approval_gate.notify.url to reach loopback / link-local (incl. cloud metadata at 169.254.169.254) / RFC 1918 destinations. Off by default; see Webhook destination gate. |
Meta-executables
Even when bash is in allowed_commands, the sandbox rejects the call unless bash is also listed in danger_opt_in. The opt-in is per-binary, not a global flag — danger_opt_in: ["bash"] does not also enable python. The same gate applies to sh, python, python3, env, xargs, eval, exec. Relative path arguments are validated against the configured cwd; ../ escapes are rejected.
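The two-stage gate can be summarised in a few lines. The sketch below encodes only the rule stated above; the set lists the meta-executables named in this section and may be incomplete, and `command_allowed` is a hypothetical name rather than a sandbox API.

```python
# Meta-executables named in this section; the real list may be longer.
META_EXECUTABLES = {"bash", "sh", "python", "python3", "env", "xargs", "eval", "exec"}


def command_allowed(binary: str, allowed_commands: list[str], danger_opt_in: list[str]) -> bool:
    """A command must be allowlisted, and meta-executables need a per-binary opt-in."""
    if binary not in allowed_commands:
        return False
    if binary in META_EXECUTABLES and binary not in danger_opt_in:
        return False
    return True


allowed = ["echo", "bash", "python"]
opt_in = ["bash"]

echo_ok = command_allowed("echo", allowed, opt_in)      # ordinary binary, allowlisted
bash_ok = command_allowed("bash", allowed, opt_in)      # allowlisted and opted in
python_ok = command_allowed("python", allowed, opt_in)  # allowlisted but not opted in
```

Opting in to bash leaves python blocked, which reflects the per-binary nature of danger_opt_in.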
Complete example¶
A classify-and-respond workflow with routing:
name: classify-and-respond
config:
  provider: openai
  model: gpt-4o-mini
  budget_usd: 0.50
state:
  user_input: ""
steps:
  - id: classify
    type: llm_call
    system_prompt: "Classify as: question, complaint, or request."
    prompt: "Classify: {state.user_input}"
    output: classification
  - id: route
    type: router
    depends_on: [classify]
    conditions:
      - expression: "state.classification == 'question'"
        target: answer
    default: general_response
  - id: answer
    type: llm_call
    depends_on: [route]
    prompt: "Answer: {state.user_input}"
    output: response
  - id: general_response
    type: llm_call
    depends_on: [route]
    prompt: "Help with: {state.user_input}"
    output: response