# Changelog
All notable changes to this project are documented here. The format follows Keep a Changelog and the project adheres to Semantic Versioning.
## [Unreleased]

## [0.4.0] — 2026-04-15

### Added
- `agentloom replay` subcommand — re-executes a workflow against a recorded JSON file with no API calls (#61)
  - Thin alias over `run --mock-responses` with `--lite` on by default; `--state` override supported
  - Works end-to-end with recordings captured via `agentloom run --record`
- YAML-configured `MockProvider` — set `provider: mock` in `WorkflowConfig` with `responses_file`, `latency_model`, and `latency_ms` fields (#76)
  - Lets committed fixtures run via plain `agentloom run` without CLI flags
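The committed-fixture setup might look like the following; only the field names (`provider: mock`, `responses_file`, `latency_model`, `latency_ms`) come from this entry, while the exact nesting and file path are illustrative assumptions:

```yaml
# Hypothetical sketch; the real schema's nesting may differ.
config:
  provider: mock
  responses_file: recordings/checkout.json   # committed fixture
  latency_model: replay                      # or: constant, normal
  latency_ms: 120                            # used by the constant model
```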
- Deterministic replay with `MockProvider` and `RecordingProvider` — offline evaluation and reproducible tests (#76)
  - `MockProvider` loads pre-recorded responses from a JSON file, keyed by `step_id` or SHA-256 hash of the messages
  - Latency models: `constant`, `normal` (seeded Gaussian), `replay` (uses the recorded `latency_ms`)
  - `RecordingProvider` wraps any real provider, captures every completion to JSON, and flushes per call so a crash still leaves a partial recording
  - Merge-on-flush: multiple `RecordingProvider` instances writing to the same path accumulate instead of clobbering each other
  - CLI flags: `agentloom run --record <file>` captures, `--mock-responses <file>` replays — fully offline, zero network
  - Prometheus metrics: `agentloom_mock_replays_total{workflow, matched_by}`, `agentloom_recording_captures_total{provider, model}`, `agentloom_recording_latency_seconds`
  - OTel span attributes: `mock.matched_by`, `recording.provider`, `recording.latency_s`
  - Grafana dashboard row "Mock & Replay" with hit-ratio, captures by provider, and real-provider latency quantiles
  - Validated end-to-end with real Anthropic calls in CLI, Docker, and Kubernetes (byte-identical replay)
- Webhook notifications for approval gates — outbound HTTP on pause (#42)
  - `WebhookConfig` on `StepDefinition.notify` with URL, custom headers, and JSON body template
  - Async webhook sender with 3-retry exponential backoff (best-effort, never blocks pause)
  - `agentloom callback-server` — lightweight HTTP server for programmatic approve/reject
  - Routes: `POST /approve/<run_id>`, `POST /reject/<run_id>`, `GET /pending`
  - Shared template utilities extracted to `core/templates.py`
  - Grafana dashboard "Human-in-the-Loop" row with approval gate and webhook panels
  - Prometheus metrics: `approval_gates_total`, `webhook_deliveries_total`, `webhook_latency_seconds`
  - OTel span attributes for approval decisions and webhook delivery
  - Example workflow (30), validation script, and K8s smoke job
- Approval gate step type — human-in-the-loop decision point (#41)
  - `StepType.APPROVAL_GATE` pauses the workflow and waits for human approval or rejection
  - Decision injected via `_approval.<step_id>` state key on resume
  - `--approve` / `--reject` flags on `agentloom resume`
  - `timeout_seconds` and `on_timeout` schema fields (consumed by webhook callback server in #42)
  - Example workflow (29), validation script, and K8s smoke job
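A self-contained sketch of the gate's resume contract, assuming the decision is stored as a plain boolean under `_approval.<step_id>` (the real engine may store a richer decision record; `PauseRequestedError` is redeclared here only so the sketch runs on its own):

```python
class PauseRequestedError(Exception):
    """Pause signal (from the #40 mechanism); redeclared for self-containment."""

def approval_gate(state: dict, step_id: str) -> str:
    """Sketch of an approval-gate step: pause until a decision appears
    under `_approval.<step_id>`, then map it to an outcome."""
    key = f"_approval.{step_id}"
    if key not in state:
        raise PauseRequestedError(step_id)  # engine checkpoints and pauses here
    return "approved" if state[key] else "rejected"
```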
- Workflow pause mechanism — foundation for human-in-the-loop (#40)
  - `PauseRequestedError` exception for step executors to signal a pause
  - `StepStatus.PAUSED` and `WorkflowStatus.PAUSED` status values
  - Engine catches pause requests, saves a checkpoint with `status=paused` and `paused_step_id`, and returns cleanly
  - Resume from a paused checkpoint skips completed steps and re-runs the paused step
  - CLI treats paused workflows as non-error (exit code 0)
  - Functional validation script and K8s smoke job
- Pluggable checkpoint backends with `BaseCheckpointer` protocol and `FileCheckpointer` default (#78)
  - `CheckpointData` model with full workflow state serialization
  - Engine integration: auto `run_id`, checkpoint on completion/failure, graceful I/O error handling
  - `WorkflowEngine.from_checkpoint()` to reconstruct and resume, skipping completed steps
  - `agentloom run --checkpoint` and `--checkpoint-dir` flags
  - `agentloom resume <run_id>` and `agentloom runs` CLI commands
  - Example workflow (28) and documentation
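A minimal sketch of the backend protocol and its file-based default; only the class names come from the changelog, and the `save`/`load` method signatures are assumptions:

```python
import json
from pathlib import Path
from typing import Protocol

class BaseCheckpointer(Protocol):
    """Assumed minimal shape of the pluggable backend protocol."""
    def save(self, run_id: str, data: dict) -> None: ...
    def load(self, run_id: str) -> dict: ...

class FileCheckpointer:
    """Default backend sketch: one JSON file per run_id on disk."""
    def __init__(self, directory: str = ".checkpoints"):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, run_id: str, data: dict) -> None:
        (self.directory / f"{run_id}.json").write_text(json.dumps(data))

    def load(self, run_id: str) -> dict:
        return json.loads((self.directory / f"{run_id}.json").read_text())
```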
## [0.3.0] — 2026-04-12

### Added
- Documentation site with mkdocs-material — full reference docs auto-deployed to GitHub Pages
- Multi-modal input for `llm_call` steps — images, PDFs, and audio via `attachments` field
  - Provider-native formatting: OpenAI (images, audio), Anthropic (images, PDFs), Google (images, PDFs, audio), Ollama (images)
  - URL fetching with `fetch: local` (default) or `fetch: provider` passthrough
  - SSRF protection: blocks private/reserved IP ranges (RFC 1918, loopback, link-local)
  - Sandbox integration: `allowed_domains`, `allow_network`, and `readable_paths` enforced
  - Grafana dashboard "Multi-modal" row with attachment panels
  - Multi-modal workflow examples (19-24)
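The SSRF check can be sketched with the standard library; `is_fetch_allowed` is a simplified, hypothetical stand-in for the real guard and ignores the `allowed_domains` handling:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_fetch_allowed(url: str) -> bool:
    """Resolve the URL's host and reject private, loopback, link-local,
    and otherwise reserved addresses before fetching."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True
```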
- Streaming for LLM responses with real-time token output
  - `StreamResponse` accumulator with per-provider SSE/NDJSON parsing
  - All 4 providers: OpenAI (SSE), Anthropic (SSE), Google (SSE), Ollama (NDJSON)
  - Gateway `stream()` with circuit breaker + rate limiter integration
  - `config.stream: true` (workflow-level) and per-step `stream:` override
  - CLI `--stream` flag, `time_to_first_token_ms` in `StepResult`
  - Grafana "Streaming" dashboard row with TTFT quantiles
  - Streaming examples (25-26)
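The accumulator-plus-TTFT idea can be sketched as below; this class is a stand-in for `StreamResponse`, not its actual implementation, and the NDJSON field name `response` (as Ollama uses) is an assumption:

```python
import json
import time

class StreamAccumulator:
    """Accumulate streamed token chunks and record the time to the
    first non-empty token in milliseconds."""
    def __init__(self):
        self.start = time.monotonic()
        self.time_to_first_token_ms: float | None = None
        self.parts: list[str] = []

    def feed_ndjson(self, line: str) -> None:
        token = json.loads(line).get("response", "")
        if token and self.time_to_first_token_ms is None:
            self.time_to_first_token_ms = (time.monotonic() - self.start) * 1000
        self.parts.append(token)

    @property
    def text(self) -> str:
        return "".join(self.parts)
```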
- `AGENTLOOM_*` env var prefix for all configuration overrides
- YAML-based pricing table replacing hardcoded Python dict
- Provider auto-discovery moved from CLI hack to `config.discover_providers()`
- Ollama e2e integration tests against a live Docker instance
- Array index support in state paths (`state.items[0]`, `results[-1]`)
- First-class graph API for workflow DAG analysis and export
  - `WorkflowGraph` class with path algorithms and export formats
  - Graphviz DOT, Mermaid, PNML, NetworkX, JSON
- Test coverage reporting via Codecov with 85% minimum threshold
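Array-index path resolution might look like this minimal resolver; the tokenizer and the function name are illustrative, not the project's actual parser:

```python
import re

def resolve_path(state: dict, path: str):
    """Resolve a dotted state path with optional array indices, e.g.
    `items[0]` or `results[-1]`. Tokens are either bare names or
    bracketed (possibly negative) integer indices."""
    current: object = state
    for token in re.findall(r"[^.\[\]]+|\[-?\d+\]", path):
        if token.startswith("["):
            current = current[int(token[1:-1])]   # list index, negatives allowed
        else:
            current = current[token]              # dict key
    return current
```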
## [0.2.0] — 2026-03-30

### Added
- Kubernetes manifests with Kustomize overlays for dev, staging, and production
- Helm chart with Job/CronJob modes and render-time input validation
- Terraform configuration for local kind cluster with full observability stack
- ArgoCD Application CRD with automated sync and Job immutability handling
- Docker CI/CD workflow for multi-arch GHCR publishing
- Infrastructure documentation
### Fixed
- Production NetworkPolicy OTel egress restricted to observability namespace
- Read-only filesystem audit check no longer false-passes
- Terraform audit phase passes KUBECONFIG to all kubectl poll commands
- GitHub Actions and image versions pinned to commit SHAs
## [0.1.2] — 2026-03-26

### Added
- Sandbox enforcement for built-in tools — command allowlist, path restrictions, network domain filtering, shell operator injection prevention, write size limits
  - `SandboxConfig` model in workflow YAML (`config.sandbox.*`)
  - Sandbox workflow examples (17, 18)
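The allowlist and operator checks can be sketched as follows; this is a deliberately simplified, hypothetical stand-in, and the real sandbox also enforces path restrictions, network domains, and write sizes:

```python
import shlex

# Operators commonly used to chain or redirect commands.
SHELL_OPERATORS = {";", "&&", "||", "|", ">", ">>", "<", "$(", "`"}

def is_command_allowed(command: str, allowlist: set[str]) -> bool:
    """Permit a command only if its executable is on the allowlist and
    no shell operators are smuggled into the arguments. The substring
    scan here is a simplification of a real operator check."""
    if any(op in command for op in SHELL_OPERATORS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:   # unbalanced quotes etc.
        return False
    return bool(argv) and argv[0] in allowlist
```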
### Fixed

- Step executors now use `await get_state_snapshot()` instead of sync `.state` access
- Removed deprecated `gemini-2.0-flash` model
## [0.1.1] — 2026-03-22

### Fixed
- Rate limiter now accounts for response tokens, not just prompt tokens
- README header image uses absolute URLs for PyPI compatibility
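The corrected token accounting can be sketched as a budget charged on both sides of a call; the class and method names here are hypothetical, not the project's API:

```python
class TokenRateLimiter:
    """Per-window token budget charged for prompt tokens up front and
    for response tokens after the call, so completions count against
    the limit too (the bug was counting only prompts)."""
    def __init__(self, tokens_per_window: int):
        self.budget = tokens_per_window
        self.used = 0

    def try_acquire(self, prompt_tokens: int) -> bool:
        if self.used + prompt_tokens > self.budget:
            return False
        self.used += prompt_tokens
        return True

    def record_response(self, response_tokens: int) -> None:
        self.used += response_tokens
```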
## [0.1.0] — 2026-03-19
First public release.
### Added
- YAML and Python DSL workflow definitions (DAGs with sequential + parallel steps)
- Step types: `llm_call`, `tool`, `router` (conditional), `subworkflow`
- Provider gateway with automatic fallback (OpenAI, Anthropic, Google, Ollama)
- Circuit breaker, rate limiter, and retry with exponential backoff per provider
- Budget enforcement (hard stop when USD limit exceeded)
- Cost tracking per step, model, and provider
- OpenTelemetry traces + Prometheus metrics (optional)
- CLI commands: `run`, `validate`, `visualize` (ASCII + Mermaid), `info`
- Checkpointing: save and resume workflow state to disk
### Design Decisions
- `httpx` over provider SDKs — keeps dependencies minimal (~5 core)
- `anyio` over raw asyncio — structured concurrency via task groups
- `str.format_map` over Jinja2 — one fewer dependency; prompt templates don't need loops
- Observability optional — core runs without `opentelemetry` or `prometheus`
- Pydantic v2 — validation and serialization worth the compilation trade-off