Deployment¶
AgentLoom runs anywhere — from a single Docker container to a fully orchestrated Kubernetes cluster with GitOps and observability. The CLI processes a workflow and exits; there is no long-running server.
Why Jobs, not Deployments?
AgentLoom is a CLI that exits after processing. Kubernetes Jobs provide finite execution, automatic retries via backoffLimit, scheduled runs via CronJobs, and clean resource isolation per workflow.
Docker¶
The multi-stage Dockerfile produces a minimal image (~120MB) with a non-root user and read-only filesystem.
# Build
docker build -t agentloom .
# Run a workflow
docker run --rm \
-e OPENAI_API_KEY=sk-... \
-v ./examples:/workflows:ro \
agentloom run /workflows/01_simple_qa.yaml
# Override provider and model
docker run --rm \
-e OPENAI_API_KEY=sk-... \
-v ./examples:/workflows:ro \
agentloom run /workflows/01_simple_qa.yaml --provider openai --model gpt-4o-mini
# Build with observability extras
docker build --build-arg BUILD_OBSERVABILITY=true -t agentloom:obs .
Docker Compose¶
Runs AgentLoom alongside the full observability stack:
cd deploy
# Start observability stack
docker compose up -d
# Run a workflow (on-demand, not part of `up`)
docker compose run --rm agentloom run /workflows/01_simple_qa.yaml
# Stop everything
docker compose down
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboard visualization (admin/admin) |
| Prometheus | 9090 | Metrics storage and PromQL queries |
| Jaeger | 16686 | Distributed tracing UI |
| OTel Collector | 4317 | Receives traces and metrics via gRPC |
Kubernetes (Kustomize)¶
Plain YAML manifests organized with Kustomize overlays. Three environments provided:
| Overlay | Image Tag | Memory (req/limit) | CPU Request | NetworkPolicy | Deadline |
|---|---|---|---|---|---|
| dev | latest |
64Mi / 128Mi | 50m | Disabled | — |
| staging | CI SHA | 128Mi / 256Mi | 100m | Enabled | — |
| production | Pinned | 256Mi / 512Mi | 200m | Strict (no Ollama) | 600s |
# Deploy dev overlay
kubectl apply -k deploy/k8s/overlays/dev
kubectl logs job/agentloom-workflow -n agentloom
# Clean up
kubectl delete -k deploy/k8s/overlays/dev
Secrets
The base secret ships with data: {} by design. Populate it before deploying:
No CPU limits
No overlay sets CPU limits — this is intentional. CPU limits cause CFS throttling during LLM API calls (which are I/O-bound, not CPU-bound). CPU requests are set for scheduling guarantees.
Helm¶
Recommended for parameterized deployments. The chart validates inputs at render time — deploying without a workflow definition fails at render time, not at runtime.
Helm chart reference
| Parameter | Default | Description |
|---|---|---|
image.repository |
ghcr.io/cchinchilla-dev/agentloom |
Container image repository |
image.tag |
"" |
Image tag (defaults to appVersion) |
workflow.definition |
"" |
Inline workflow YAML |
workflow.existingConfigMap |
"" |
Reference to existing ConfigMap |
workflow.args |
[] |
Extra CLI arguments |
schedule.enabled |
false |
false = Job, true = CronJob |
schedule.cron |
0 */6 * * * |
CronJob schedule |
schedule.timeZone |
UTC |
CronJob timezone |
provider.existingSecret |
"" |
Pre-created Secret with API keys |
provider.openaiApiKey |
"" |
OpenAI key (not recommended) |
observability.enabled |
true |
Enable OTel endpoint env var |
observability.otelEndpoint |
http://otel-collector:4317 |
OTel Collector endpoint |
job.backoffLimit |
2 |
Retry attempts on failure |
job.ttlSecondsAfterFinished |
3600 |
Cleanup delay after completion |
job.activeDeadlineSeconds |
null |
Hard timeout |
serviceAccount.create |
true |
Create a ServiceAccount |
serviceAccount.automountServiceAccountToken |
false |
Mount K8s API token |
namespace.create |
false |
Create namespace with PSS labels |
networkPolicy.enabled |
true |
Enable NetworkPolicy |
networkPolicy.allowOllama |
true |
Allow egress to Ollama (port 11434) |
resourceQuota.enabled |
false |
Enable namespace ResourceQuota |
Validation: The chart fails at render time if neither workflow.definition nor workflow.existingConfigMap is set, or if both are set simultaneously.
Jobs and immutability: Kubernetes Jobs are immutable after creation. Use helm uninstall + helm install for changes, or use ArgoCD with Replace=true.
Terraform¶
Provisions a complete local development environment in one command: kind cluster + AgentLoom + observability stack.
When enable_observability = true (default), Terraform deploys the full stack:
| Component | URL | Purpose |
|---|---|---|
| OTel Collector | gRPC :4317 | Receives traces and metrics |
| Jaeger | http://localhost:16686 | Trace storage and UI |
| Prometheus | http://localhost:9090 | Metrics scraping |
| Grafana | http://localhost:3000 | Dashboards (admin/admin) |
Set enable_observability = false for a lightweight setup without the observability stack.
ArgoCD¶
GitOps deployment with automated sync, self-heal, and retry policies. ArgoCD watches the Helm chart in the repository and syncs changes automatically.
The Application CRD configures:
- Automated sync with prune and self-heal
- Replace=true for immutable resources (K8s Jobs cannot be patched in-place)
- ignoreDifferences on Job selector/labels to prevent perpetual out-of-sync
- Retry policy: 5 attempts with exponential backoff (5s -> 3m max)
- CreateNamespace: the
agentloomnamespace is created if it does not exist
Secrets
Secrets are not managed by ArgoCD. Create them manually or use External Secrets Operator / Sealed Secrets / SOPS.
Environment variables¶
| Variable | Description | Default |
|---|---|---|
OPENAI_API_KEY |
OpenAI API key | — |
ANTHROPIC_API_KEY |
Anthropic API key | — |
GOOGLE_API_KEY |
Google AI API key | — |
OLLAMA_BASE_URL |
Ollama server URL | http://localhost:11434 |
OTEL_EXPORTER_OTLP_ENDPOINT |
OTel Collector gRPC endpoint | http://localhost:4317 |
In Kubernetes, these are injected via envFrom referencing a Secret. Never bake API keys into images or commit them to source control.
CI/CD pipeline¶
| Workflow | Trigger | What it does |
|---|---|---|
ci.yml |
Push / PR to main |
Lint, type check, test, Docker build, K8s validation, Helm lint |
docker.yml |
Push to main / tags v* |
Build multi-arch image, push to GHCR |
release.yml |
Tags v* |
Build wheel, GitHub release, publish to PyPI |
docs.yml |
Push to main (docs/mkdocs.yml) |
Build and deploy documentation to GitHub Pages |
e2e-ollama.yml |
Weekly / release/** / e2e label |
Ollama integration tests against live Docker instance |
pr-labeler.yml |
PR opened/updated | Auto-label PRs based on file paths |
labels.yml |
Push to main (labels.json) |
Sync GitHub labels from config |
stale.yml |
Weekly (Monday 9am UTC) | Mark/close stale issues |
Image tags¶
| Event | Tag pattern | Example |
|---|---|---|
Push to main |
main-<sha> |
main-a1b2c3d |
Tag v1.2.3 |
1.2.3, 1.2, latest |
1.2.3 |
| Pull request | pr-<number> |
pr-42 |
Images are published to ghcr.io/cchinchilla-dev/agentloom.
Security¶
All workloads enforce the Kubernetes Pod Security Standards "restricted" profile:
- Non-root container — runs as
agentloom(UID 1000), enforced via Dockerfile andsecurityContext.runAsNonRoot - Read-only root filesystem —
readOnlyRootFilesystem: truewith writable/tmpviaemptyDir - No privilege escalation —
allowPrivilegeEscalation: false, all Linux capabilities dropped - seccomp —
RuntimeDefaultprofile blocks dangerous syscalls - No API server access —
automountServiceAccountToken: false - NetworkPolicy — restricts egress to DNS (kube-system), HTTPS (443), and OTel (4317). Production removes Ollama (11434). All ingress denied
- Secrets management — API keys stored in K8s Secrets, never baked into images
- Supply chain security — all GitHub Actions pinned to commit SHAs, not mutable tags
- ResourceQuota — optional namespace-level limits to prevent resource exhaustion
- PSS namespace labels —
pod-security.kubernetes.io/enforce: restricted