Sagas
Sagas orchestrate long-running workflows across multiple aggregates. Unlike implications (which react automatically), sagas are explicit: a trigger event starts them, they execute steps in order, and they handle failure with compensation.
When to use sagas
Good for: - Multi-step business processes (checkout, onboarding, provisioning) - Operations spanning external services (payment APIs, email, webhooks) - Processes needing human approval - Anything where partial failure requires rollback
Not for: - Simple single-aggregate updates (use regular events) - Automatic reactions to events (use implications) - Single delayed actions (use scheduled events)
You might not need a saga
If it's a single delayed action -- send a reminder in 24 hours, expire a token next week -- use a scheduled event. Sagas are for multi-step workflows where steps depend on each other and failures need to unwind previous work.
Defining sagas
Sagas are defined in modules.sagas in your spec:
{
"modules": {
"sagas": {
"checkout": {
"trigger": {
"aggregate_type": "cart",
"event_type": "checkout_started"
},
"steps": [
{
"name": "reserve_inventory",
"emit": {
"aggregate_type": "inventory",
"id": "$.data.product_id",
"event_type": "reservation_requested",
"data": {
"quantity": "$.data.quantity"
}
},
"await": {
"aggregate_type": "inventory",
"event_types": ["was_reserved", "reservation_failed"]
},
"compensate": {
"aggregate_type": "inventory",
"id": "$.data.product_id",
"event_type": "reservation_released",
"data": {}
},
"timeout_ms": 30000
},
{
"name": "charge_payment",
"condition": {"equals": ["$prev.type", "was_reserved"]},
"emit": {
"aggregate_type": "payment",
"id": "$.data.payment_id",
"event_type": "charge_requested",
"data": {"amount": "$.data.total"}
},
"await": {
"aggregate_type": "payment",
"event_types": ["was_charged", "charge_failed"]
},
"compensate": {
"aggregate_type": "payment",
"id": "$.data.payment_id",
"event_type": "refund_requested",
"data": {}
}
},
{
"name": "notify",
"condition": {"equals": ["$prev.type", "was_charged"]},
"emit": {
"aggregate_type": "notification",
"id": "$.data.customer_id",
"event_type": "order_confirmation_sent",
"data": {
"transaction_id": "$prev.data.transaction_id"
}
}
}
],
"on_complete": {
"aggregate_type": "order",
"id": "$.data.order_id",
"event_type": "checkout_completed",
"data": {}
},
"on_failed": {
"aggregate_type": "order",
"id": "$.data.order_id",
"event_type": "checkout_failed",
"data": {"error": "$error.message"}
}
}
}
}
}
Trigger
"trigger": {
"aggregate_type": "cart",
"event_type": "checkout_started"
}
When checkout_started is written to any cart, this saga starts.
Steps
Each step has a name, an event to emit, and optionally:
- await - event types to wait for (step completes when one arrives)
- compensate - event to emit if a later step fails
- timeout_ms - how long to wait (required for steps with await)
- condition - only run if true
Referencing data
Templates pull data from the trigger event and previous steps. Sagas have access to all standard event paths ($.key, $.id, $.type, $.data.*, $.metadata.*, @.*) plus saga-specific context:
| Template | Description |
|---|---|
$prev.type |
Previous step's response type |
$prev.data.* |
Previous step's response data |
$context.<step>.* |
Earlier step's result by name |
$error.message |
Error message (in on_failed) |
$error.step |
Failed step name (in on_failed) |
@.* gives you the trigger aggregate's computed state at the moment the saga was created. This is a snapshot — if the aggregate changes after the saga starts, steps still see the original state. Use this when a saga step needs data from the aggregate that wasn't part of the trigger event (e.g., a customer_id that was set at creation time but isn't in the had_followup_converted event). ($.state.* is a deprecated alias for @.*.)
Type preservation. Templates pass values through without type coercion. If @.customer_id resolves to an object but the target schema expects "type": "string", the step will fail with a type mismatch. Make sure the data types in your aggregate state match what the target event schema declares.
Optional fields. Append ? to any template path to make it optional. If the path resolves to null, the key is omitted from the emitted event data instead of being set to null. This is useful when the trigger event may or may not include a field:
"data": {
"customer_id": "$.data.customer_id",
"is_tax_exempt": "$.data.is_tax_exempt?"
}
If is_tax_exempt isn't present in the trigger event, it's simply left out of the emitted event rather than being set to null (which would fail a "type": "boolean" schema check).
The condition on charge_payment checks that inventory was actually reserved before charging. If the previous step returned reservation_failed, this step is skipped.
How sagas execute
- Trigger event is written, saga starts
- Execute step 1: Emit
inventory.reservation_requested - Wait: For a matching
awaitevent or timeout - On success: Proceed to step 2
- On failure: Run compensations in reverse order, mark saga failed
Each step emits its event, then waits for a response matching one of the await.event_types. If no response arrives within timeout_ms, the step fails.
Steps without await
Steps without await succeed immediately after emitting. Use this for fire-and-forget actions like logging:
{
"name": "audit_log",
"emit": {
"aggregate_type": "audit",
"id": "global",
"event_type": "checkout_attempted",
"data": {"cart_id": "$.key"}
}
}
Compensation
When a step fails, the saga walks backward through completed steps and emits their compensate events:
Step 3 fails
-> Compensate step 2
-> Compensate step 1
-> Saga marked failed
Steps without compensate are skipped during rollback (like notify -- nothing to undo).
If compensation itself fails after 3 attempts (max_compensation_attempts), the saga is dead-lettered for manual review.
Lifecycle events
on_complete fires when all steps succeed. on_failed fires after compensation completes. Both are optional.
Saga state
Query saga status via the admin API:
GET /_admin/sagas/:id
Authorization: Bearer <operator-jwt>
Response:
{
"saga_id": "saga_abc123",
"type": "checkout",
"status": "running",
"current_step": "charge_payment",
"started_at": 1705312800,
"steps": [
{"name": "reserve_inventory", "status": "completed", "completed_at": 1705312801},
{"name": "charge_payment", "status": "waiting", "waiting_since": 1705312802}
]
}
Admin API
GET /_admin/sagas?status=running # List sagas
GET /_admin/sagas/:id # Get saga details
POST /_admin/sagas/:id/retry # Retry failed saga
Filter by status (running, completed, failed, dead_lettered), environment, limit, offset.
Retry picks up where it left off, restarting from the failed step. The saga runner polls for work every 1000ms.
Dead letters
If compensation fails 3 times, the saga is dead-lettered. This requires manual intervention:
- Check the dashboard (dead letters appear prominently on the home page)
- Fix the underlying issue
- Retry via the admin API
Example: User onboarding
{
"modules": {
"sagas": {
"user_onboarding": {
"trigger": {
"aggregate_type": "user",
"event_type": "was_created"
},
"steps": [
{
"name": "send_welcome_email",
"emit": {
"aggregate_type": "email",
"id": "$.data.user_id",
"event_type": "was_queued",
"data": {"template": "welcome", "email": "$.data.email"}
}
},
{
"name": "provision_trial",
"emit": {
"aggregate_type": "subscription",
"id": "$.data.user_id",
"event_type": "trial_was_started",
"data": {"plan": "starter"}
},
"compensate": {
"aggregate_type": "subscription",
"id": "$.data.user_id",
"event_type": "trial_was_cancelled",
"data": {}
}
}
]
}
}
}
}
Compared to implications
| Implications | Sagas | |
|---|---|---|
| Trigger | Automatic on event | Trigger event in spec |
| Scope | Event chain (atomic) | Multi-step workflow |
| Compensation | Manual | Built-in |
| Human interaction | No | Yes |
| Failure handling | Retry | Compensation + retry |
| Use case | Simple reactions | Complex processes |
Use implications for if-this-then-that. Use sagas for processes requiring coordination and rollback.
Limitations
- Max 50 steps per saga
- Max 100 concurrent sagas per instance
- Max 3 compensation attempts before dead-lettering
- Saga chain depth limit: 3. A saga step can emit an event that triggers another saga, which can trigger a third. Beyond depth 3, saga and scheduled event hooks are skipped and an audit log entry is recorded. If you see
saga_depth_exceededin your audit log, simplify your saga chain or consolidate steps. Listener/webhook hooks are unaffected by the depth limit.
Best practices
Keep steps small. Each step should do one thing. Easier to compensate, easier to debug.
Design compensations first. Before writing the action, know how to undo it.
Set realistic timeouts. Account for external API latency, human response time.
Monitor compensation rate. High compensation means the process design needs rethinking.
Troubleshooting
emit_failed: type mismatch
The most common saga step failure. The step error looks like:
{
"message": "emit_failed",
"type": "validation_error",
"detail": "type mismatch: expected string, got object",
"path": "data.customer_id"
}
This means the value resolved by the template doesn't match the target event's schema type. The path tells you which property failed, and detail tells you what was expected vs. what was provided.
Common causes:
- A merge handler stored an object where a string was expected. For example, if @.customer_id resolves to {"name": "Alice", "id": "abc"} instead of "abc", a schema expecting "type": "string" will reject it.
- An integer in state is being passed to a schema expecting "type": "string". Templates do not coerce types.
- A set handler with "value": "$.data" copied the entire event payload into a field, creating an object where a scalar was expected.
To diagnose:
1. Check the step error via the admin API: GET /_admin/sagas/:id
2. Look at error.path to identify which field failed
3. Look at error.detail for the expected vs. actual type
4. Check the trigger aggregate's state to see what the template actually resolved to
To fix: adjust the handler that builds the source aggregate state so the field has the correct type, or adjust the target event schema to accept the type being provided.
emit_failed: missing required property
{
"message": "emit_failed",
"type": "validation_error",
"detail": "missing required property",
"path": "data"
}
A template resolved to null because the path doesn't exist in the source data. For example, @.email returns null if the trigger aggregate's state doesn't have an email field, and the resulting event data omits the key entirely.
To fix: verify that the trigger event or aggregate state always includes the fields your saga steps reference, or make the field optional in the target schema.
General debugging
- Inspect the saga:
GET /_admin/sagas/:idshows each step's status and error details - Check step errors: failed steps include
type,detail, andpathfields that identify exactly what went wrong - Verify template sources: use the aggregates endpoint to confirm what
@.*templates resolve to at runtime - Check schema alignment: ensure the data types in your source (trigger event data, aggregate state) match the declared types in the target event schema
See also
- Implications guide - Simple reactive chains
- Scheduled events - Delayed execution