Error Handling Guide
Reliable error handling is one of the main reasons to use Orchestrix.
The goal is not just to fail. The goal is to fail in a way that is understandable, recoverable when appropriate, and safe for the rest of the system.
Types of failures
It helps to think about failures in three groups:
- Validation failures: the input is invalid before execution should even begin
- Transient execution failures: the step might succeed if retried
- Permanent execution failures: the step is not going to succeed without changing the input or system state
Choosing the right response depends on which group you are dealing with.
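The mapping from failure group to response can be sketched as a small classifier. This is only an illustration: the `code` property, the code lists, and the names `classifyFailure` and `chooseResponse` are assumptions for this example, not part of the Orchestrix API.

```typescript
// Hypothetical categories mirroring the three groups above.
type FailureKind = "validation" | "transient" | "permanent";
type FailureResponse = "reject-input" | "retry" | "fail-fast";

// Assumption: errors carry a string `code`, as many Node.js errors do.
interface CodedError extends Error {
  code?: string;
}

// Illustrative code lists; tune these for your own dependencies.
const VALIDATION_CODES = ["INVALID_INPUT"];
const TRANSIENT_CODES = ["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"];

function classifyFailure(err: CodedError): FailureKind {
  const code = err.code ?? "";
  if (VALIDATION_CODES.includes(code)) return "validation";
  if (TRANSIENT_CODES.includes(code)) return "transient";
  // Unknown errors are treated as permanent so they are never retried blindly.
  return "permanent";
}

function chooseResponse(kind: FailureKind): FailureResponse {
  switch (kind) {
    case "validation":
      return "reject-input";
    case "transient":
      return "retry";
    case "permanent":
      return "fail-fast";
  }
}
```

Treating unknown errors as permanent is a deliberately conservative default: an unexpected failure surfaces immediately instead of being hidden behind retries.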
Validation failures
If you configure a schema, Orchestrix can reject bad input before any step runs.
This is the cleanest kind of failure because no side effects have started yet.
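To make the "no side effects yet" point concrete, here is a minimal sketch of validating before executing. It does not use Orchestrix's schema option, whose exact shape is not shown in this guide; `OrderInput`, `validateOrder`, and `runValidated` are hypothetical names.

```typescript
// Hypothetical input shape for an order flow.
interface OrderInput {
  orderId: string;
  quantity: number;
}

// Returns a list of problems; an empty list means the input is valid.
function validateOrder(input: Partial<OrderInput>): string[] {
  const problems: string[] = [];
  if (typeof input.orderId !== "string" || input.orderId.length === 0) {
    problems.push("orderId must be a non-empty string");
  }
  if (typeof input.quantity !== "number" || !Number.isInteger(input.quantity) || input.quantity <= 0) {
    problems.push("quantity must be a positive integer");
  }
  return problems;
}

// Runs `execute` only when validation passes, so nothing starts on bad input.
function runValidated<T>(
  input: Partial<OrderInput>,
  execute: (valid: OrderInput) => T
): { ok: true; value: T } | { ok: false; problems: string[] } {
  const problems = validateOrder(input);
  if (problems.length > 0) return { ok: false, problems };
  return { ok: true, value: execute(input as OrderInput) };
}
```

Because `execute` is never called for invalid input, a rejected request leaves nothing to clean up.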
Transient failures
Transient failures are a good match for retries:
flow.step("call-external-api", async () => {
  // ...
}, {
  retries: 3
});

Typical transient failures include:
- temporary network issues
- short service overloads
- brief database contention
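For intuition about what a retry option does, the loop below is a rough stand-alone sketch. It assumes `retries` counts extra attempts after the first one, which this guide does not confirm for Orchestrix; `runWithRetries` is a hypothetical helper, not a library function.

```typescript
// Assumption: `retries` counts extra attempts after the first one,
// so `retries: 3` allows up to 4 calls. Orchestrix may count differently.
async function runWithRetries<T>(
  action: (attempt: number) => Promise<T>,
  retries: number
): Promise<{ value: T; attempts: number }> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      const value = await action(attempt);
      return { value, attempts: attempt };
    } catch (err) {
      // Remember the failure and try again if attempts remain.
      lastError = err;
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

A production version would normally wait between attempts, for example with a growing delay, so that an overloaded service gets time to recover.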
Permanent failures
Permanent failures should usually fail fast.
Examples:
- invalid business state
- missing required domain data
- rejected payment for a business rule reason
- malformed input that slipped past validation
Retries do not help here and only make failure slower and less clear.
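One way to get fail-fast behavior next to retries is to mark permanent errors and rethrow them immediately. The sketch below assumes a hypothetical `permanent` flag on the error; Orchestrix may offer its own mechanism for this, which this guide does not show.

```typescript
// Hypothetical marker: an error whose `permanent` flag is true is never retried.
interface MaybePermanentError extends Error {
  permanent?: boolean;
}

function permanentError(message: string): MaybePermanentError {
  return Object.assign(new Error(message), { permanent: true });
}

// Retries other errors up to `retries` times, but rethrows permanent ones at once.
async function retryUnlessPermanent<T>(
  action: () => Promise<T>,
  retries: number
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await action();
    } catch (err) {
      const isPermanent =
        err instanceof Error && (err as MaybePermanentError).permanent === true;
      if (isPermanent || attempt >= retries) throw err;
    }
  }
}
```

With this shape, a rejected payment costs one call instead of one call per configured retry, and the original error reaches the caller unchanged.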
Global failure observation
Use hooks to observe failures centrally:
const flow = create("my-flow", {
  hooks: {
    onFlowFail: ({ flowName, result }) => {
      Sentry.captureException(result, {
        extra: { flowName }
      });
    }
  }
});

This is useful when you want consistent failure reporting without repeating logic in every step.
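If several flows should report failures the same way, the hook can be built once and shared. The factory below is a sketch: `FlowFailEvent`, `FailureReporter`, and `createFailureHooks` are hypothetical names, and the event shape is inferred from the example above rather than taken from Orchestrix's type definitions.

```typescript
// Hypothetical shape of what an onFlowFail hook receives, based on the example above.
interface FlowFailEvent {
  flowName: string;
  result: unknown;
}

// Any reporter with this signature works: an error tracker, a logger, or a test double.
type FailureReporter = (failure: unknown, context: { flowName: string }) => void;

// Builds a hooks object that can be passed to more than one flow.
function createFailureHooks(report: FailureReporter) {
  return {
    onFlowFail: ({ flowName, result }: FlowFailEvent): void => {
      report(result, { flowName });
    },
  };
}
```

Keeping the reporter as a parameter also makes the hook easy to exercise in tests without a real error-tracking service.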
Inspecting the result
The returned FlowResult is the main place to inspect execution failure:
const result = await flow.run(input);
if (result.status === "failed") {
  const failedStep = result.steps.find((s) => s.status === "failed");
  console.log(result.error);
  console.log(failedStep?.name);
  console.log(failedStep?.attempts);
}

This lets you answer:
- what failed?
- where did it fail?
- how many attempts were made?
- how much time was spent before failure?
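These questions can be answered in one place with a small summary helper. The sketch below works on an assumed result shape: only `status`, `error`, `steps`, `name`, and `attempts` appear in the example above, while `durationMs` and the non-failed status values are hypothetical and may not match the real `FlowResult`.

```typescript
// Assumed step shape. `durationMs` and the "completed"/"skipped" values are hypothetical.
interface StepRecord {
  name: string;
  status: "completed" | "failed" | "skipped";
  attempts: number;
  durationMs?: number;
}

// Assumed result shape, modeled on the fields used in the example above.
interface FlowResultLike {
  status: "completed" | "failed";
  error?: unknown;
  steps: StepRecord[];
}

// Returns a one-line description of a failed run, or undefined if the run did not fail.
function describeFailure(result: FlowResultLike): string | undefined {
  if (result.status !== "failed") return undefined;
  const failedStep = result.steps.find((s) => s.status === "failed");
  const message =
    result.error instanceof Error ? result.error.message : String(result.error);
  // Sum whatever durations are present to estimate time spent before failure.
  const elapsedMs = result.steps.reduce((sum, s) => sum + (s.durationMs ?? 0), 0);
  const where = failedStep
    ? `${failedStep.name} after ${failedStep.attempts} attempt(s)`
    : "unknown step";
  return `${message} (at ${where}, ${elapsedMs} ms elapsed)`;
}
```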
Timeouts
Timeouts are part of error handling because they stop steps from waiting forever.
flow.step("slow-process", async () => {
  // ...
}, {
  timeoutMs: 5000
});

A timeout becomes a step failure. If retries are configured, that failure may trigger another attempt.
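The general idea behind turning a timeout into a failure can be sketched with `Promise.race`. This is not necessarily how Orchestrix implements `timeoutMs`; `withTimeout` is a hypothetical helper for illustration.

```typescript
// Rejects if `work` does not settle within `timeoutMs` milliseconds.
// Note: this stops waiting, but it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Because the rejection is an ordinary error, it can flow through the same retry and reporting paths as any other step failure.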
Compensation and failure
If a step fails after earlier steps already completed successfully, compensation is what restores consistency.
That means error handling in Orchestrix is not only about detection. It is also about safe cleanup.
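The cleanup idea can be sketched as follows: run steps in order and, when one fails, undo the completed ones in reverse. The `CompensableStep` shape and `runWithCompensation` are hypothetical and only illustrate the pattern; Orchestrix's own compensation API is not shown in this guide.

```typescript
// Hypothetical step shape: `run` performs the work, `compensate` undoes it.
interface CompensableStep {
  name: string;
  run: () => Promise<void>;
  compensate?: () => Promise<void>;
}

interface CompensationOutcome {
  status: "completed" | "failed";
  failedStep?: string;
  error?: unknown;
  compensated: string[];
}

// Runs steps in order. On failure, undoes completed steps in reverse order.
async function runWithCompensation(steps: CompensableStep[]): Promise<CompensationOutcome> {
  const completed: CompensableStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch (error) {
      const compensated: string[] = [];
      // Most recent side effect first, so dependencies unwind safely.
      for (const done of completed.reverse()) {
        if (done.compensate) {
          await done.compensate();
          compensated.push(done.name);
        }
      }
      return { status: "failed", failedStep: step.name, error, compensated };
    }
  }
  return { status: "completed", compensated: [] };
}
```

This sketch does not handle a compensation that itself throws; a real implementation needs a policy for that case, such as recording the error and continuing with the remaining compensations.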
Best practices
- Validate early to fail before side effects start.
- Retry only errors that are truly transient.
- Set timeouts for external operations.
- Add compensation for steps that reserve, charge, or create durable side effects.
- Inspect FlowResult in your application layer instead of treating failure as a black box.