Error Handling Guide

Reliable error handling is one of the main reasons to use Orchestrix.

The goal is not just to fail. The goal is to fail in a way that is understandable, recoverable when appropriate, and safe for the rest of the system.

Types of failures

It helps to think about failures in three groups:

  1. Validation failures: the input is invalid, so execution should not begin at all
  2. Transient execution failures: the step might succeed if retried
  3. Permanent execution failures: the step will not succeed until the input or system state changes

Choosing the right response depends on which group you are dealing with.

Validation failures

If you configure a schema, Orchestrix can reject bad input before any step runs.

This is the cleanest kind of failure because no side effects have started yet.
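
As a rough illustration of validating before any work starts, the sketch below checks an input and collects every problem up front. The `OrderInput` shape and the `validateOrder` function are hypothetical and are not part of the Orchestrix API. How a schema is actually configured in Orchestrix is not shown here.

```ts
// Hypothetical input shape, used only for illustration.
interface OrderInput {
  orderId: string;
  amountCents: number;
}

type ValidationResult =
  | { ok: true; value: OrderInput }
  | { ok: false; errors: string[] };

// Checks the raw input and reports all problems at once,
// so nothing with side effects has to run for bad input.
function validateOrder(input: unknown): ValidationResult {
  const errors: string[] = [];
  const candidate = (input ?? {}) as Record<string, unknown>;
  const orderId = candidate["orderId"];
  const amountCents = candidate["amountCents"];

  if (typeof orderId !== "string" || orderId.length === 0) {
    errors.push("orderId must be a non-empty string");
  }
  if (
    typeof amountCents !== "number" ||
    !Number.isInteger(amountCents) ||
    amountCents <= 0
  ) {
    errors.push("amountCents must be a positive integer");
  }

  if (
    errors.length > 0 ||
    typeof orderId !== "string" ||
    typeof amountCents !== "number"
  ) {
    return { ok: false, errors };
  }
  return { ok: true, value: { orderId, amountCents } };
}
```

A caller would run the flow only when `ok` is `true`, and otherwise return the collected `errors` to the client.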

Transient failures

Transient failures are a good match for retries:

ts
flow.step("call-external-api", async () => {
  // ...
}, {
  retries: 3
});

Typical transient failures include:

  • temporary network issues
  • short service overloads
  • brief database contention
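
One possible way to recognize failures like these in code is to look at well-known signals on the error object. The sketch below is an assumption-heavy example: the `isTransient` helper is hypothetical, and the chosen Node.js error codes and HTTP statuses are only a starting list that you would tune for your own dependencies.

```ts
// Node.js system error codes that usually indicate a temporary network problem.
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

// HTTP statuses that usually indicate short-lived overload or unavailability.
const TRANSIENT_HTTP_STATUSES = new Set([429, 502, 503, 504]);

// Hypothetical helper: returns true only when the error carries a known transient signal.
// Anything unrecognized is treated as not transient, so it is not retried by default.
function isTransient(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { code, status } = error as { code?: unknown; status?: unknown };
  if (typeof code === "string" && TRANSIENT_CODES.has(code)) return true;
  if (typeof status === "number" && TRANSIENT_HTTP_STATUSES.has(status)) return true;
  return false;
}
```

Defaulting to "not transient" is a deliberate choice in this sketch: retrying an unknown error is usually riskier than failing once with a clear message.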

Permanent failures

Permanent failures should usually fail fast.

Examples:

  • invalid business state
  • missing required domain data
  • rejected payment for a business rule reason
  • malformed input that slipped past validation

Retries cannot fix these failures. They only delay the final failure and bury the original error under repeated attempts.
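
To illustrate failing fast, here is a minimal retry loop that stops as soon as an error is judged non-retryable. It is a sketch, not a description of how the Orchestrix `retries` option works internally. The `retryUnlessPermanent` name and the `isRetryable` callback are hypothetical, and backoff delays are left out for brevity.

```ts
// Hypothetical helper. Assumes maxAttempts >= 1.
async function retryUnlessPermanent<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  isRetryable: (error: unknown) => boolean,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // A permanent failure ends the loop immediately instead of using up the remaining attempts.
      if (!isRetryable(error)) break;
    }
  }
  // Either a permanent failure occurred or all attempts were used.
  throw lastError;
}
```

With this shape, a rejected payment surfaces after one attempt, while a flaky network call still gets its full retry budget.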

Global failure observation

Use hooks to observe failures centrally:

ts
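const flow = create("my-flow", {
  hooks: {
    // Assumes the Sentry SDK is already imported and initialized.
    onFlowFail: ({ flowName, result }) => {
      // Report the underlying error rather than the whole result object.
      Sentry.captureException(result.error, {
        extra: { flowName }
      });
    }
  }
});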

This is useful when you want consistent failure reporting without repeating logic in every step.

Inspecting the result

The returned FlowResult is the main place to inspect execution failure:

ts
const result = await flow.run(input);

if (result.status === "failed") {
  const failedStep = result.steps.find((s) => s.status === "failed");
  console.log(result.error);
  console.log(failedStep?.name);
  console.log(failedStep?.attempts);
}

This lets you answer:

  • what failed?
  • where did it fail?
  • how many attempts were made?
  • how much time was spent before failure?
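
The first three questions can be folded into a single log line. The sketch below assumes a result shape inferred only from the fields used in this guide (`status`, `error`, and `steps` with `name`, `status`, `attempts`). The `FlowResultLike` type and `describeFailure` helper are hypothetical, and timing is omitted because no duration field is shown here.

```ts
// Assumed shapes, inferred from the fields used in this guide; the real FlowResult may differ.
interface StepResultLike {
  name: string;
  status: string;
  attempts: number;
}

interface FlowResultLike {
  status: string;
  error?: unknown;
  steps: StepResultLike[];
}

// Returns a one-line summary for a failed flow, or null if the flow did not fail.
function describeFailure(result: FlowResultLike): string | null {
  if (result.status !== "failed") return null;

  const failedStep = result.steps.find((s) => s.status === "failed");
  const reason =
    result.error instanceof Error ? result.error.message : String(result.error);

  if (!failedStep) return `flow failed: ${reason}`;
  return `step "${failedStep.name}" failed after ${failedStep.attempts} attempt(s): ${reason}`;
}
```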

Timeouts

Timeouts are part of error handling because they stop steps from waiting forever.

ts
flow.step("slow-process", async () => {
  // ...
}, {
  timeoutMs: 5000
});

A timeout becomes a step failure. If retries are configured, that failure may trigger another attempt.
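
For intuition, a timeout can be modeled as a race between the operation and a timer. The sketch below uses `Promise.race` and is not a description of how `timeoutMs` is implemented in Orchestrix. The `withTimeout` helper is hypothetical. Note that it only stops waiting and does not cancel the underlying work.

```ts
// Hypothetical helper: rejects if `operation` has not settled within `timeoutMs`.
async function withTimeout<T>(operation: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([operation, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after the race is over.
    clearTimeout(timer);
  }
}
```

Because the rejection is an ordinary error, a retry mechanism layered on top can treat it like any other failed attempt.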

Compensation and failure

If a step fails after earlier steps have already completed successfully, compensation undoes the effects of those completed steps to restore consistency.

That means error handling in Orchestrix is not only about detection. It is also about safe cleanup.
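
The general idea can be sketched as a loop that remembers completed steps and undoes them in reverse order when a later step fails. This is a simplified illustration under stated assumptions: the `SagaStep` type and `runWithCompensation` function are hypothetical, and the way Orchestrix registers and runs compensation may differ.

```ts
// Hypothetical step shape: `compensate` undoes the effect of `run`.
interface SagaStep {
  name: string;
  run: () => Promise<void>;
  compensate?: () => Promise<void>;
}

type SagaOutcome =
  | { status: "success" }
  | { status: "failed"; failedStep: string; error: unknown };

async function runWithCompensation(steps: SagaStep[]): Promise<SagaOutcome> {
  const completed: SagaStep[] = [];

  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch (error) {
      // Undo completed steps, most recent first.
      // In this sketch, an error thrown by a compensation propagates to the caller.
      for (const done of completed.reverse()) {
        if (done.compensate) await done.compensate();
      }
      return { status: "failed", failedStep: step.name, error };
    }
  }

  return { status: "success" };
}
```

Reverse order matters here: a refund should happen before the reservation it depended on is released.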

Best practices

  • Validate early to fail before side effects start.
  • Retry only errors that are truly transient.
  • Set timeouts for external operations.
  • Add compensation for steps that reserve, charge, or create durable side effects.
  • Inspect FlowResult in your application layer instead of treating failure as a black box.

Released under the MIT License.