Retries

Retries are how Orchestrix handles transient failures automatically.

In real systems, many failures are temporary: network hiccups, overloaded downstream services, short-lived database contention, or third-party API instability. Retrying these operations can turn a fragile flow into a resilient one.

When retries help

Retries are a good fit for:

HTTP calls to external services
queue or broker interactions
short-lived infrastructure issues
database operations that may fail transiently

Retries are a poor fit for:

validation failures
business rule violations
malformed input
permanent authorization errors
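
One way to act on this distinction is to classify a failure before deciding whether another attempt is worthwhile. The sketch below is an illustration only: `isRetryable`, the `status` and `code` properties on the error object, and the chosen status and error codes are all assumptions, not part of the Orchestrix API.

```javascript
// Hypothetical helper: decides whether a failure looks transient.
// Assumes errors carry an HTTP-style numeric `status` or a Node.js-style string `code`.
const RETRYABLE_STATUS = new Set([408, 429, 502, 503, 504]);
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EAI_AGAIN"]);

function isRetryable(err) {
  if (err == null) return false;
  if (typeof err.status === "number") return RETRYABLE_STATUS.has(err.status);
  if (typeof err.code === "string") return RETRYABLE_CODES.has(err.code);
  // Anything unrecognized is treated as permanent so it fails fast.
  return false;
}
```

Validation failures and business rule violations carry neither marker in this sketch, so they fall through to `false` and are not retried.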

Basic usage

You can enable retries for any step:

flow.step("fetch-data", async () => {
  return await api.getData();
}, {
  retries: 3
});

If the step fails, Orchestrix attempts to run it again according to the configured policy.
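
Conceptually, the behavior resembles the loop below. This is a minimal sketch, not Orchestrix's actual implementation: `runWithRetries` is a hypothetical name, delays are omitted, and it assumes `retries` counts re-runs after the first attempt.

```javascript
// Illustrative retry loop (hypothetical, no delay between attempts).
// Assumption: `retries` counts re-runs after the first attempt,
// so retries = 3 allows up to 4 attempts in total.
async function runWithRetries(fn, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt); // success ends the loop immediately
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}
```

If every attempt fails, the last error is surfaced to the caller, so a retry policy delays the failure rather than hiding it.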

Retry configuration

Each step accepts options that control how many times it is retried and how long Orchestrix waits between attempts:

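flow.step("call-api", async () => {
  // ...
}, {
  retries: 5,                    // retry the step up to 5 times
  retryDelayMs: 1000,            // base delay between attempts, in milliseconds
  backoffFactor: "exponential",  // delay strategy: "fixed", "linear", or "exponential"
  jitter: true,                  // randomize each delay
  maxRetryDelayMs: 10000         // upper bound for a single delay, in milliseconds
});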

Backoff strategies

Orchestrix supports three delay strategies:

Fixed

Uses the same delay for each retry.

{
  retries: 3,
  retryDelayMs: 1000,
  backoffFactor: "fixed"
}

Linear

Increases delay linearly with each retry.

{
  retries: 3,
  retryDelayMs: 1000,
  backoffFactor: "linear"
}

Exponential

Grows the delay multiplicatively with each retry, so waits lengthen quickly and a struggling system gets more room to recover.

{
  retries: 5,
  retryDelayMs: 1000,
  backoffFactor: "exponential"
}
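
The three strategies can be compared with a small delay calculator. The exact formulas Orchestrix uses are not stated here, so this sketch makes its own assumptions: `computeDelayMs` is a hypothetical function, `attempt` is 1 for the first retry, linear adds one base delay per retry, and exponential doubles the delay on each retry.

```javascript
// Hypothetical delay calculation for the three strategies.
// Assumptions: attempt starts at 1; linear = base * attempt;
// exponential = base * 2^(attempt - 1); maxMs caps each individual delay.
function computeDelayMs(strategy, baseMs, attempt, maxMs = Infinity) {
  let delay;
  switch (strategy) {
    case "fixed":
      delay = baseMs;
      break;
    case "linear":
      delay = baseMs * attempt;
      break;
    case "exponential":
      delay = baseMs * 2 ** (attempt - 1);
      break;
    default:
      throw new Error(`Unknown backoff strategy: ${strategy}`);
  }
  return Math.min(delay, maxMs);
}

// Print the delay schedule for five retries with a 1000 ms base and a 10000 ms cap.
for (const strategy of ["fixed", "linear", "exponential"]) {
  const delays = [1, 2, 3, 4, 5].map((n) => computeDelayMs(strategy, 1000, n, 10000));
  console.log(strategy, delays);
}
```

Under these assumptions, the fixed schedule stays at 1000 ms, the linear one climbs from 1000 ms to 5000 ms, and the exponential one reaches 8000 ms on the fourth retry before the cap holds the fifth at 10000 ms.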

Jitter

Jitter adds randomness to retry delays so that clients that failed together do not all retry at the same moment.

This is especially useful in production environments where multiple workers or servers may fail together and recover together.
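
As an illustration, one common variant is "full jitter", which picks a random delay between zero and the computed backoff delay. Which variant Orchestrix applies when `jitter: true` is set is not stated here; `withFullJitter` and its injectable `random` parameter are hypothetical.

```javascript
// "Full jitter": choose a random delay in [0, delayMs).
// `random` is injectable so the behavior can be tested deterministically;
// it defaults to Math.random, which returns a value in [0, 1).
function withFullJitter(delayMs, random = Math.random) {
  return Math.floor(random() * delayMs);
}

// Two workers with the same 4000 ms backoff will usually wake at different times.
console.log(withFullJitter(4000), withFullJitter(4000));
```

The trade-off is that an individual retry may fire almost immediately, while the combined load from many clients is spread across the whole interval.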

Timeouts and retries

Retries become much more useful when paired with timeouts.

Without a timeout, a step can hang for a long time before a retry even becomes relevant. With a timeout, each attempt has a clear upper bound.
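
The idea of bounding each attempt can be sketched with `Promise.race`. This is a generic illustration rather than how Orchestrix enforces timeouts internally; `withTimeout` is a hypothetical helper.

```javascript
// Hypothetical helper: rejects if `fn` does not settle within `ms` milliseconds.
// It only stops waiting; it does not cancel the underlying work.
function withTimeout(fn, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Promise.resolve().then(fn) also turns a synchronous throw into a rejection.
  const attempt = Promise.resolve().then(fn);
  return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```

Because a timed-out attempt rejects like any other failure, it can feed into a retry policy instead of blocking the flow indefinitely.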

What retries do not change

Retries do not make permanent errors disappear.

If the underlying problem is not transient, a retry policy only adds latency before failure. This is why retry decisions should be based on the kind of failure the step is expected to see.

Best practices

Retry transient infrastructure failures, not business failures.
Use exponential backoff for remote dependencies.
Add jitter in distributed or high-throughput systems.
Pair retries with timeoutMs on network-heavy steps.
Keep the total retry window aligned with your product and operational expectations.
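
To reason about the total retry window, it can help to add up the worst-case waits for a given policy. The sketch below is an estimate under stated assumptions: `totalBackoffMs` is a hypothetical helper, exponential backoff is assumed to double the delay on every retry, `maxRetryDelayMs` is assumed to cap each individual delay, and jitter and the run time of each attempt are ignored.

```javascript
// Worst-case time spent waiting between attempts, ignoring jitter
// and the duration of the attempts themselves.
// Assumption: the delay doubles on every retry and is capped per delay.
function totalBackoffMs({ retries, retryDelayMs, maxRetryDelayMs = Infinity }) {
  let total = 0;
  for (let retry = 1; retry <= retries; retry++) {
    total += Math.min(retryDelayMs * 2 ** (retry - 1), maxRetryDelayMs);
  }
  return total;
}

// Values from the "Retry configuration" example:
// 1000 + 2000 + 4000 + 8000 + 10000 = 25000
console.log(totalBackoffMs({ retries: 5, retryDelayMs: 1000, maxRetryDelayMs: 10000 }));
```

Under these assumptions that policy can spend up to 25 seconds waiting before the step finally fails, plus the time taken by each attempt, which is worth comparing against how long a user or upstream caller is prepared to wait.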
