
Retries and Timeouts

Retries and timeouts are two of the most important reliability controls in Orchestrix.

They are best understood together: a timeout controls how long a single attempt may run, and retries control whether another attempt happens after a failure.

Why they belong together

If a step has retries but no timeout, one attempt may hang for too long before a retry is even considered.

If a step has a timeout but no retries, a short-lived issue may cause immediate failure even though a second attempt would have worked.

Combining both gives you bounded attempts and controlled recovery.

Configuring retries

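Retries are configured per step. retryDelayMs sets the base delay between attempts, and backoffFactor selects whether and how that delay grows from one retry to the next: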

```ts
{
  retries: 3,
  retryDelayMs: 200,
  backoffFactor: "linear"
}
```

Fixed

The delay stays the same for every retry. Use it when you want a steady, predictable pace between attempts.

Linear

The delay increases by a constant step with each retry. Use it when you want retries to become gradually less aggressive.

Exponential

The delay grows multiplicatively with each retry, so waits lengthen quickly. Use it when a downstream system may need more time to recover:

```ts
{
  retries: 5,
  retryDelayMs: 1000,
  backoffFactor: "exponential"
}
```
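To make the three strategies concrete, the TypeScript sketch below computes the delay before each retry. The formulas are assumptions: a constant delay for fixed, the base delay multiplied by the retry number for linear, and a doubling delay for exponential. Orchestrix's exact formulas may differ, and retryDelayMs and delaySchedule here are hypothetical helpers, not part of its API.

```ts
// Hypothetical helpers; the formulas are assumptions, not Orchestrix internals.
type BackoffStrategy = "fixed" | "linear" | "exponential";

// Delay in milliseconds before the given retry (1-based).
function retryDelayMs(strategy: BackoffStrategy, baseMs: number, retry: number): number {
  if (retry < 1 || Math.floor(retry) !== retry) {
    throw new RangeError("retry must be a positive integer");
  }
  if (strategy === "fixed") return baseMs;          // same delay every time
  if (strategy === "linear") return baseMs * retry; // grows by baseMs per retry
  return baseMs * 2 ** (retry - 1);                 // exponential: doubles per retry
}

// Full delay schedule for a step with the given number of retries.
function delaySchedule(strategy: BackoffStrategy, baseMs: number, retries: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry <= retries; retry++) {
    delays.push(retryDelayMs(strategy, baseMs, retry));
  }
  return delays;
}

console.log(delaySchedule("linear", 200, 3).join(", "));       // 200, 400, 600
console.log(delaySchedule("exponential", 1000, 5).join(", ")); // 1000, 2000, 4000, 8000, 16000
```

Under these assumptions, exponential backoff gives a struggling dependency far more breathing room: by the fifth retry it waits 16 seconds, where a fixed strategy with the same base would still wait 1 second.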

Understanding timeouts

A timeout sets the maximum duration for a single attempt:

```ts
flow.step("heavy-query", async () => {
  // ...
}, {
  timeoutMs: 30000
});
```

If the attempt exceeds that limit, Orchestrix treats it as a failure.

Timeout semantics

The key detail is that a timeout applies to each attempt separately, not to the step as a whole or to the entire flow.

So this configuration:

```ts
flow.step("api-call", async () => {
  // ...
}, {
  timeoutMs: 5000,
  retries: 3
});
```

means:

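  • attempt 1 may run for up to 5 seconds
  • if it times out, attempt 2 may still happen, with its own fresh 5-second limit
  • this continues until an attempt succeeds or the configured retry count is exhausted

Because each attempt gets the full limit, the step as a whole can run well past 5 seconds.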
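The per-attempt semantics can be sketched in TypeScript as a retry loop that gives every attempt its own timer. All names here (runStep, StepOptions) are hypothetical, and the loop assumes that retries counts extra attempts after the first, so retries: 3 would allow up to four attempts; Orchestrix's own counting may differ. The timeout is enforced by racing the attempt against a timer, and an AbortSignal lets cooperative work stop early.

```ts
// Minimal sketch; runStep and StepOptions are hypothetical, not Orchestrix APIs.
// Assumption: `retries` counts extra attempts, so up to retries + 1 attempts run.
interface StepOptions {
  timeoutMs: number;    // budget for each individual attempt
  retries: number;
  retryDelayMs: number; // fixed pause between attempts, for simplicity
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function runStep<T>(
  work: (signal: AbortSignal) => Promise<T>,
  options: StepOptions,
): Promise<T> {
  const { timeoutMs, retries, retryDelayMs } = options;
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    // A new controller and timer for every attempt: the budget resets each time.
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        // Reject first so the timeout error wins the race, then signal cancellation.
        reject(new Error(`attempt ${attempt} exceeded ${timeoutMs} ms`));
        controller.abort();
      }, timeoutMs);
    });
    try {
      return await Promise.race([work(controller.signal), timedOut]);
    } catch (err) {
      lastError = err; // a timed-out attempt is just another failed attempt
    } finally {
      clearTimeout(timer); // stop this attempt's timer whatever the outcome
    }
    if (attempt <= retries) await sleep(retryDelayMs);
  }
  throw lastError;
}
```

Racing a promise only stops the step from waiting; the losing work keeps running unless it listens to the signal, which is why the sketch passes one into each attempt.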

Choosing good values

Good retry and timeout values depend on:

  • user expectations
  • downstream system behavior
  • request deadlines
  • infrastructure cost

For example:

  • a user-facing checkout flow usually needs tighter limits
  • an async background reconciliation flow may tolerate longer retry windows

Practical guidance

Use lower timeouts when:

  • the call is user-facing
  • the dependency is usually fast
  • waiting too long harms the product experience

Use higher timeouts when:

  • the operation is expensive but expected
  • it runs in background processing
  • retries would be more harmful than waiting slightly longer

Example: remote API call

```ts
flow.step("call-billing-api", async () => {
  // ...
}, {
  timeoutMs: 4000,
  retries: 4,
  retryDelayMs: 500,
  backoffFactor: "exponential",
  jitter: true,
  maxRetryDelayMs: 8000
});
```

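This is a good production-style configuration because it:

  • bounds each attempt (timeoutMs)
  • retries transient failures (retries, retryDelayMs)
  • backs off when the dependency is unhealthy (exponential backoffFactor, capped by maxRetryDelayMs)
  • avoids synchronized retry bursts (jitter)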
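A sketch of how a delay cap and jitter might combine with exponential backoff, assuming doubling delays and the common "full jitter" approach (a random delay between zero and the capped value). Whether Orchestrix uses full jitter or another variant is not specified here, so treat backoffDelayMs and BackoffOptions as hypothetical.

```ts
// Hypothetical sketch, not Orchestrix's implementation. Assumes:
// - exponential growth: retryDelayMs * 2^(retry - 1)
// - a hard cap at maxRetryDelayMs
// - "full jitter": a random delay in [0, capped delay)
interface BackoffOptions {
  retryDelayMs: number;
  maxRetryDelayMs: number;
  jitter: boolean;
}

function backoffDelayMs(
  retry: number, // 1-based retry number
  options: BackoffOptions,
  random: () => number = Math.random, // injectable so tests can be deterministic
): number {
  const capped = Math.min(options.maxRetryDelayMs, options.retryDelayMs * 2 ** (retry - 1));
  // Full jitter spreads clients that failed together across the whole window.
  return options.jitter ? Math.floor(random() * capped) : capped;
}

// Values taken from the call-billing-api example above.
const billingBackoff: BackoffOptions = { retryDelayMs: 500, maxRetryDelayMs: 8000, jitter: true };
for (let retry = 1; retry <= 4; retry++) {
  console.log(`retry ${retry}: wait ${backoffDelayMs(retry, billingBackoff)} ms`);
}
```

Under the assumed doubling formula, the billing example's four retries wait at most 500, 1000, 2000, and 4000 ms, so the 8000 ms cap would only take effect from a fifth retry onward. Jitter then picks a random point inside each window, which keeps many callers that failed at the same moment from retrying in lockstep.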

Best practices

  • Put timeouts on all network-heavy steps.
  • Use retries only where another attempt has a real chance of succeeding.
  • Prefer exponential backoff for external systems.
  • Keep total retry time aligned with the workflow's business deadline.
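One way to check the last point is to estimate a step's worst-case duration and compare it with the deadline. The sketch below is a rough upper bound under stated assumptions: retries counts extra attempts, every attempt runs until its timeout, delays double from retryDelayMs up to maxRetryDelayMs, and jitter only shortens delays. The function name worstCaseStepMs and the 30-second deadline are hypothetical.

```ts
// Rough upper bound on one step's wall-clock time. Hypothetical helper; assumes:
// - retries counts extra attempts (retries + 1 attempts in total)
// - every attempt runs until its timeout
// - delays double from retryDelayMs and are capped at maxRetryDelayMs
// - jitter only shortens delays, so it can be ignored for the upper bound
interface StepBudget {
  timeoutMs: number;
  retries: number;
  retryDelayMs: number;
  maxRetryDelayMs: number;
}

function worstCaseStepMs(step: StepBudget): number {
  let total = step.timeoutMs * (step.retries + 1); // every attempt times out
  for (let n = 1; n <= step.retries; n++) {
    // pause before retry n
    total += Math.min(step.maxRetryDelayMs, step.retryDelayMs * 2 ** (n - 1));
  }
  return total;
}

// Values from the call-billing-api example; the deadline is a hypothetical figure.
const billingStep: StepBudget = { timeoutMs: 4000, retries: 4, retryDelayMs: 500, maxRetryDelayMs: 8000 };
const deadlineMs = 30000;
const worst = worstCaseStepMs(billingStep);
console.log(`worst case ${worst} ms, fits deadline: ${worst <= deadlineMs}`);
// worst case 27500 ms, fits deadline: true
```

For the billing example this gives five attempts of 4000 ms plus 7500 ms of delays, or 27500 ms in total, before counting any scheduling overhead.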

Released under the MIT License.