Retries and Timeouts

Retries and timeouts are two of the most important reliability controls in Orchestrix.

They are best understood together, because one controls how long an attempt may run and the other controls whether another attempt should happen.

Why they belong together

If a step has retries but no timeout, a single attempt can hang indefinitely, so a retry is never even considered.

If a step has a timeout but no retries, a short-lived issue may cause immediate failure even though a second attempt would have worked.

Combining both gives you bounded attempts and controlled recovery.

Configuring retries

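Retries are configured per step. The options set how many retries are allowed, the base delay between attempts, and the backoff strategy: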

{
  retries: 3,
  retryDelayMs: 200,
  backoffFactor: "linear"
}

Fixed

Use when you want a stable delay between attempts.

Linear

Use when you want retries to become gradually less aggressive.

Exponential

Use when a downstream system may need more time to recover:

{
  retries: 5,
  retryDelayMs: 1000,
  backoffFactor: "exponential"
}
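To make the difference between the three strategies concrete, here is a minimal sketch of how the delay before each retry could be computed. The helper names `retryDelay` and `schedule` are hypothetical, and the growth formulas (linear adds one base delay per retry, exponential doubles) are assumptions for illustration, not a statement of how Orchestrix computes delays internally.

```javascript
// Hypothetical helper: delay in ms before retry number `retry` (1-based).
// The growth formulas below are assumptions, not documented Orchestrix behavior.
function retryDelay(strategy, baseMs, retry) {
  switch (strategy) {
    case "fixed":
      return baseMs; // same delay before every retry
    case "linear":
      return baseMs * retry; // base, 2x base, 3x base, ...
    case "exponential":
      return baseMs * 2 ** (retry - 1); // base, 2x, 4x, 8x, ...
    default:
      throw new Error(`unknown backoff strategy: ${strategy}`);
  }
}

// Lists the delay before each of `retries` retries.
function schedule(strategy, baseMs, retries) {
  return Array.from({ length: retries }, (_, i) => retryDelay(strategy, baseMs, i + 1));
}

console.log(schedule("fixed", 200, 3)); // [ 200, 200, 200 ]
console.log(schedule("linear", 200, 3)); // [ 200, 400, 600 ]
console.log(schedule("exponential", 1000, 5)); // [ 1000, 2000, 4000, 8000, 16000 ]
```

Under these assumptions, the exponential configuration above waits 31 seconds in total across its five retries, which is why it suits dependencies that need time to recover.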

Understanding timeouts

A timeout sets the maximum duration for a single attempt:

flow.step("heavy-query", async () => {
  // ...
}, {
  timeoutMs: 30000
});

If the attempt exceeds that limit, Orchestrix treats it as a failure.
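One common way to enforce a per-attempt limit in JavaScript is to race the work against a timer. The sketch below illustrates that general technique. `withTimeout` and `sleep` are hypothetical names, and nothing here is taken from the Orchestrix source.

```javascript
// Hypothetical helper: rejects if `fn` does not settle within `timeoutMs`.
function withTimeout(fn, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`attempt timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([Promise.resolve().then(fn), timeout]).finally(() =>
    clearTimeout(timer)
  );
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

withTimeout(() => sleep(50).then(() => "done"), 10).catch((err) =>
  console.log(err.message) // attempt timed out after 10 ms
);
```

Note that racing a promise does not cancel the underlying work. If the step body holds a connection or other resource, it still needs its own cancellation mechanism, such as an `AbortSignal`.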

Timeout semantics

The important detail is that a timeout applies to each attempt individually, not to all attempts of the step combined and not to the entire flow.

So this configuration:

flow.step("api-call", async () => {
  // ...
}, {
  timeoutMs: 5000,
  retries: 3
});

means:

attempt 1 may run for up to 5 seconds
if it times out, attempt 2 starts with its own 5-second limit
retries continue until the configured retry count (3) is exhausted
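The interaction described above can be sketched as a loop in which every attempt gets a fresh timer. This is an illustrative sketch only: `runStep`, `attemptWithTimeout`, and `pause` are hypothetical names, and it assumes `retries` counts additional attempts after the first one, which the text above does not state explicitly.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: one attempt, bounded by its own timer.
function attemptWithTimeout(fn, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  return Promise.race([Promise.resolve().then(fn), timeout]).finally(() =>
    clearTimeout(timer)
  );
}

// Hypothetical runner. Assumes `retries` means extra attempts after the first.
async function runStep(fn, { timeoutMs, retries = 0, retryDelayMs = 0 }) {
  let lastError;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      // Each attempt gets a fresh timeout; earlier attempts do not eat into it.
      return await attemptWithTimeout(() => fn(attempt), timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt <= retries) await pause(retryDelayMs); // wait before retrying
    }
  }
  throw lastError; // every attempt failed or timed out
}

runStep(
  async (attempt) => {
    if (attempt < 3) await pause(100); // first two attempts overrun the 20 ms limit
    return `ok on attempt ${attempt}`;
  },
  { timeoutMs: 20, retries: 3, retryDelayMs: 5 }
).then(console.log); // ok on attempt 3
```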

Choosing good values

Good retry and timeout values depend on:

user expectations
downstream system behavior
request deadlines
infrastructure cost

For example:

a user-facing checkout flow usually needs tighter limits
an async background reconciliation flow may tolerate longer retry windows

Practical guidance

Use lower timeouts when:

the call is user-facing
the dependency is usually fast
waiting too long harms the product experience

Use higher timeouts when:

the operation is expensive and known to take a long time
it runs in background processing
retries would be more harmful than waiting slightly longer

Example: remote API call

flow.step("call-billing-api", async () => {
  // ...
}, {
  timeoutMs: 4000,
  retries: 4,
  retryDelayMs: 500,
  backoffFactor: "exponential",
  jitter: true,
  maxRetryDelayMs: 8000
});

This is a good production-style configuration because it:

bounds each attempt
retries transient failures
backs off when the dependency is unhealthy
avoids synchronized retry bursts
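The sketch below shows one way `jitter` and `maxRetryDelayMs` could shape the delay before each retry. It is an assumption-laden illustration: `jitteredDelay` is a hypothetical helper, the doubling formula is assumed, and the "full jitter" variant (a uniformly random delay between 0 and the capped value) is one common choice rather than a description of what Orchestrix does.

```javascript
// Hypothetical helper. Assumes exponential doubling, a hard cap at
// `maxRetryDelayMs`, and full jitter (uniform in [0, capped delay)).
function jitteredDelay(retry, { retryDelayMs, maxRetryDelayMs, jitter }, random = Math.random) {
  const exponential = retryDelayMs * 2 ** (retry - 1);
  const capped = Math.min(exponential, maxRetryDelayMs);
  return jitter ? Math.floor(random() * capped) : capped;
}

const options = { retryDelayMs: 500, maxRetryDelayMs: 8000, jitter: false };
for (let retry = 1; retry <= 6; retry++) {
  console.log(retry, jitteredDelay(retry, options));
}
// Without jitter the delays are 500, 1000, 2000, 4000, 8000, 8000:
// retry 6 would be 16000 ms but is held at the 8000 ms cap.
```

With `jitter: true`, each client draws a different delay below the capped value, so clients that failed at the same moment do not all retry at the same moment. Under these assumptions, the four retries configured above never reach the cap; it only matters if the retry count is raised.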

Best Practices

Put timeouts on every step that makes network calls.
Use retries only where another attempt has a real chance of succeeding.
Prefer exponential backoff for external systems.
Keep total retry time aligned with the workflow's business deadline.
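To check a configuration against a deadline, it helps to estimate the worst case: every attempt runs to its timeout and every retry delay is waited in full. The sketch below does that arithmetic. `worstCaseMs` is a hypothetical helper, and it assumes that `retries` counts extra attempts after the first, that exponential backoff doubles each time, and that jitter never lengthens a delay.

```javascript
// Hypothetical estimate of the longest a step can take before it finally fails.
// Assumes: retries = extra attempts, exponential doubling, delays capped at
// maxRetryDelayMs, jitter ignored.
function worstCaseMs({ timeoutMs, retries, retryDelayMs, maxRetryDelayMs = Infinity }) {
  let total = timeoutMs * (retries + 1); // every attempt times out
  for (let retry = 1; retry <= retries; retry++) {
    total += Math.min(retryDelayMs * 2 ** (retry - 1), maxRetryDelayMs);
  }
  return total;
}

// The billing API example: 5 attempts x 4000 ms + (500 + 1000 + 2000 + 4000) ms
console.log(
  worstCaseMs({ timeoutMs: 4000, retries: 4, retryDelayMs: 500, maxRetryDelayMs: 8000 })
); // 27500
```

If the result exceeds the deadline the workflow has to meet, lower the timeout, the retry count, or the base delay until it fits.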
