Retries and Timeouts
Retries and timeouts are two of the most important reliability controls in Orchestrix.
They are best understood together, because one controls how long an attempt may run and the other controls whether another attempt should happen.
Why they belong together
If a step has retries but no timeout, a single attempt can hang indefinitely, and the retry never gets a chance to run.
If a step has a timeout but no retries, a short-lived issue may cause immediate failure even though a second attempt would have worked.
Combining both gives you bounded attempts and controlled recovery.
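To make the interaction concrete, here is a minimal sketch of a generic retry loop in which every attempt races against its own timer. It is not Orchestrix's implementation: `withTimeout` and `runWithRetries` are hypothetical helpers written for this illustration, and the sketch assumes that `retries` counts additional attempts after the first one.

```javascript
// Illustrative only: a generic retry-with-timeout loop, not Orchestrix internals.
// `withTimeout` and `runWithRetries` are hypothetical names for this sketch.

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assumption: `retries` counts extra attempts after the first one.
async function runWithRetries(fn, { timeoutMs, retries }) {
  let lastError;
  for (let attempt = 1; attempt <= retries + 1; attempt++) {
    try {
      // The timeout bounds this attempt only, not the whole loop.
      return await withTimeout(fn(attempt), timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again if attempts remain
    }
  }
  throw lastError;
}
```

Without the timer, a hung `fn` would block the loop forever. Without the loop, the first timeout would be final. Note that racing a promise against a timer does not cancel the underlying work, so real code would usually also pass a cancellation signal to the operation.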
Configuring retries
Retries are configured per step:
    {
      retries: 3,
      retryDelayMs: 200,
      backoffFactor: "linear"
    }
Fixed
Use when you want a stable delay between attempts.
Linear
Use when you want retries to become gradually less aggressive.
Exponential
Use when a downstream system may need more time to recover:
    {
      retries: 5,
      retryDelayMs: 1000,
      backoffFactor: "exponential"
    }
Understanding timeouts
A timeout sets the maximum duration for a single attempt:
    flow.step("heavy-query", async () => {
      // ...
    }, {
      timeoutMs: 30000
    });
If the attempt exceeds that limit, Orchestrix treats it as a failure.
Timeout semantics
The important detail is that the timeout applies to each attempt, not to the entire flow.
So this configuration:
    flow.step("api-call", async () => {
      // ...
    }, {
      timeoutMs: 5000,
      retries: 3
    });
means:
- attempt 1 may run for up to 5 seconds
- if it times out, attempt 2 may still run, with its own 5-second limit
- retries continue until the configured retry count is exhausted
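Because the limit resets on every attempt, the time a step can spend inside attempts is a multiple of `timeoutMs`. The sketch below works through that arithmetic for the configuration above. `worstCaseAttemptTimeMs` is a hypothetical helper, and it assumes that `retries: 3` means three additional attempts after the first, which is worth confirming for your Orchestrix version.

```javascript
// Worst-case time spent inside attempts for one step, ignoring retry delays.
// Assumption: `retries` counts additional attempts after the first, so
// `retries: 3` allows up to 4 attempts in total.
function worstCaseAttemptTimeMs({ timeoutMs, retries }) {
  const maxAttempts = retries + 1;
  return maxAttempts * timeoutMs;
}

console.log(worstCaseAttemptTimeMs({ timeoutMs: 5000, retries: 3 })); // 20000
```

Under that assumption, a 5-second timeout can add up to 20 seconds of attempt time before the step finally fails.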
Choosing good values
Good retry and timeout values depend on:
- user expectations
- downstream system behavior
- request deadlines
- infrastructure cost
For example:
- a user-facing checkout flow usually needs tighter limits
- an async background reconciliation flow may tolerate longer retry windows
Practical guidance
Use lower timeouts when:
- the call is user-facing
- the dependency is usually fast
- waiting too long harms the product experience
Use higher timeouts when:
- the operation is expensive, and its long runtime is expected
- it runs in background processing
- retries would be more harmful than waiting slightly longer
Example: remote API call
    flow.step("call-billing-api", async () => {
      // ...
    }, {
      timeoutMs: 4000,
      retries: 4,
      retryDelayMs: 500,
      backoffFactor: "exponential",
      jitter: true,
      maxRetryDelayMs: 8000
    });
This is a good production-style configuration because it:
- bounds each attempt
- retries transient failures
- backs off when the dependency is unhealthy
- avoids synchronized retry bursts
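To see how these options could shape the wait between attempts, the following sketch models the delay before each retry. The formulas are assumptions made for illustration, not Orchestrix's documented behavior: fixed keeps the base delay, linear adds the base delay on each retry, exponential doubles it, the cap is applied before jitter, and jitter picks a random value below the capped delay. `delayBeforeRetry` is a hypothetical helper.

```javascript
// Hypothetical model of the delay before each retry; not Orchestrix's actual formula.
// `retryNumber` is 1 for the first retry, 2 for the second, and so on.
function delayBeforeRetry(retryNumber, options, random = Math.random) {
  const { retryDelayMs, backoffFactor, jitter = false, maxRetryDelayMs = Infinity } = options;
  let delay;
  if (backoffFactor === "exponential") {
    delay = retryDelayMs * 2 ** (retryNumber - 1); // assumed: doubles each retry
  } else if (backoffFactor === "linear") {
    delay = retryDelayMs * retryNumber; // assumed: grows by the base delay each retry
  } else {
    delay = retryDelayMs; // fixed: same delay every time
  }
  const capped = Math.min(delay, maxRetryDelayMs);
  // Assumed "full jitter": a random delay in [0, capped) spreads clients apart.
  return jitter ? Math.floor(random() * capped) : capped;
}

const billing = { retryDelayMs: 500, backoffFactor: "exponential", maxRetryDelayMs: 8000 };
const schedule = [1, 2, 3, 4].map((n) => delayBeforeRetry(n, billing));
console.log(schedule); // 500, 1000, 2000, 4000
```

With `jitter: true`, each of those values becomes an upper bound rather than an exact wait, which is what keeps many clients from retrying at the same instant.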
Best practices
- Set a timeout on every step that makes network calls.
- Use retries only where another attempt has a real chance of succeeding.
- Prefer exponential backoff for external systems.
- Keep total retry time aligned with the workflow's business deadline.
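One way to check the last point is to estimate a step's worst-case duration and compare it with the deadline. The sketch below does this for the billing example. It assumes that `retries` counts extra attempts after the first, that every attempt runs until its timeout, and that delays double without jitter. `worstCaseTotalMs`, `fitsDeadline`, and the deadline values are hypothetical.

```javascript
// Rough upper bound on how long one step can occupy the workflow.
// `delaysMs` lists the assumed wait before each retry, in order.
function worstCaseTotalMs({ timeoutMs, retries, delaysMs }) {
  const attemptTime = (retries + 1) * timeoutMs; // every attempt times out
  const waitTime = delaysMs.slice(0, retries).reduce((sum, d) => sum + d, 0);
  return attemptTime + waitTime;
}

function fitsDeadline(config, deadlineMs) {
  return worstCaseTotalMs(config) <= deadlineMs;
}

// Billing example with un-jittered doubling delays (500, 1000, 2000, 4000 ms).
const billingStep = { timeoutMs: 4000, retries: 4, delaysMs: [500, 1000, 2000, 4000] };
console.log(worstCaseTotalMs(billingStep)); // 27500
console.log(fitsDeadline(billingStep, 30000)); // true
console.log(fitsDeadline(billingStep, 10000)); // false
```

If the estimate exceeds the deadline, lowering `retries` or `timeoutMs` is usually a better fix than letting the workflow overrun.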