Retries
Retries are how Orchestrix handles transient failures automatically.
In real systems, many failures are temporary: network hiccups, overloaded downstream services, short-lived database contention, or third-party API instability. Retrying these operations can turn a fragile flow into a resilient one.
When retries help
Retries are a good fit for:
- HTTP calls to external services
- queue or broker interactions
- short-lived infrastructure issues
- database operations that may fail transiently
Retries are a poor fit for:
- validation failures
- business rule violations
- malformed input
- permanent authorization errors
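The two lists above amount to a classification rule. A minimal sketch of one is shown below, assuming errors carry either an HTTP `status` or a Node.js-style `code`. The `isTransient` helper and the error shape are hypothetical and are not part of Orchestrix.

```javascript
// Hypothetical helper: decides whether an error looks transient.
// The error shape (err.status, err.code) is an assumption, not an Orchestrix API.
const RETRYABLE_STATUS = new Set([408, 429, 500, 502, 503, 504]);
const RETRYABLE_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EAI_AGAIN"]);

function isTransient(err) {
  if (err == null) return false;
  if (typeof err.status === "number") return RETRYABLE_STATUS.has(err.status);
  if (typeof err.code === "string") return RETRYABLE_CODES.has(err.code);
  return false; // unknown errors are treated as permanent in this sketch
}

console.log(isTransient({ status: 503 }));        // true  (overloaded downstream service)
console.log(isTransient({ status: 422 }));        // false (validation failure)
console.log(isTransient({ code: "ECONNRESET" })); // true  (network hiccup)
```

Treating unknown errors as permanent is a conservative choice: it avoids retrying failures that no one has confirmed to be safe to repeat.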
Basic usage
You can enable retries for any step:
    flow.step("fetch-data", async () => {
      return await api.getData();
    }, {
      retries: 3
    });

If the step fails, Orchestrix attempts to run it again according to the configured policy.
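To make the behavior concrete, here is a rough sketch of what a retry loop of this kind might look like. It is a hypothetical stand-in, not Orchestrix's implementation, and it assumes that `retries` counts re-runs after the first attempt (so `retries: 3` allows up to four attempts in total). The `runWithRetries` name is invented for illustration.

```javascript
// Hypothetical stand-in for a retry loop; not Orchestrix's implementation.
// Assumption: `retries` counts re-runs after the first attempt.
async function runWithRetries(fn, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again if attempts remain
    }
  }
  throw lastError; // every attempt failed
}

// Usage: a function that fails twice, then succeeds on the third call.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error("temporary failure");
  return "ok";
};

runWithRetries(flaky, 3).then((result) => {
  console.log(result, "after", calls, "calls"); // ok after 3 calls
});
```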
Retry configuration
    flow.step("call-api", async () => {
      // ...
    }, {
      retries: 5,
      retryDelayMs: 1000,
      backoffFactor: "exponential",
      jitter: true,
      maxRetryDelayMs: 10000
    });

Backoff strategies
Orchestrix supports three delay strategies:
Fixed
Uses the same delay for each retry.
    {
      retries: 3,
      retryDelayMs: 1000,
      backoffFactor: "fixed"
    }

Linear
Increases delay linearly with each retry.
    {
      retries: 3,
      retryDelayMs: 1000,
      backoffFactor: "linear"
    }

Exponential
Grows the delay exponentially with each retry, which backs off faster and reduces pressure on struggling systems.
    {
      retries: 5,
      retryDelayMs: 1000,
      backoffFactor: "exponential"
    }

Jitter
Jitter adds randomness to retry delays so many clients do not retry at the exact same time.
This is especially useful in production environments where multiple workers or servers may fail together and recover together.
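The sketch below shows one way the three strategies and jitter could translate into concrete delays. The formulas are assumptions chosen for illustration (linear as base times attempt number, exponential as doubling, and "full jitter" that picks a random value between zero and the computed delay). Orchestrix's actual formulas may differ, and `computeDelay` is a hypothetical name.

```javascript
// Hypothetical delay calculator; the formulas are assumptions, not documented Orchestrix behavior.
// `attempt` is 1 for the first retry, 2 for the second, and so on.
function computeDelay(attempt, options, random = Math.random) {
  const {
    retryDelayMs,
    backoffFactor = "fixed",
    jitter = false,
    maxRetryDelayMs = Infinity,
  } = options;

  let delay;
  if (backoffFactor === "linear") {
    delay = retryDelayMs * attempt;            // 1000, 2000, 3000, ...
  } else if (backoffFactor === "exponential") {
    delay = retryDelayMs * 2 ** (attempt - 1); // 1000, 2000, 4000, ...
  } else {
    delay = retryDelayMs;                      // fixed: 1000, 1000, 1000, ...
  }

  delay = Math.min(delay, maxRetryDelayMs);    // apply the cap before jitter

  // "Full jitter": pick a random delay between 0 and the computed value.
  return jitter ? Math.floor(random() * delay) : delay;
}

const options = { retryDelayMs: 1000, backoffFactor: "exponential", maxRetryDelayMs: 10000 };
const schedule = [1, 2, 3, 4, 5].map((attempt) => computeDelay(attempt, options));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

Under these assumptions the fifth retry would wait 16000 ms, but the cap limits it to 10000 ms. The injectable `random` parameter exists only so the jittered path can be tested deterministically.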
Timeouts and retries
Retries become much more useful when paired with timeouts.
Without a timeout, an attempt can hang for a long time without failing, so no retry is triggered. With a timeout, each attempt has a clear upper bound.
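As an illustration of bounding a single attempt, the sketch below races an operation against a timer using standard JavaScript promises. The `withTimeout` helper is hypothetical and only demonstrates the idea; it is not how Orchestrix implements its own timeout handling.

```javascript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
// It bounds how long the caller waits; it does not cancel the underlying work.
function withTimeout(work, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage: an operation that would take 200 ms is cut off after 50 ms.
const slow = new Promise((resolve) => setTimeout(() => resolve("done"), 200));
withTimeout(slow, 50).catch((err) => console.log(err.message)); // timed out after 50 ms
```

Once an attempt is guaranteed to fail within a known time, the retry policy has a failure to react to instead of waiting on a hung call.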
What retries do not change
Retries do not make permanent errors disappear.
If the underlying problem is not transient, a retry policy only adds latency before failure. This is why retry decisions should be based on the kind of failure the step is expected to see.
Best practices
- Retry transient infrastructure failures, not business failures.
- Use exponential backoff for remote dependencies.
- Add jitter in distributed or high-throughput systems.
- Pair retries with `timeoutMs` on network-heavy steps.
- Keep the total retry window aligned with your product and operational expectations.
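To check the last point, it can help to estimate the worst-case time a step may take before it finally fails. The sketch below is a rough calculation under stated assumptions: every attempt runs until its timeout, delays double on each retry and are capped at `maxRetryDelayMs`, and jitter is ignored. The `worstCaseWindowMs` function is hypothetical.

```javascript
// Hypothetical estimate of the longest time a step can take before it finally fails.
// Assumptions: each attempt runs to its timeout, delays double per retry,
// delays are capped at maxRetryDelayMs, and jitter is ignored.
function worstCaseWindowMs({ retries, retryDelayMs, maxRetryDelayMs = Infinity, timeoutMs }) {
  let total = (retries + 1) * timeoutMs; // first attempt plus every retry
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += Math.min(retryDelayMs * 2 ** (attempt - 1), maxRetryDelayMs);
  }
  return total;
}

const windowMs = worstCaseWindowMs({
  retries: 5,
  retryDelayMs: 1000,
  maxRetryDelayMs: 10000,
  timeoutMs: 2000,
});
console.log(windowMs); // 37000
```

Here six attempts of up to 2000 ms each contribute 12000 ms, and the five delays (1000, 2000, 4000, 8000, 10000) contribute 25000 ms. If 37 seconds is longer than a caller is willing to wait, the policy needs fewer retries, shorter delays, or a tighter timeout.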