
Advanced Metrics

Deep dive into monitoring Orchestrix workflows with metrics tools such as Prometheus.

Key Metrics to Track

For any production workflow, you should track at least these four Golden Signals:

  1. Latency: Total flow duration and individual step durations.
  2. Traffic: Number of flow executions per second/minute.
  3. Errors: Failure rates for flows and steps.
  4. Saturation: How close the process is to its limits. If you run many steps in parallel, monitor event loop lag or thread pool utilization.
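To make the first three signals concrete, here is a minimal sketch that computes them from a batch of finished runs in one time window. The `FlowRun` shape, `computeSignals`, and the nearest-rank percentile are illustrative assumptions, not Orchestrix APIs. Saturation is left out because it describes the runtime, not individual runs.

```typescript
// Hypothetical record of one finished flow run; Orchestrix's real result type may differ.
interface FlowRun {
  durationMs: number;
  status: 'completed' | 'failed';
}

interface GoldenSignals {
  p99LatencyMs: number;  // Latency
  runsPerSecond: number; // Traffic
  errorRate: number;     // Errors: fraction of runs that failed, 0..1
}

// Computes three of the four signals for all runs that finished in one window.
// Saturation is a property of the runtime (event loop, thread pool), not of run records.
function computeSignals(runs: FlowRun[], windowSeconds: number): GoldenSignals {
  if (runs.length === 0) {
    // No data: report NaN rather than a misleading 0, as PromQL does for 0/0.
    return { p99LatencyMs: NaN, runsPerSecond: 0, errorRate: NaN };
  }
  const durations = runs.map((r) => r.durationMs).sort((a, b) => a - b);
  // Nearest-rank P99: the smallest duration with at least 99% of runs at or below it.
  const rank = Math.ceil((durations.length * 99) / 100);
  const failed = runs.filter((r) => r.status === 'failed').length;
  return {
    p99LatencyMs: durations[rank - 1],
    runsPerSecond: runs.length / windowSeconds,
    errorRate: failed / runs.length,
  };
}
```

Prometheus derives the same numbers from counters and histograms. It uses a sliding window and bucket-based, approximate percentiles instead of sorting every duration.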

Implementation with Prometheus

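```ts
import { Counter, Histogram } from 'prom-client';
// `create` comes from Orchestrix; import it the same way as in your other flows.

const flowLatency = new Histogram({
  name: 'orchestrix_flow_duration_seconds',
  help: 'Duration of flows in seconds',
  labelNames: ['flow_name', 'status'],
  // prom-client's default buckets stop at 10 s; example wider buckets for longer flows.
  buckets: [0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300],
});

const stepRetries = new Counter({
  name: 'orchestrix_step_retries_total',
  help: 'Total number of step retries',
  labelNames: ['flow_name', 'step_name'],
});

const flow = create("my-flow", {
  hooks: {
    // Record every finished flow, labelled by its final status.
    onFlowComplete: ({ result }) => {
      flowLatency.observe({
        flow_name: result.name,
        status: result.status,
      }, result.durationMs / 1000); // Prometheus convention: seconds, not milliseconds
    },
    // Count one retry per call; `attempt` is not needed for this metric.
    onStepRetry: (event, attempt) => {
      stepRetries.inc({
        flow_name: event.flowName,
        step_name: event.stepName,
      });
    },
  },
});
```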
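The hooks above only record values in memory. Prometheus pulls metrics, so the process also has to expose them over HTTP. The following is a minimal sketch using prom-client's default `register` and Node's built-in `http` module. The function name `startMetricsServer` and the port are hypothetical choices. It also calls prom-client's `collectDefaultMetrics()`, whose event loop lag gauges give you a Saturation signal without extra code.

```typescript
import * as http from 'node:http';
import { collectDefaultMetrics, register } from 'prom-client';

// Registers prom-client's Node.js process metrics, including event loop lag
// (nodejs_eventloop_lag_seconds), a practical Saturation signal for parallel steps.
collectDefaultMetrics();

// Serves everything on prom-client's default registry, which also holds
// the flow and step metrics defined above, for Prometheus to scrape.
function startMetricsServer(port: number): http.Server {
  const server = http.createServer(async (req, res) => {
    if (req.url !== '/metrics') {
      res.statusCode = 404;
      res.end();
      return;
    }
    try {
      const body = await register.metrics();
      res.setHeader('Content-Type', register.contentType);
      res.end(body);
    } catch (err) {
      res.statusCode = 500;
      res.end(String(err));
    }
  });
  server.listen(port);
  return server;
}
```

For example, call `startMetricsServer(9464)` at startup and point a Prometheus scrape job at `http://<host>:9464/metrics`.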

Dashboard Ideas

Workflow Health

  • Success Rate %: 100 * sum(rate(orchestrix_flow_duration_seconds_count{status="completed"}[5m])) / sum(rate(orchestrix_flow_duration_seconds_count[5m]))
  • P99 Latency: histogram_quantile(0.99, sum by (le) (rate(orchestrix_flow_duration_seconds_bucket[5m])))
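The P99 panel is only as precise as the histogram's bucket boundaries. `histogram_quantile` sees only cumulative counts per bucket and interpolates linearly inside the bucket that holds the requested rank. The sketch below is a simplified TypeScript port of that interpolation. It assumes 0 < q <= 1 and a final +Inf bucket and skips PromQL's other edge cases. `Bucket` and `histogramQuantile` are illustrative names, not part of Orchestrix or prom-client.

```typescript
// One cumulative bucket of a Prometheus histogram.
interface Bucket {
  le: number;    // upper bound; the last bucket must be Infinity (+Inf)
  count: number; // cumulative count of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  const bs = [...buckets].sort((a, b) => a.le - b.le);
  if (bs.length < 2 || bs[bs.length - 1].le !== Infinity) return NaN;
  const total = bs[bs.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First finite bucket whose cumulative count reaches the rank.
  const b = bs.findIndex((bucket, i) => i < bs.length - 1 && bucket.count >= rank);
  // Rank lies in the +Inf bucket: the best estimate is the highest finite bound.
  if (b === -1) return bs[bs.length - 2].le;
  const start = b > 0 ? bs[b - 1].le : 0;
  const below = b > 0 ? bs[b - 1].count : 0;
  // Assume observations are spread evenly inside the bucket.
  return start + (bs[b].le - start) * ((rank - below) / (bs[b].count - below));
}
```

With prom-client's default buckets, which end at 10 s, any P99 above 10 s is reported as exactly 10 s. This is one reason to widen `buckets` for long-running flows.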

Step Reliability

  • Top 5 Retrying Steps: topk(5, sum by (flow_name, step_name) (rate(orchestrix_step_retries_total[5m]))) surfaces flaky dependencies.
  • Compensation Rate: How often do your flows fail and require undoing?
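To see what a "Top 5 Retrying Steps" panel ranks, here is a sketch that compares two scrapes of `orchestrix_step_retries_total`, turns each step's increase into a per-second rate, and keeps the highest ones. This loosely mirrors `topk` over `rate()`. The `"flow_name/step_name"` map keys and the function name are hypothetical. Unlike PromQL's `topk`, the sketch drops steps with no new retries.

```typescript
// Snapshot of orchestrix_step_retries_total, keyed by "flow_name/step_name".
type CounterSnapshot = Map<string, number>;

interface StepRetryRate {
  step: string;
  retriesPerSecond: number;
}

// Ranks steps by retry rate between two scrapes taken windowSeconds apart.
function topRetryingSteps(
  before: CounterSnapshot,
  after: CounterSnapshot,
  windowSeconds: number,
  n = 5,
): StepRetryRate[] {
  const rates: StepRetryRate[] = [];
  after.forEach((current, step) => {
    const previous = before.get(step) ?? 0;
    // A counter that went down was reset (e.g. process restart): count from zero.
    const delta = current >= previous ? current - previous : current;
    rates.push({ step, retriesPerSecond: delta / windowSeconds });
  });
  return rates
    .filter((r) => r.retriesPerSecond > 0) // steps with no new retries are dropped
    .sort((a, b) => b.retriesPerSecond - a.retriesPerSecond)
    .slice(0, n);
}
```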

Business Metrics

Don't forget that workflows often represent business processes. You can use the FlowContext in your hooks to extract business-level metrics:

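```ts
// `revenueCounter` is a prom-client Counter; this hook goes in the `hooks` object passed to create().
onFlowSuccess: (result) => {
  const amount = result.context.get<number>('totalAmount');
  // inc() with no value adds 1, and counters cannot decrease: skip missing or negative amounts.
  if (typeof amount === 'number' && amount >= 0) {
    revenueCounter.inc(amount);
  }
}
```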
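The hook above assumes a `revenueCounter` that the page never declares. The following is a minimal declaration with prom-client. The metric name `orchestrix_revenue_total` is a hypothetical choice. A counter fits revenue because it only grows, and prom-client throws if you pass a negative increment.

```typescript
import { Counter } from 'prom-client';

// Hypothetical metric name. Without labelNames, inc(amount) adds to a single series.
const revenueCounter = new Counter({
  name: 'orchestrix_revenue_total',
  help: 'Sum of the totalAmount context value across successful flows',
});
```

In a dashboard, `increase(orchestrix_revenue_total[1d])` then approximates revenue processed per day.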

Released under the MIT License.