Engineering Reference · orchestrator/statemachine · orchestrator/runner

StateMachine & DispatchGates

StateMachine tracks which phase the loop is in and enforces capacity-aware gate checks before every transition. DispatchGates validates dispatch coverage, TL delegation, and batch sizing at Phase 3. Both are fully implemented and tested. Neither is wired into the live loopdriver or Zeus.

Phase transitions and valid paths

StateMachine.Transition(target) does three things: validates the transition against the allowed-transitions map, runs a capacity gate check, and records the result in an audit log. It returns the gate action or an error.

stateDiagram-v2
    [*] --> P0: New()
    P0 --> P1: fresh start (recoveryTarget == 0)
    P0 --> Px: crash resume (recoveryTarget == x)
    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> P3: more issues
    P4 --> P5
    P5 --> P1: next cycle
    P5 --> P2: re-plan
    P5 --> P6: planned
    P6 --> P7: planned
    P7 --> P1: next cycle

Phase 0 has a recovery-target restriction: on a fresh start it can only go to Phase 1. After a crash resume, SetRecoveryTarget(n) must be called before Transition() — Phase 0 will then only allow transitioning to that specific target phase. Any other target returns InvalidTransition.
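The transition rules above can be sketched as a lookup table plus a special case for Phase 0. This is a minimal illustration, not the package's code: the `Machine` type, the `allowed` map, and `CanTransition` are hypothetical names, and the gate check and audit record that the real Transition() performs are left out.

```go
package main

import "fmt"

// Phase identifies a loop phase (0 through 7), matching the state diagram.
type Phase int

// allowed mirrors the diagram's edges. The P5->P6, P6->P7 and P7->P1 edges
// are the ones the diagram marks as planned. The real allowed-transitions
// map in orchestrator/statemachine may be structured differently.
var allowed = map[Phase][]Phase{
	1: {2},
	2: {3},
	3: {4},
	4: {3, 5},
	5: {1, 2, 6},
	6: {7},
	7: {1},
}

// Machine is a hypothetical stand-in for StateMachine that carries only
// the fields needed to show transition validation.
type Machine struct {
	current        Phase
	recoveryTarget Phase // 0 means fresh start
}

// SetRecoveryTarget records the phase to resume at after a crash.
func (m *Machine) SetRecoveryTarget(p Phase) { m.recoveryTarget = p }

// CanTransition reports whether target is reachable from the current phase.
func (m *Machine) CanTransition(target Phase) bool {
	if m.current == 0 {
		if m.recoveryTarget == 0 {
			return target == 1 // fresh start: only Phase 1 is allowed
		}
		return target == m.recoveryTarget // crash resume: only the recorded phase
	}
	for _, next := range allowed[m.current] {
		if next == target {
			return true
		}
	}
	return false
}

// Transition validates and applies the move. The capacity gate and the
// audit record are intentionally omitted from this sketch.
func (m *Machine) Transition(target Phase) error {
	if !m.CanTransition(target) {
		return fmt.Errorf("invalid transition: P%d -> P%d", m.current, target)
	}
	m.current = target
	return nil
}
```

With this sketch, a fresh machine rejects `Transition(2)` and accepts `Transition(1)`, while a machine that had `SetRecoveryTarget(4)` called on it accepts only `Transition(4)` from Phase 0.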

Green / Orange / Red — how gates fire

Before allowing a transition, Transition() calls capacity.ClassifyTier(signals) to determine the current tier. The tier is derived from accumulated signal counters:

| Signal | Meaning |
| --- | --- |
| ToolCalls | Total tool invocations this session |
| Turns | LLM turns consumed |
| IssuesCompleted | Issues resolved this cycle |
| ContextTokensUsed | Estimated tokens in active context |
| ContextTokensMax | Provider context window limit |
| ParallelCoders | Active parallel agent slots |
| ParallelTeamLeads | Active TL agent slots |
flowchart LR
    S["signals"] --> CT["ClassifyTier()"]
    CT -->|within limits| G["Green\nnormal operation"]
    CT -->|approaching limits| O["Orange\nthrottle"]
    CT -->|at limits| R["Red\nreject / checkpoint"]
    G --> GA["GateAction: Continue"]
    O --> OA["GateAction: Throttle"]
    R --> RA["GateAction: Checkpoint\nor EmergencyStop"]
    style G fill:#1a2e20,color:#A3BE8C,stroke:#A3BE8C
    style O fill:#2e2a10,color:#EBCB8B,stroke:#EBCB8B
    style R fill:#3a2020,color:#BF616A,stroke:#BF616A
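As a rough illustration of the tier-to-action mapping in the flowchart, the sketch below classifies on context-token usage alone. The 70% and 90% thresholds, the single-signal rule, and the `ActionFor` helper are all assumptions made for the example. The real capacity.ClassifyTier presumably weighs the other counters too, and the text does not say how Red chooses between Checkpoint and EmergencyStop.

```go
package main

// Tier and GateAction use the names shown in the flowchart.
type Tier string
type GateAction string

const (
	Green  Tier = "Green"
	Orange Tier = "Orange"
	Red    Tier = "Red"

	ActionContinue   GateAction = "Continue"
	ActionThrottle   GateAction = "Throttle"
	ActionCheckpoint GateAction = "Checkpoint"
)

// Signals carries the counters from the table above.
type Signals struct {
	ToolCalls         int
	Turns             int
	IssuesCompleted   int
	ContextTokensUsed int
	ContextTokensMax  int
	ParallelCoders    int
	ParallelTeamLeads int
}

// Hypothetical thresholds, expressed as a percentage of the context window.
const (
	orangePercent = 70
	redPercent    = 90
)

// ClassifyTier looks only at context usage in this sketch.
func ClassifyTier(s Signals) Tier {
	if s.ContextTokensMax <= 0 {
		return Red // assumption: an unknown limit is treated conservatively
	}
	used := s.ContextTokensUsed * 100 // integer math avoids float rounding
	switch {
	case used >= s.ContextTokensMax*redPercent:
		return Red
	case used >= s.ContextTokensMax*orangePercent:
		return Orange
	default:
		return Green
	}
}

// ActionFor maps a tier to a gate action. Red always yields Checkpoint
// here, since the rule for choosing EmergencyStop is not specified.
func ActionFor(t Tier) GateAction {
	switch t {
	case Green:
		return ActionContinue
	case Orange:
		return ActionThrottle
	default:
		return ActionCheckpoint
	}
}
```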

When the gate action is Checkpoint or EmergencyStop, Transition() returns a ShutdownRequired error. The caller is responsible for persisting state and halting. No caller handles this error today, so it would propagate unhandled.
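One way a caller could meet that responsibility is sketched below. It assumes the shutdown condition can be detected with `errors.Is` against a sentinel. `ErrShutdownRequired`, `Advance`, and the `(string, error)` shape of the transition function are hypothetical, and the real package may expose a typed error instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrShutdownRequired is a hypothetical sentinel standing in for the
// ShutdownRequired error described above.
var ErrShutdownRequired = errors.New("shutdown required")

// Advance attempts one transition. When the gate demands a shutdown it
// persists state through save and reports halt=true so the caller stops.
// The transition parameter has the assumed shape of StateMachine.Transition.
func Advance(
	transition func(target int) (string, error),
	target int,
	save func() error,
) (halt bool, err error) {
	action, err := transition(target)
	if errors.Is(err, ErrShutdownRequired) {
		if saveErr := save(); saveErr != nil {
			return true, fmt.Errorf("checkpoint after %s failed: %w", action, saveErr)
		}
		return true, nil // state saved; caller should halt cleanly
	}
	// Any other error (for example an invalid transition) is passed through.
	return false, err
}
```

A caller would invoke something like `Advance(sm.Transition, target, saveCheckpoint)` before each phase and exit its loop when `halt` is true.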

Audit log of every transition

Every call to Transition() appends a GateRecord to the machine's internal history. This provides a complete audit trail of all phase transitions and the capacity state at the time of each one.

type GateRecord struct {
    Phase             int    // target phase
    Tier              string // "Green", "Orange", "Red"
    Action            string // "Continue", "Throttle", "Checkpoint", "EmergencyStop"
    ToolCalls         int
    Turns             int
    IssuesCompleted   int
    ContextTokensUsed int
    ContextTokensMax  int
}

The audit log is in-memory only — it is not persisted to disk. Connecting it to the audit.Logger package (which does persist) is part of the integration work.
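Until that integration exists, the in-memory history could be dumped in a durable form such as JSON Lines. The sketch below is one possible shape, not the audit.Logger API (which is not shown here). `WriteHistory` and `EncodeHistory` are hypothetical helpers, and the JSON keys simply default to the struct field names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
)

// GateRecord repeats the struct shown above so the sketch is self-contained.
type GateRecord struct {
	Phase             int
	Tier              string
	Action            string
	ToolCalls         int
	Turns             int
	IssuesCompleted   int
	ContextTokensUsed int
	ContextTokensMax  int
}

// WriteHistory writes one JSON object per line to w, in transition order.
func WriteHistory(w io.Writer, history []GateRecord) error {
	enc := json.NewEncoder(w) // Encode appends a newline after each value
	for _, rec := range history {
		if err := enc.Encode(rec); err != nil {
			return err
		}
	}
	return nil
}

// EncodeHistory is a convenience wrapper that returns the JSON Lines text.
func EncodeHistory(history []GateRecord) (string, error) {
	var buf bytes.Buffer
	if err := WriteHistory(&buf, history); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Passing an `*os.File` opened in append mode as `w` would give a simple on-disk trail that survives a crash between checkpoints.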

Pre-dispatch validation

DispatchGates is a set of validators that should run at Phase 3 completion before the loop proceeds to Phase 4 (Collect). Three gates are implemented:

| Gate | Method | What it checks |
| --- | --- | --- |
| Coverage | ValidateDispatchCoverage(issues) | Every selected issue has at least one dispatch record. Returns an error listing uncovered issues. |
| TL Delegation | ValidateTLDelegation(phase) | In PM mode: every tech-lead dispatch record has delegated to the required subordinate personas per the topology policy. |
| Batch sizing | ValidateBatchDispatch(batchSize) | Batch dispatch records do not exceed the configured maximum batch size. |

DispatchGates depends on four collaborators: OrchestratorConfig, dispatch.Tracker, topologypolicy.Policy, and audit.Logger. None of these are wired into the live loopdriver. The gates exist as correct, tested validators waiting for a caller.
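To make the coverage and batch-sizing checks concrete, here is a simplified sketch. The real validators read from dispatch.Tracker and OrchestratorConfig. This version takes the records and the limit as plain arguments, and `DispatchRecord`, `CheckCoverage`, and `CheckBatchSize` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// DispatchRecord is a hypothetical, pared-down dispatch entry.
type DispatchRecord struct {
	IssueID string
	Persona string
}

// CheckCoverage returns an error naming every selected issue that has no
// dispatch record, or nil when all issues are covered.
func CheckCoverage(issues []string, records []DispatchRecord) error {
	covered := make(map[string]bool, len(records))
	for _, r := range records {
		covered[r.IssueID] = true
	}
	var missing []string
	for _, id := range issues {
		if !covered[id] {
			missing = append(missing, id)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	sort.Strings(missing) // stable message regardless of input order
	return fmt.Errorf("uncovered issues: %s", strings.Join(missing, ", "))
}

// CheckBatchSize rejects a batch larger than the configured maximum.
func CheckBatchSize(batchSize, maxBatch int) error {
	if batchSize > maxBatch {
		return fmt.Errorf("batch size %d exceeds maximum %d", batchSize, maxBatch)
	}
	return nil
}
```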

What wiring StateMachine and DispatchGates into Zeus would look like

When Zeus absorbs the loop (issue #62), StateMachine should gate every phase transition and DispatchGates should run before Phase 4 begins. The call sites are clear:

sequenceDiagram
    participant Zeus as Zeus
    participant SM as StateMachine
    participant DG as DispatchGates
    participant Loop as loop phases
    Zeus->>SM: Transition(PhasePreflight)
    SM-->>Zeus: action=Continue, tier=Green
    Zeus->>Loop: runPreflight()
    Zeus->>SM: Transition(PhasePlanning)
    SM-->>Zeus: action=Continue
    Zeus->>Loop: runPlanning()
    Zeus->>SM: Transition(PhaseDispatch)
    SM-->>Zeus: action=Continue
    Zeus->>Loop: runDispatch()
    Zeus->>DG: ValidateDispatchCoverage(issues)
    DG-->>Zeus: ok
    Zeus->>DG: ValidateTLDelegation(phase)
    DG-->>Zeus: ok
    Zeus->>SM: Transition(PhaseCollect)
    SM-->>Zeus: action=Checkpoint (Red tier)
    Zeus->>Zeus: save checkpoint, halt

The StateMachine's signal counters would be updated by Zeus after each phase: tool calls from loopdriver results, turns from Zeus's own history counter, context tokens from the active provider's reported usage.
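A sketch of that per-phase update follows. It assumes tool calls and completed issues arrive as per-phase deltas, while turns and token usage are reported as current absolute values. `PhaseResult`, `Usage`, and `Accumulate` are hypothetical names, and the actual loopdriver and provider types may differ.

```go
package main

// Signals holds the subset of counters that Zeus would refresh per phase.
type Signals struct {
	ToolCalls         int
	Turns             int
	IssuesCompleted   int
	ContextTokensUsed int
	ContextTokensMax  int
}

// PhaseResult is a hypothetical summary of one loopdriver phase run.
type PhaseResult struct {
	ToolCalls       int // tool invocations made during this phase
	IssuesCompleted int // issues resolved during this phase
}

// Usage is a hypothetical snapshot of the provider's reported context usage.
type Usage struct {
	TokensUsed int
	TokensMax  int
}

// Accumulate folds one phase's outcome into the running signals.
// Per-phase counts are added; turns and token usage overwrite the previous
// values because they are assumed to be absolute readings.
func (s *Signals) Accumulate(res PhaseResult, turns int, u Usage) {
	s.ToolCalls += res.ToolCalls
	s.IssuesCompleted += res.IssuesCompleted
	s.Turns = turns
	s.ContextTokensUsed = u.TokensUsed
	s.ContextTokensMax = u.TokensMax
}
```

Calling `Accumulate` right after each phase returns, and before the next Transition(), would let the gate check see the capacity state the phase left behind.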