Engineering Reference · hermes/hermes.go

Hermes — Provider Router

Hermes is the capability-aware model router inside Zeus. Every call to Zeus.Generate() or Zeus.Stream() delegates provider selection to Hermes. This page covers the routing decision tree, complexity scoring, task-type overrides, and fallback behaviour.

How Hermes picks a provider

Hermes.Route(capability, prompt) works through a prioritised decision tree. Every node is a real conditional in the source — this is not aspirational documentation.

flowchart TD
    A([Route called]) --> B{offline_mode?}
    B -->|yes| OL[return Ollama]
    B -->|no| C{local_first AND\ncomplexity ≤ local_threshold?}
    C -->|yes + Ollama available| OL
    C -->|no| D{capability ==\nSummarization or Embeddings?}
    D -->|yes + Ollama available| OL
    D -->|no| E{task_routes configured\nAND prompt matches task type?}
    E -->|yes| TR[try primary route\nthen fallback route]
    TR -->|found| R([return provider])
    TR -->|not found| F
    E -->|no| F[iterate cloud_priority order]
    F --> G{provider available\nAND has capability?}
    G -->|yes| R
    G -->|exhausted| H[any available provider\nwith capability]
    H -->|found| R
    H -->|none| NIL([return nil, error shown])
    style OL fill:#1a2e20,color:#A3BE8C,stroke:#A3BE8C
    style R fill:#1a2535,color:#88C0D0,stroke:#88C0D0
    style NIL fill:#3a2020,color:#BF616A,stroke:#BF616A
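
The decision tree can be read as straight-line Go. The following is a minimal sketch, not the code in hermes/hermes.go: the Provider and Config structs, the find helper, and the route signature are all assumptions made for illustration, complexity is passed in rather than computed, and the task_routes step is left out to keep the example short.

```go
package main

import "fmt"

// Capability is a hypothetical stand-in for the real capability type.
type Capability string

const (
	Chat          Capability = "chat"
	Summarization Capability = "summarization"
	Embeddings    Capability = "embeddings"
)

// Provider is a trimmed-down illustration, not providers.Provider.
type Provider struct {
	Name      string
	Available bool
	Caps      map[Capability]bool
}

// Config mirrors the routing keys named on this page; the struct is assumed.
type Config struct {
	OfflineMode    bool
	LocalFirst     bool
	LocalThreshold float64
	CloudPriority  []string
}

// find returns the provider with the given name, or nil.
func find(providers []*Provider, name string) *Provider {
	for _, p := range providers {
		if p.Name == name {
			return p
		}
	}
	return nil
}

// route walks the tree top to bottom (task_routes step omitted).
func route(cfg Config, providers []*Provider, c Capability, complexity float64) *Provider {
	ollama := find(providers, "ollama")
	if cfg.OfflineMode {
		return ollama // offline: no further checks
	}
	localOK := ollama != nil && ollama.Available
	if cfg.LocalFirst && complexity <= cfg.LocalThreshold && localOK {
		return ollama
	}
	if (c == Summarization || c == Embeddings) && localOK {
		return ollama
	}
	// Cloud providers in configured priority order.
	for _, name := range cfg.CloudPriority {
		if p := find(providers, name); p != nil && p.Available && p.Caps[c] {
			return p
		}
	}
	// Last resort: any available provider with the capability.
	for _, p := range providers {
		if p.Available && p.Caps[c] {
			return p
		}
	}
	return nil
}

func main() {
	providers := []*Provider{
		{Name: "ollama", Available: false, Caps: map[Capability]bool{Chat: true}},
		{Name: "claude", Available: true, Caps: map[Capability]bool{Chat: true}},
	}
	cfg := Config{LocalFirst: true, LocalThreshold: 1.0, CloudPriority: []string{"claude"}}
	if p := route(cfg, providers, Chat, 0.4); p != nil {
		fmt.Println("routed to", p.Name)
	}
}
```

With Ollama marked unavailable, the local-first branch is skipped and the first matching entry in the priority list is returned.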

How local_threshold gates Ollama

When local_first: true (the default), Hermes calls compression.ClassifyComplexity(prompt) before attempting Ollama. The scorer is a pure-Go heuristic: it makes no model call, so it adds no network round trip and only negligible latency.

Signals the scorer uses: token count estimate, presence of code fences, multi-step instruction keywords (explain, architect, design, compare, evaluate), and length relative to a complexity band. It returns a float in [0.0, 1.0].
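
To make the signals concrete, here is a rough sketch of a heuristic scorer of this kind. It is not compression.ClassifyComplexity: the function name, the weights (0.4 / 0.3 / 0.3), the four-characters-per-token estimate, and the 2000-token band are all invented for illustration, and the token-count and length signals are folded into one term.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Keywords taken from the list above.
var multiStepKeywords = []string{"explain", "architect", "design", "compare", "evaluate"}

// classifyComplexity returns a score in [0.0, 1.0]. All weights are
// hypothetical; only the choice of signals comes from the documentation.
func classifyComplexity(prompt string) float64 {
	lower := strings.ToLower(prompt)
	score := 0.0

	// Length signal: assume roughly four characters per token and
	// saturate at an assumed band of 2000 tokens.
	tokens := float64(len(prompt)) / 4.0
	score += 0.4 * math.Min(tokens/2000.0, 1.0)

	// Code fence signal: three consecutive backticks anywhere in the prompt.
	fence := strings.Repeat("`", 3)
	if strings.Contains(prompt, fence) {
		score += 0.3
	}

	// Keyword signal: each distinct keyword adds 0.15, capped at 0.3.
	hits := 0
	for _, kw := range multiStepKeywords {
		if strings.Contains(lower, kw) {
			hits++
		}
	}
	score += math.Min(float64(hits)*0.15, 0.3)

	// Clamp defensively so the contract holds if weights change.
	return math.Max(0.0, math.Min(score, 1.0))
}

func main() {
	for _, p := range []string{
		"hi",
		"Explain and compare these two designs",
	} {
		fmt.Printf("%.3f  %q\n", classifyComplexity(p), p)
	}
}
```

Because the scorer is plain string inspection, it can run on every request before any provider is contacted.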

local_threshold controls how aggressive the local bias is:

Value          Behaviour
1.0 (default)  Everything goes to Ollama when available; cloud is pure fallback
0.7            Simple queries go local; high-complexity queries skip to cloud
0.0            Disable local-first; every request goes to cloud priority order

# ~/.olympus/config.yaml
routing:
  local_first: true
  local_threshold: 1.0
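
The effect of the three threshold values can be checked with a small gate function. This is a sketch under the assumption that the gate is exactly the "local_first AND complexity ≤ local_threshold" test from the decision tree, combined with Ollama availability; goesLocal is a hypothetical name.

```go
package main

import "fmt"

// goesLocal reports whether a request would take the local-first branch.
// Hypothetical helper; it encodes only the comparison shown in the flowchart.
func goesLocal(localFirst, ollamaUp bool, threshold, complexity float64) bool {
	return localFirst && ollamaUp && complexity <= threshold
}

func main() {
	for _, threshold := range []float64{1.0, 0.7, 0.0} {
		for _, complexity := range []float64{0.2, 0.9} {
			fmt.Printf("threshold=%.1f complexity=%.1f local=%v\n",
				threshold, complexity, goesLocal(true, true, threshold, complexity))
		}
	}
}
```

Under this reading, a threshold of 0.0 only lets through a prompt that scores exactly 0.0, which in practice sends nearly everything to the cloud priority order.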

Task-type routing overrides

When task_routes is configured, Hermes classifies each prompt into a task type via keyword matching in ClassifyTaskType(), then looks up a named primary/fallback route.

Task type            Trigger keywords (sample)
code_generation      implement, refactor, fix bug, write a function, add endpoint
deep_reasoning       explain, analyze, architect, design a system, compare, evaluate
automation_pipeline  automate, pipeline, schedule, workflow, cron, bulk, batch process
batch_processing     batch processing, batch job, bulk run
large_reasoning      configured route key only; no keyword classifier yet

# ~/.olympus/config.yaml
routing:
  task_routes:
    deep_reasoning:
      primary: claude_pro
      fallback: copilot
    code_generation:
      primary: ollama
      fallback: claude_pro

claude_pro and claude_api are aliases — both resolve to the claude provider internally. Which variant is active depends on whether an OAuth token or an API key is configured.
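
Keyword classification and alias resolution might look roughly like the sketch below. It is not the real ClassifyTaskType(): the rule table reuses only the sample keywords listed above, the first-match-wins ordering is an assumption, and taskRule, Route, aliases, and resolve are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// taskRule pairs a task type with its trigger keywords (sample list only).
type taskRule struct {
	taskType string
	keywords []string
}

// Rules are checked in table order; the real precedence is not documented here.
var taskRules = []taskRule{
	{"code_generation", []string{"implement", "refactor", "fix bug", "write a function", "add endpoint"}},
	{"deep_reasoning", []string{"explain", "analyze", "architect", "design a system", "compare", "evaluate"}},
	{"automation_pipeline", []string{"automate", "pipeline", "schedule", "workflow", "cron", "bulk", "batch process"}},
	{"batch_processing", []string{"batch processing", "batch job", "bulk run"}},
}

// classifyTaskType returns the first matching task type, or "" if none match.
func classifyTaskType(prompt string) string {
	lower := strings.ToLower(prompt)
	for _, rule := range taskRules {
		for _, kw := range rule.keywords {
			if strings.Contains(lower, kw) {
				return rule.taskType
			}
		}
	}
	return ""
}

// Route mirrors one task_routes entry from config.yaml.
type Route struct {
	Primary  string
	Fallback string
}

// aliases maps config names to the internal provider name.
var aliases = map[string]string{"claude_pro": "claude", "claude_api": "claude"}

// resolve returns the internal provider name for a configured route name.
func resolve(name string) string {
	if canonical, ok := aliases[name]; ok {
		return canonical
	}
	return name
}

func main() {
	routes := map[string]Route{
		"deep_reasoning":  {Primary: "claude_pro", Fallback: "copilot"},
		"code_generation": {Primary: "ollama", Fallback: "claude_pro"},
	}
	task := classifyTaskType("Refactor the auth handler")
	if r, ok := routes[task]; ok {
		fmt.Println(task, "->", resolve(r.Primary), "then", resolve(r.Fallback))
	}
}
```

Note that the sample keyword lists overlap ("batch process" is a substring of "batch processing", and "bulk" of "bulk run"), so with plain substring matching the order in which task types are checked decides which one wins.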

Summarization, Embeddings, and Fallback

Summarization and Embeddings capabilities route to Ollama whenever it is available, regardless of local_first or complexity settings. This is a hard-coded preference: these operations should never incur cloud token cost.

Hermes.Fallback(current) returns the next provider after current in the cloud priority order. Zeus uses this when a provider call fails mid-stream, allowing automatic retry on the next tier without restarting the request.
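
A minimal sketch of "next provider after current" over an ordered list follows. The fallback function and its (string, bool) return shape are assumptions for illustration; the real Hermes.Fallback returns a Provider, and whether it also checks availability or capability is not stated here, so the sketch does neither.

```go
package main

import "fmt"

// fallback returns the entry that follows current in order.
// ok is false when current is last or not present in the list.
func fallback(order []string, current string) (next string, ok bool) {
	for i, name := range order {
		if name == current {
			if i+1 < len(order) {
				return order[i+1], true
			}
			return "", false
		}
	}
	return "", false
}

func main() {
	order := []string{"claude", "copilot"} // example cloud priority order
	if next, ok := fallback(order, "claude"); ok {
		fmt.Println("retry on", next)
	}
	if _, ok := fallback(order, "copilot"); !ok {
		fmt.Println("no further tier after copilot")
	}
}
```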

Offline mode (routing.offline_mode: true) is checked first: it bypasses the rest of the decision tree and returns Ollama directly.

What Hermes requires from providers

Every provider — built-in or plugin — must implement the providers.Provider interface. Hermes calls four methods on every candidate before routing:

classDiagram
    class Provider {
        <<interface>>
        +Name() string
        +IsAvailable() bool
        +Capabilities() list
        +MaxContext() int
        +CostPerToken() float64
        +Generate(ctx, req) Response
    }
    class Hermes {
        +Route(cap, prompt) Provider
        +Fallback(current) Provider
        +Providers() list
        +ProviderOrder() list
        +IsClaudeAPI(p) bool
    }
    Hermes --> Provider : selects

IsAvailable() is checked at call time, not cached. A provider that becomes unavailable mid-session (e.g. Ollama stops) is skipped on the next routing call, and selection moves on to the next tier.