# Olympus Architecture

Olympus is structured around a small set of focused components. Each one has a single responsibility — routing, compression, provider dispatch, or UI — and communicates through well-defined interfaces. The diagrams below show how a query moves through the system from keystroke to response.

## System components

The names follow Greek mythology. Zeus is the master orchestrator; Hermes is the router; the shell is the TUI. Provider plugins sit outside the core and communicate through a common interface.

```mermaid
flowchart TD
    subgraph TUI["Shell — BubbleTea TUI"]
        Input["Keyboard input"]
        VP["Viewport\nGlamour markdown renderer"]
        Cmds["Slash commands\n/fix /review /security /govern"]
    end
    subgraph Core["Core — Zeus"]
        Zeus["Zeus\nOrchestrator"]
        Hermes["Hermes\nProvider router"]
        Compress["Compression engine\nOllama summarisation"]
        History["Conversation history\nmessage slice"]
        Stats["Argus\nToken tracker / cost metrics"]
    end
    subgraph Providers["Provider layer"]
        Ollama["Ollama\nlocal · free"]
        ClaudePro["Claude Pro\nOAuth · subscription"]
        Copilot["GitHub Copilot\nsubscription"]
        ClaudeAPI["Claude API\nper-token"]
        Plugins["Plugin providers\nGroq · Mistral · Azure"]
    end
    Input --> Zeus
    Cmds --> Zeus
    Zeus --> History
    Zeus --> Compress
    Zeus --> Hermes
    Zeus --> Stats
    Hermes --> Ollama
    Hermes --> ClaudePro
    Hermes --> Copilot
    Hermes --> ClaudeAPI
    Hermes --> Plugins
    Ollama -->|response| Zeus
    ClaudePro -->|response| Zeus
    ClaudeAPI -->|response| Zeus
    Zeus -->|token stream| VP
    style TUI fill:#1e2430,color:#D8DEE9,stroke:#4C566A
    style Core fill:#1a2535,color:#D8DEE9,stroke:#5E81AC
    style Providers fill:#1a2e20,color:#D8DEE9,stroke:#A3BE8C
    style Ollama fill:#2E3440,color:#A3BE8C,stroke:#A3BE8C
    style ClaudePro fill:#2E3440,color:#88C0D0,stroke:#88C0D0
    style Copilot fill:#2E3440,color:#81A1C1,stroke:#81A1C1
    style ClaudeAPI fill:#3a2020,color:#BF616A,stroke:#BF616A
    style Plugins fill:#2E3440,color:#EBCB8B,stroke:#EBCB8B
```

## 4-level provider waterfall

With the default routing threshold, every query enters at Ollama. If Ollama is running, it handles the request for free in under 200 ms. If it is unavailable, the query falls through to the cloud providers, subscription tiers first and pay-per-token last. This mirrors the routing strategy described in RouteLLM (LMSYS / UC Berkeley, 2024).

```mermaid
flowchart TD
    Q([User query]) --> Score
    Score["Complexity scorer\nheuristic · pure Go · 0ms"]
    Score -->|score within local threshold| O
    Score -->|score exceeds threshold| CP
    O["1 · Ollama local\nllama3 / mistral / phi3"]
    O -->|available| OR([Response · free · less than 200ms])
    O -->|unavailable| CP
    CP["2 · Claude Pro OAuth\nsubscription · no per-token cost"]
    CP -->|available| CPR([Response · subscription])
    CP -->|unavailable| GH
    GH["3 · GitHub Copilot\nsubscription · no per-token cost"]
    GH -->|available| GHR([Response · subscription])
    GH -->|unavailable| CA
    CA["4 · Claude API\npay-per-token · last resort"]
    CA -->|available| CAR([Response · cost warning shown])
    CA -->|unavailable| ERR([Error + diagnosis])
    style Score fill:#252d3a,color:#EBCB8B,stroke:#EBCB8B
    style O fill:#2E3440,color:#A3BE8C,stroke:#A3BE8C
    style OR fill:#1a2e20,color:#A3BE8C,stroke:#A3BE8C
    style CP fill:#2E3440,color:#88C0D0,stroke:#88C0D0
    style CPR fill:#1a2535,color:#88C0D0,stroke:#88C0D0
    style GH fill:#2E3440,color:#81A1C1,stroke:#81A1C1
    style GHR fill:#1a2535,color:#81A1C1,stroke:#81A1C1
    style CA fill:#3a2020,color:#BF616A,stroke:#BF616A
    style CAR fill:#3a2020,color:#EBCB8B,stroke:#EBCB8B
    style ERR fill:#3a2020,color:#BF616A,stroke:#BF616A
    style Q fill:#252d3a,color:#D8DEE9,stroke:#4C566A
```

The `local_threshold` config key controls how aggressive the local bias is. At 1.0 (default) every query goes to Ollama when available. Lower values let higher-complexity queries skip straight to cloud.
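Under simplified, assumed types (the real Olympus structs and function names differ), the scorer-plus-waterfall decision sketches out roughly as:

```go
package main

import "fmt"

// provider is an illustrative stand-in for the real Olympus provider type.
type provider struct {
	name      string
	available bool
}

// route applies the two decisions described above: the complexity score is
// compared against local_threshold to decide whether the local tier is tried
// at all, then the waterfall falls through the chain in priority order to
// the first provider that reports itself available.
func route(score, localThreshold float64, chain []provider) (string, error) {
	start := 0
	if score > localThreshold {
		start = 1 // high-complexity query: skip the local tier
	}
	for _, p := range chain[start:] {
		if p.available {
			return p.name, nil
		}
	}
	return "", fmt.Errorf("no provider available")
}

func main() {
	chain := []provider{
		{"ollama", false},    // 1 · local, free — down in this example
		{"claude-pro", true}, // 2 · subscription
		{"copilot", true},    // 3 · subscription
		{"claude-api", true}, // 4 · pay-per-token, last resort
	}
	// Default threshold 1.0: Ollama is tried first, but it is unavailable
	// here, so the query falls through to Claude Pro.
	name, _ := route(0.4, 1.0, chain)
	fmt.Println(name) // claude-pro
}
```

With the threshold lowered, a high-scoring query would start at index 1 and never touch Ollama, which is exactly the "skip straight to cloud" behaviour described above.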

## Context compression before cloud calls

Without compression, every cloud API call re-sends the entire conversation history. By turn 20, each call carries all twenty turns of history even though only the most recent few are usually relevant. Olympus intercepts this with a local Ollama summarisation step that runs before any cloud call.

This technique is described in detail in LLMLingua-2 (Microsoft Research, 2024) and in Anthropic's own long context management guide.

```mermaid
flowchart LR
    subgraph Session["Active session — 20 turns"]
        T1["Turns 1 to 16\n~6400 tokens\nverbose back-and-forth"]
        T2["Turns 17 to 20\n~1600 tokens\nrecent context"]
    end
    subgraph Compress["Local compression — Ollama — free"]
        OL["Ollama summarises\nturns 1 to 16\n~1 second · zero cost"]
        SUM["Dense summary\n~800 tokens\ndecisions · code · errors"]
    end
    subgraph Payload["Cloud API payload"]
        PAY["Summary + turns 17 to 20\n~2400 tokens\n70% smaller"]
    end
    T1 -->|compress_after_turns: 10| OL
    OL --> SUM
    T2 --> PAY
    SUM --> PAY
    PAY -->|sent to Claude or Copilot| Cloud([Cloud response])
    style Session fill:#1e2430,color:#D8DEE9,stroke:#4C566A
    style Compress fill:#1a2e20,color:#D8DEE9,stroke:#A3BE8C
    style Payload fill:#1a2535,color:#D8DEE9,stroke:#5E81AC
    style OL fill:#2E3440,color:#A3BE8C,stroke:#A3BE8C
    style SUM fill:#2E3440,color:#88C0D0,stroke:#88C0D0
    style PAY fill:#2E3440,color:#81A1C1,stroke:#81A1C1
    style Cloud fill:#1a2535,color:#88C0D0,stroke:#88C0D0
```
| Scenario | Without compression | With compression | Reduction |
| --- | --- | --- | --- |
| 10-turn session | ~4,000 tokens/call | ~1,200 tokens/call | 70% |
| 20-turn session | ~8,000 tokens/call | ~1,200 tokens/call | 85% |
| 40-turn session | ~16,000 tokens/call | ~1,400 tokens/call | 91% |

Compression is automatic. Configure the trigger point:

```yaml
routing:
  compress_after_turns: 10   # summarise history every 10 turns (default)
```
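The trigger logic amounts to: once the history reaches `compress_after_turns` messages, replace everything except the most recent turns with a single summary message. A minimal sketch, with a stand-in for the Ollama summarisation call and an assumed keep-last-4 window:

```go
package main

import "fmt"

type message struct{ role, content string }

// compressIfNeeded applies the compression trigger: short histories pass
// through untouched; longer ones have all but the last `keep` turns folded
// into one summary message. summarise stands in for the local Ollama call
// (its real signature in Olympus may differ).
func compressIfNeeded(history []message, compressAfterTurns, keep int,
	summarise func([]message) string) []message {
	if len(history) < compressAfterTurns {
		return history
	}
	old, recent := history[:len(history)-keep], history[len(history)-keep:]
	summary := message{role: "system", content: summarise(old)}
	return append([]message{summary}, recent...)
}

func main() {
	var history []message
	for i := 1; i <= 20; i++ {
		history = append(history, message{"user", fmt.Sprintf("turn %d", i)})
	}
	out := compressIfNeeded(history, 10, 4, func(old []message) string {
		return fmt.Sprintf("summary of %d turns", len(old))
	})
	fmt.Println(len(out)) // 5: one summary message + the 4 most recent turns
}
```

Because the summariser runs locally, the only tokens billed by the cloud provider are the summary plus the recent window, which is where the reductions in the table above come from.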

## End-to-end query lifecycle

From keypress to rendered response, a query passes through five stages. The critical insight is that Ollama participates in two stages — as the primary answer provider and as the compression engine for cloud calls.

```mermaid
sequenceDiagram
    actor User
    participant Shell as Shell TUI
    participant Zeus as Zeus orchestrator
    participant Hermes as Hermes router
    participant Ollama as Ollama local
    participant Cloud as Cloud provider
    User->>Shell: types query
    Shell->>Zeus: send query
    Zeus->>Zeus: append to history
    alt Ollama available
        Zeus->>Hermes: route query with history
        Hermes->>Ollama: generate messages
        Ollama-->>Shell: stream tokens
        Shell-->>User: render markdown
    else Ollama unavailable
        Zeus->>Ollama: compress history turns 1 to N-4
        Ollama-->>Zeus: summary string
        Zeus->>Hermes: route query with compressed history
        Hermes->>Cloud: generate compressed messages
        Cloud-->>Shell: stream tokens
        Shell-->>User: render markdown with cost warning
    end
```

## Provider interface and plugin system

Every provider — built-in or plugin — implements the same Go interface. This means adding a new provider requires zero changes to the core routing logic.

```mermaid
classDiagram
    class Provider {
        <<interface>>
        +Name() string
        +IsAvailable() bool
        +MaxContext() int
        +CostPerToken() float64
        +Capabilities() Capability list
        +Generate(ctx, req) Response
    }
    class OllamaProvider {
        host string
        model string
    }
    class ClaudeProProvider {
        oauthToken string
    }
    class CopilotProvider {
        githubPAT string
    }
    class ClaudeAPIProvider {
        apiKey string
    }
    class OpenAICompatProvider {
        name string
        baseURL string
        apiKey string
        model string
    }
    Provider <|.. OllamaProvider
    Provider <|.. ClaudeProProvider
    Provider <|.. CopilotProvider
    Provider <|.. ClaudeAPIProvider
    Provider <|.. OpenAICompatProvider
```

`OpenAICompatProvider` covers Groq, Mistral, Azure OpenAI, Together AI, and any OpenAI-compatible API.

Plugin providers can be added two ways:

```bash
# CLI — writes to ~/.olympus/config.yaml automatically
olympus providers add groq \
  --key gsk_... \
  --model llama-3.3-70b-versatile \
  --base-url https://api.groq.com/openai/v1
```

```yaml
# ~/.olympus/config.yaml
plugins:
  groq:
    api_key: "gsk_..."
    model: "llama-3.3-70b-versatile"
    base_url: "https://api.groq.com/openai/v1"
```

For providers requiring custom auth or streaming, implement `providers.Provider` directly and register via `init()`. See the plugin guide in the README.
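The registration pattern referred to here is the standard Go init-time registry. The names below (`Register`, the factory signature) are hypothetical sketches, not the actual Olympus API:

```go
package main

import "fmt"

// registry maps provider names to factories. In Olympus this would live in
// the providers package; it is inlined here so the sketch is self-contained.
var registry = map[string]func() string{}

// Register is called by provider packages at init time.
func Register(name string, factory func() string) {
	registry[name] = factory
}

// A plugin provider package registers itself in init(), so a blank import
// of the package is enough to make the provider known to the router — no
// changes to the core are needed.
func init() {
	Register("groq", func() string { return "groq · OpenAI-compatible" })
}

func main() {
	fmt.Println(registry["groq"]())
}
```

In a real plugin, the factory would return a value satisfying the `Provider` interface rather than a string; the registration mechanics are the same.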

## Shell rendering pipeline

The TUI is built with Bubble Tea (Elm architecture) and renders AI responses as full markdown using Glamour. Streaming and final rendering are handled differently to avoid partial-markdown artefacts.

```mermaid
flowchart LR
    subgraph Streaming["While streaming"]
        S1["Raw text tokens\narriving via channel"]
        S2["Displayed as plain text\nwith blinking cursor"]
    end
    subgraph Complete["On stream complete"]
        C1["Full response string"]
        C2["Glamour renderer\ndark style · terminal width"]
        C3["Rendered markdown\nheaders · code blocks · tables"]
    end
    S1 --> S2
    S2 -->|stream done| C1
    C1 --> C2
    C2 --> C3
    style Streaming fill:#1e2430,color:#D8DEE9,stroke:#4C566A
    style Complete fill:#1a2535,color:#D8DEE9,stroke:#5E81AC
```

The renderer is recreated on every `WindowSizeMsg` so word-wrap tracks terminal width. A 6-column margin is subtracted to prevent Glamour from filling the full terminal width.
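The two-phase split can be modelled as a small state machine. This is a dependency-free sketch: the markdown renderer is passed in as a plain function standing in for the Glamour call, which is only safe to run on complete input:

```go
package main

import "fmt"

// viewport models the two rendering phases: raw text plus a cursor while
// tokens stream in, then a single full markdown render once the stream
// closes, so partially-received markdown is never parsed.
type viewport struct {
	buf       string
	streaming bool
}

func (v *viewport) OnToken(tok string) {
	v.streaming = true
	v.buf += tok
}

func (v *viewport) OnDone() { v.streaming = false }

// View returns what the terminal shows. renderMarkdown stands in for the
// Glamour renderer used by the real shell.
func (v *viewport) View(renderMarkdown func(string) string) string {
	if v.streaming {
		return v.buf + "▌" // plain text with cursor: no partial-markdown artefacts
	}
	return renderMarkdown(v.buf)
}

func main() {
	render := func(s string) string { return "[md] " + s }
	v := &viewport{}
	v.OnToken("# Hel")
	v.OnToken("lo")
	fmt.Println(v.View(render)) // streaming: raw text + cursor
	v.OnDone()
	fmt.Println(v.View(render)) // complete: rendered once
}
```

Deferring the markdown pass until `OnDone` is what avoids the half-parsed headers and broken code fences that naive per-token rendering produces.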

## Mythology naming convention

| Name | Role | Status |
| --- | --- | --- |
| Zeus | Master orchestrator — routes, compresses, tracks costs | Live |
| Hermes | Capability-aware provider router | Live |
| Argus | Observability — token tracking, cost metrics | Live |
| Athena | Structured multi-step reasoning engine | Planned |
| Hephaestus | Code generation with diff output and auto-apply | Planned |
| Mnemosyne | Persistent cross-session memory | Planned |
| Aegis | Security guardrails and prompt sanitisation | Planned |