# Olympus Architecture
Olympus is structured around a small set of focused components. Each one has a single responsibility — routing, compression, provider dispatch, or UI — and communicates through well-defined interfaces. The diagrams below show how a query moves through the system from keystroke to response.
## System components
The names follow Greek mythology. Zeus is the master orchestrator; Hermes is the router; the shell is the TUI. Provider plugins sit outside the core and communicate through a common interface.
## 4-level provider waterfall
Every query enters at Ollama. If Ollama is running, it handles the request for free in under 200ms. If it is unavailable, the query falls through to cloud providers — subscription providers first, pay-per-token last. This mirrors the routing strategy described in RouteLLM (Berkeley/LMSYS, 2024).
The `local_threshold` config key controls how aggressive the local bias is. At 1.0 (the default) every query goes to Ollama when it is available; lower values let higher-complexity queries skip straight to cloud.
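The waterfall can be sketched as an ordered list of providers tried in sequence until one is both available and succeeds. Every identifier below is an illustrative assumption, not Olympus's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a minimal stand-in for Olympus's provider abstraction.
// The field names here are assumptions for illustration.
type Provider struct {
	Name      string
	Available func() bool
	Complete  func(prompt string) (string, error)
}

// dispatch walks the waterfall in priority order: local Ollama first,
// then subscription providers, then pay-per-token providers.
func dispatch(waterfall []Provider, prompt string) (string, string, error) {
	for _, p := range waterfall {
		if !p.Available() {
			continue // fall through to the next tier
		}
		out, err := p.Complete(prompt)
		if err != nil {
			continue // treat a runtime failure like unavailability
		}
		return p.Name, out, nil
	}
	return "", "", errors.New("no provider available")
}

func main() {
	waterfall := []Provider{
		{"ollama", func() bool { return false }, nil}, // local daemon not running
		{"claude-subscription", func() bool { return true },
			func(p string) (string, error) { return "answer from subscription tier", nil }},
		{"openai-api", func() bool { return true },
			func(p string) (string, error) { return "answer from pay-per-token tier", nil }},
	}
	name, out, _ := dispatch(waterfall, "hello")
	fmt.Println(name, "->", out) // claude-subscription -> answer from subscription tier
}
```

Because unavailability and runtime errors are handled identically, a crashed local daemon and a rate-limited cloud tier both fall through the same way.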
## Context compression before cloud calls
Without compression, every cloud API call re-sends the entire conversation history. By turn 20 you are paying for 20x the tokens you actually need. Olympus intercepts this with a local Ollama summarisation step that runs before any cloud call.
This technique is described in detail in LLMLingua-2 (Microsoft Research, 2024) and in Anthropic's own long context management guide.
| Scenario | Without compression | With compression | Reduction |
|---|---|---|---|
| 10-turn session | ~4,000 tokens/call | ~1,200 tokens/call | 70% |
| 20-turn session | ~8,000 tokens/call | ~1,200 tokens/call | 85% |
| 40-turn session | ~16,000 tokens/call | ~1,400 tokens/call | 91% |
Compression is automatic. Configure the trigger point:
```yaml
routing:
  compress_after_turns: 10  # summarise history every 10 turns (default)
```
## End-to-end query lifecycle
From keypress to rendered response, a query passes through five stages. The critical insight is that Ollama participates in two stages — as the primary answer provider and as the compression engine for cloud calls.
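This excerpt does not enumerate the five stages, so the decomposition below is an assumption reconstructed from the components named above; it is only meant to show Ollama's two roles, direct answering on the local path and history compression on the cloud path:

```go
package main

import "fmt"

// handleQuery sketches one plausible stage decomposition. The stage
// names and the routeLocal flag are assumptions, not Olympus's API.
func handleQuery(prompt string, routeLocal bool) []string {
	stages := []string{"capture keystrokes in the shell (TUI)"}
	if routeLocal {
		// Ollama role #1: primary answer provider.
		stages = append(stages,
			"Zeus dispatches to Ollama",
			"Ollama completes the query")
	} else {
		// Ollama role #2: compression engine for the cloud call.
		stages = append(stages,
			"Ollama summarises conversation history",
			"Hermes selects a cloud provider",
			"cloud provider completes the query")
	}
	return append(stages, "Glamour renders the markdown response")
}

func main() {
	for i, s := range handleQuery("hello", false) {
		fmt.Printf("%d. %s\n", i+1, s)
	}
}
```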
## Provider interface and plugin system
Every provider — built-in or plugin — implements the same Go interface. This means adding a new provider requires zero changes to the core routing logic.
`OpenAICompatProvider` covers Groq, Mistral, Azure OpenAI, Together AI, and any other OpenAI-compatible API.
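The concrete interface is not shown in this excerpt; a minimal sketch of the shape it implies, with one OpenAI-compatible implementation covering several services, might look like this (every identifier except `OpenAICompatProvider` is an assumption):

```go
package main

import "fmt"

// Provider is a guess at the common interface every built-in and
// plugin provider implements; the real method set may differ.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// OpenAICompatProvider serves any OpenAI-compatible API: the same code
// covers Groq, Mistral, Azure OpenAI, and Together AI, varying only in
// base URL, key, and model.
type OpenAICompatProvider struct {
	ProviderName string
	BaseURL      string
	APIKey       string
	Model        string
}

func (p *OpenAICompatProvider) Name() string { return p.ProviderName }

// Complete is stubbed here; a real implementation would POST to
// BaseURL + "/chat/completions" with the API key and model.
func (p *OpenAICompatProvider) Complete(prompt string) (string, error) {
	return fmt.Sprintf("[%s/%s] response", p.ProviderName, p.Model), nil
}

func main() {
	var p Provider = &OpenAICompatProvider{
		ProviderName: "groq",
		BaseURL:      "https://api.groq.com/openai/v1",
		Model:        "llama-3.3-70b-versatile",
	}
	out, _ := p.Complete("hello")
	fmt.Println(out) // [groq/llama-3.3-70b-versatile] response
}
```

Because the router only sees the interface, swapping Groq for Together AI is a configuration change, not a code change.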
Plugin providers can be added in two ways:
```sh
# CLI — writes to ~/.olympus/config.yaml automatically
olympus providers add groq \
  --key gsk_... \
  --model llama-3.3-70b-versatile \
  --base-url https://api.groq.com/openai/v1
```
```yaml
# ~/.olympus/config.yaml
plugins:
  groq:
    api_key: "gsk_..."
    model: "llama-3.3-70b-versatile"
    base_url: "https://api.groq.com/openai/v1"
```
For providers requiring custom auth or streaming, implement `providers.Provider` directly and register via `init()`. See the plugin guide in the README.
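A sketch of what `init()`-time registration could look like, assuming a package-level registry; the `Register` function and the registry map are illustrative, not Olympus's actual API:

```go
package main

import "fmt"

// registry maps provider names to constructors. A plugin package would
// normally live in its own file and register itself at import time.
var registry = map[string]func() string{}

// Register is an assumed hook a custom provider calls from init().
func Register(name string, constructor func() string) {
	registry[name] = constructor
}

// init runs automatically when the package is imported, so the core
// routing code never needs to know this plugin exists.
func init() {
	Register("custom-provider", func() string { return "custom response" })
}

func main() {
	for name := range registry {
		fmt.Println("registered:", name) // registered: custom-provider
	}
}
```

The `init()` mechanism is what keeps the promise of "zero changes to the core routing logic": importing the plugin package is the only integration step.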
## Shell rendering pipeline
The TUI is built with Bubble Tea (Elm architecture) and renders AI responses as full markdown using Glamour. Streaming and final rendering are handled differently to avoid partial-markdown artefacts.
The renderer is recreated on every `WindowSizeMsg` so word-wrap tracks the terminal width. A 6-column margin is subtracted to prevent Glamour from filling the full terminal width.
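A dependency-free sketch of that pattern: the `WindowSizeMsg` struct here mirrors the shape of Bubble Tea's `tea.WindowSizeMsg`, the `renderer` is a stub standing in for a Glamour term renderer, and the 6-column margin matches the behaviour described above:

```go
package main

import "fmt"

// WindowSizeMsg mirrors the shape of Bubble Tea's tea.WindowSizeMsg.
type WindowSizeMsg struct{ Width, Height int }

// renderer stands in for a Glamour renderer fixed to one wrap width.
type renderer struct{ wrapWidth int }

// newRenderer mimics recreating the Glamour renderer for a given width.
func newRenderer(width int) *renderer { return &renderer{wrapWidth: width} }

type model struct{ r *renderer }

// Update handles the resize message the way the shell does: throw the
// old renderer away and build one sized to the new terminal width,
// minus the 6-column margin.
func (m model) Update(msg interface{}) model {
	if size, ok := msg.(WindowSizeMsg); ok {
		m.r = newRenderer(size.Width - 6)
	}
	return m
}

func main() {
	m := model{r: newRenderer(80 - 6)}
	m = m.Update(WindowSizeMsg{Width: 120, Height: 40})
	fmt.Println("wrap width:", m.r.wrapWidth) // wrap width: 114
}
```

Recreating rather than mutating the renderer keeps the Elm-style update loop pure: each resize produces a fresh renderer matched to the new width.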
## Mythology naming convention
| Name | Role | Status |
|---|---|---|
| Zeus | Master orchestrator — routes, compresses, tracks costs | Live |
| Hermes | Capability-aware provider router | Live |
| Argus | Observability — token tracking, cost metrics | Live |
| Athena | Structured multi-step reasoning engine | Planned |
| Hephaestus | Code generation with diff output and auto-apply | Planned |
| Mnemosyne | Persistent cross-session memory | Planned |
| Aegis | Security guardrails and prompt sanitisation | Planned |