Intent-compressed intelligence orchestration.
A maestro for any LLM. O(N²) → O(N) by math.
Pick any model as the Maestro: Claude, GPT, Gemini, or a local Ollama model. It orchestrates Workers from any vendor and shares a cached system prompt. Roughly 88% cheaper by turn 10, not by a trick but by arithmetic on the published pricing pages.
| Turns | Standalone | Burnless | Savings |
|------:|-----------:|---------:|--------:|
| 2     | $0.80      | $0.14    | 82.7%   |
| 5     | $2.06      | $0.29    | 86.1%   |
| 10    | $4.34      | $0.54    | 87.6%   |
| 20    | $9.59      | $1.07    | 88.9%   |
| 50    | $30.72     | $2.83    | 90.8%   |
pip install burnless

Why the curve is quadratic.
Every turn in a standalone agent loop replays the full conversation as input, so the input on turn n grows in proportion to n. Summing n from 1 to N gives N(N+1)/2 replayed turns, so total cost across N turns is Θ(N²); keep the per-turn input roughly constant instead and the total drops to Θ(N). That is arithmetic from the pricing page, not a property of any SDK.
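One way to see the arithmetic is to sum per-turn input directly. A minimal sketch: the price and turn size below are placeholder assumptions for illustration, not the parameters behind the table above.

```python
# Illustrative arithmetic only: price and turn size are assumptions,
# not the exact parameters behind the benchmark table.
PRICE_PER_MTOK = 3.00      # assumed input price, $ per million tokens
TOKENS_PER_TURN = 2_000    # assumed tokens added per turn

def standalone_cost(turns: int) -> float:
    """Replays the whole transcript every turn: input grows with n, total is Θ(N²)."""
    total_tokens = sum(n * TOKENS_PER_TURN for n in range(1, turns + 1))
    return total_tokens / 1e6 * PRICE_PER_MTOK

def capsule_cost(turns: int, capsule_tokens: int = 100) -> float:
    """Sends fixed-size capsules instead of the transcript: flat per-turn input, total is Θ(N)."""
    total_tokens = turns * (TOKENS_PER_TURN + capsule_tokens)
    return total_tokens / 1e6 * PRICE_PER_MTOK

for n in (2, 5, 10, 20, 50):
    print(f"N={n:>2}  standalone=${standalone_cost(n):6.2f}  capsules=${capsule_cost(n):5.2f}")
```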
1. Capsules, not transcripts
Brain history holds ~80-char summaries of each turn, not the raw exchange. Full output stays on disk, read on demand.
2. Shared prefix cache
The system prompt is byte-identical every turn and marked with cache_control, so it bills at the read price ($0.15/MTok) instead of the write price ($15/MTok): a 100× spread. Capsules and the cached prefix are sketched in code after this list.
3. Tiers are roles, not models
Any model as Brain. Any model as Worker. GPT-4o, Opus, Sonnet, Codex, Ollama — one-line config change.
4. Three compression layers
Deterministic minifier (zero cost) → semantic encoder (Haiku, $0.001/turn) → optional LLMLingua-2 (CPU, no API). The pipeline is sketched after this list.
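Points 1 and 2 are easiest to see in code. A minimal sketch using the Anthropic Python SDK; the prompt text, capsule format, and model name are illustrative assumptions, not Burnless internals.

```python
# Sketch of capsule history plus a cached system prefix.
# The system prompt and capsule format are assumptions for illustration.
import anthropic

SYSTEM_PROMPT = "You are the Brain. Route tasks to Workers."  # byte-identical every turn

client = anthropic.Anthropic()
capsules: list[str] = []  # ~80-char summaries of prior turns, not raw transcripts

def brain_turn(task: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # example model name
        max_tokens=512,
        # Identical bytes + cache_control => billed at the cache-read rate
        # on every turn after the first.
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "\n".join(capsules + [task])}],
    )
    answer = response.content[0].text
    capsules.append(f"turn {len(capsules) + 1}: {answer[:80]}")  # keep only a capsule
    return answer
```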
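The three layers in point 4 compose as a plain pipeline. A sketch under the assumption that the minifier is a whitespace collapse, the semantic encoder is a one-shot Haiku call, and the optional third stage uses the open-source llmlingua package; the rule, prompt, and compression rate are illustrative, not the actual Burnless implementation.

```python
# Sketch of the three compression layers. Details are assumptions for illustration.
import re
import anthropic

client = anthropic.Anthropic()

def minify(text: str) -> str:
    """Layer 1: deterministic minifier -- collapse whitespace, zero API cost."""
    return re.sub(r"\s+", " ", text).strip()

def semantic_encode(text: str, limit: int = 80) -> str:
    """Layer 2: semantic encoder -- a cheap Haiku call that writes the capsule."""
    response = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=60,
        messages=[{"role": "user",
                   "content": f"Summarize in under {limit} characters:\n{text}"}],
    )
    return response.content[0].text[:limit]

def lingua_compress(text: str, rate: float = 0.33) -> str:
    """Layer 3 (optional): LLMLingua-2 running locally on CPU, no API call."""
    from llmlingua import PromptCompressor
    compressor = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
        device_map="cpu",
    )
    return compressor.compress_prompt(text, rate=rate)["compressed_prompt"]
```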
We don't ask you to trust the table. Reproduce it: python bench/run.py --turns 8 with your own API key.
How it looks.
A small CLI. A folder of compact state under .burnless/. No hosted backend.
```
# in any project
$ pip install burnless
$ burnless init
$ burnless brain          # interactive Brain — Sonnet by default, escalate to Opus on hard tasks
$ burnless delegate "summarize the failing tests"
→ d001 routed to bronze/haiku (matched: summarize)
$ burnless run d001
OK:d001 — cache_read=23,000 tokens (warm)
```
All state lives under .burnless/. Brain history, exec_log, capsules — local files you can grep.
Editions.
Run it locally for free. Or join the waitlist for hosted features that need a backend.
Burnless
Free · MIT
- Full CLI, runs locally
- Brain + Worker + capsule history
- Shared prefix cache, all 3 compression layers
- Provider-agnostic — any LLM as Brain or Worker
- Reproducible benchmark
Burnless Cloud
Soon · waitlist
- Shared cache across machines and teammates
- Hosted dashboard for token telemetry
- Centralized audit log of every delegation
- SSO + team permissions
- Priority support
Don't trust the table. Reproduce it: python bench/run.py --turns 8 with your own API key.
Burnless Cloud — waitlist.
Hosted features for teams: shared cache, dashboards, audit logs, SSO. The CLI is free and open source — Cloud is for orgs that want a backend.