Intent-compressed
intelligence orchestration.

A maestro for any LLM. O(N²) → O(N) by math.

Pick any model as the Maestro: Claude, GPT, Gemini, or a local Ollama model. It orchestrates Workers from any vendor behind a single cached system prompt. Roughly 88% cheaper by turn 10, not by trick but by arithmetic on the published pricing pages.

Calibrated against claude-opus-4-7. Verifiable:
 Turns   Standalone   Burnless   Savings
   2        $0.80       $0.14    82.7%
   5        $2.06       $0.29    86.1%
  10        $4.34       $0.54    87.6%
  20        $9.59       $1.07    88.9%
  50       $30.72       $2.83    90.8%
Standalone grows O(N²); Burnless grows O(N). Math, not heuristic.
MIT licensed · Self-hosted · No backend required · pip install burnless

Why the curve is quadratic.

Every turn in a standalone agent loop replays the full conversation as input, so the input cost of turn n is proportional to n and the total across N turns is Θ(N²). That is arithmetic from the pricing page, not a property of any SDK.
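The arithmetic is short enough to sketch. The prices and per-turn token count below are illustrative placeholders, not the published numbers the table uses; only the shape of the sums matters.

```python
PRICE_PER_MTOK = 15.0     # assumed input price, $/MTok (illustrative)
TOKENS_PER_TURN = 2_000   # assumed tokens added per turn (illustrative)

def standalone_cost(turns: int) -> float:
    """Turn n re-sends all n prior turns, so total tokens are
    TOKENS_PER_TURN * (1 + 2 + ... + N) -> Theta(N^2)."""
    total_tokens = sum(n * TOKENS_PER_TURN for n in range(1, turns + 1))
    return total_tokens * PRICE_PER_MTOK / 1_000_000

def capsule_cost(turns: int) -> float:
    """Each turn sends a roughly constant-size capsule history -> Theta(N)."""
    total_tokens = turns * TOKENS_PER_TURN
    return total_tokens * PRICE_PER_MTOK / 1_000_000

for n in (2, 5, 10, 20):
    a, c = standalone_cost(n), capsule_cost(n)
    print(f"turns={n:2d}  standalone=${a:.2f}  capsule=${c:.2f}  "
          f"savings={1 - c / a:.0%}")
```

Note how the savings percentage climbs with N, matching the shape (not the exact figures) of the table above: the gap between Θ(N²) and Θ(N) widens with every turn.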

1. Capsules, not transcripts

The Brain's history holds ~80-character summaries of each turn, not the raw exchange. Full output stays on disk and is read on demand.
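A hypothetical shape for one such history entry (field names are illustrative, not Burnless's actual schema): a bounded summary kept in-context, plus a path to the full output on disk.

```python
from dataclasses import dataclass

@dataclass
class Capsule:
    """Illustrative capsule record: what the Brain re-reads every turn."""
    turn: int
    summary: str       # capped at ~80 chars; this is all that stays in-context
    output_path: str   # full exchange on disk, loaded only on demand

def make_capsule(turn: int, summary: str, output_path: str) -> Capsule:
    # Truncate rather than re-send: context cost per turn stays constant.
    return Capsule(turn, summary[:80], output_path)

c = make_capsule(3, "triaged failing tests: 4/12 fail in auth module " + "-" * 60,
                 ".burnless/outputs/t003.txt")
assert len(c.summary) <= 80
```

The point of the cap is that the Brain's per-turn input stops growing with conversation length, which is what turns the Θ(N²) total into Θ(N).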

2. Shared prefix cache

The system prompt is byte-identical every turn and marked with cache_control, so it bills at the cache-read price ($0.15/MTok) instead of the write price ($15/MTok): a 100× spread.
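A minimal sketch of how a prefix is marked cacheable with the Anthropic Messages API (field names per Anthropic's public prompt-caching docs; the prompt text and model name here are placeholders). The key invariant is that the system block is byte-identical across requests, so only the first turn pays the write price.

```python
SYSTEM_PROMPT = "You are the Brain. Route work to Workers; reply in capsules."

def build_request(user_msg: str) -> dict:
    """Build a Messages API request body with a cacheable system prefix."""
    return {
        "model": "claude-3-5-haiku-latest",  # any cache-capable model
        "max_tokens": 1024,
        # cache_control marks the prefix: first request pays the write price,
        # later requests with identical bytes pay the much cheaper read price.
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": user_msg}],
    }

r1, r2 = build_request("turn 1"), build_request("turn 2")
assert r1["system"] == r2["system"]  # byte-identical prefix => cache hit
```

If anything upstream of the prefix changes, even by one byte, the cache misses and the prompt bills at the write price again, which is why Burnless keeps the system prompt frozen.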

3. Tiers are roles, not models

Any model as Brain. Any model as Worker. GPT-4o, Opus, Sonnet, Codex, Ollama — one-line config change.

4. Three compression layers

Deterministic minifier (zero cost) → semantic encoder (Haiku, $0.001/turn) → optional LLMLingua-2 (CPU, no API).
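The first layer is the only one cheap enough to show inline. This is a plausible sketch of a deterministic minifier, not Burnless's actual implementation: collapse whitespace runs and drop blank lines before anything is handed to the paid semantic encoder.

```python
import re

def minify(text: str) -> str:
    """Layer 1 sketch: deterministic, zero API cost, fully reversible losses.
    Collapses runs of spaces/tabs and strips blank lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)

raw = "def  f( x ):\n\n    return   x  +  1\n"
print(minify(raw))  # prints: "def f( x ):" then "return x + 1"
```

Because this layer is deterministic and runs locally, it costs nothing per turn; only what survives it reaches the Haiku-priced semantic encoder, and LLMLingua-2 stays optional on top.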

We don't ask you to trust the table. Reproduce it: python bench/run.py --turns 8 with your own API key.

How it looks.

A small CLI. A folder of compact state under .burnless/. No hosted backend.

# in any project
$ pip install burnless
$ burnless init
$ burnless brain
   # interactive Brain — Sonnet by default, escalates to Opus on hard tasks

$ burnless delegate "summarize the failing tests"
   → d001 routed to bronze/haiku  (matched: summarize)
$ burnless run d001
   OK:d001 — cache_read=23,000 tokens (warm)

All state lives under .burnless/. Brain history, exec_log, capsules — local files you can grep.

Editions.

Run it locally for free. Or join the waitlist for hosted features that need a backend.

Burnless

Free · MIT

  • Full CLI, runs locally
  • Brain + Worker + capsule history
  • Shared prefix cache, all 3 compression layers
  • Provider-agnostic — any LLM as Brain or Worker
  • Reproducible benchmark
View on GitHub

Burnless Cloud

Soon · waitlist

  • Shared cache across machines and teammates
  • Hosted dashboard for token telemetry
  • Centralized audit log of every delegation
  • SSO + team permissions
  • Priority support
Join Cloud waitlist

Don't trust the table. Run python bench/run.py --turns 8 with your own API key.

Burnless Cloud — waitlist.

Hosted features for teams: shared cache, dashboards, audit logs, SSO. The CLI is free and open source — Cloud is for orgs that want a backend.