Intent-compressed intelligence orchestration.

A maestro for any LLM. O(N²) → O(N) by math.

Pick any model as the Maestro (the Brain): Claude, GPT, Gemini, or a local Ollama model. It orchestrates Workers from any vendor and shares a cached system prompt across turns. 88% cheaper at turn 10, not by a trick, but by arithmetic on the published pricing pages.

We compress pages of conversation history into a single 80-character line. Your AI keeps the memory. You stop paying for the excess.

Standalone O(N²) vs Burnless O(N) cost curve

Calibrated from real Anthropic API runs. Reproduce: python bench/v2.py --simulate

Calibrated against claude-opus-4-7 — verifiable
 Turns   Standalone   Burnless   Savings
   2        $0.80       $0.14    82.7%
   5        $2.06       $0.29    86.1%
  10        $4.34       $0.54    87.6%
  20        $9.59       $1.07    88.9%
  50       $30.72       $2.83    90.8%
Standalone grows as O(N²); Burnless grows as O(N). Math, not heuristic.
Real case · Customer support agent · 50 messages exchanged
 Without Burnless   $2.45
 With Burnless      $0.28
 Real saving        88%
Same conversation. Same model. Different bill.
MIT licensed · Self-hosted · No backend required · pip install burnless

Beyond cost. The protocol layer.

A token is not an abstraction. It is compute. Compute is electricity. Electricity is water and infrastructure. At the scale LLM inference is heading, an estimated 1–5% of global electricity within a decade, O(N²) context growth is not a pricing quirk. It is a trajectory.

Burnless is the transparent, provider-agnostic layer that sits between any user and any LLM, converting O(N²) context growth into O(N) by design. Not a feature. Not an SDK wrapper. A protocol, doing for context what TCP/IP did for packets. The user sends a message. The model receives a capsule. Neither knows the layer exists.

60–90 TWh/year

Estimated annual energy saved if the protocol covers just 1% of global LLM inference. For scale: Denmark's entire electricity consumption is about 35 TWh/year. This is not a rounding error.

Structurally unblockable

You cannot prohibit sending a summary of a conversation. That is indistinguishable from normal behavior. The protocol is invisible by design.

Inevitable. Open.

This layer will exist. The question is who defines it and in whose interest. Burnless answers: MIT, documented, first.

Read the founding vision →

Why the curve is quadratic.

Every turn in a standalone agent loop replays the full conversation as input. Cost on turn N is proportional to N, so total cost across N turns is Θ(N²). That is arithmetic from the pricing page, not a property of any SDK.
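
The same arithmetic as a runnable sketch. Every number below is an illustrative assumption (per-turn transcript size, capsule size, price), not a measurement from the table above; the point is the shape of the two curves.

# Cost-curve sketch. Illustrative assumptions, not a benchmark.
TURN_TOKENS = 2_000      # tokens one full turn adds to the transcript (assumed)
CAPSULE_TOKENS = 20      # one ~80-char capsule per past turn (assumed)
PRICE_PER_MTOK = 15.0    # dollars per million input tokens (illustrative)

def standalone_cost(turns: int) -> float:
    # Turn n replays the whole transcript: n * TURN_TOKENS input tokens.
    tokens = sum(n * TURN_TOKENS for n in range(1, turns + 1))
    return tokens * PRICE_PER_MTOK / 1e6   # sum of 1..N is N(N+1)/2, hence Θ(N²)

def burnless_cost(turns: int) -> float:
    # Turn n sends one fresh turn plus n one-line capsules.
    tokens = sum(TURN_TOKENS + n * CAPSULE_TOKENS for n in range(1, turns + 1))
    return tokens * PRICE_PER_MTOK / 1e6   # near-linear while capsules stay ~100x smaller

for n in (2, 5, 10, 20, 50):
    print(f"{n:>3} turns   standalone ${standalone_cost(n):6.2f}   burnless ${burnless_cost(n):6.2f}")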

1. Capsules, not transcripts

Brain history holds ~80-char summaries of each turn, not the raw exchange. Full output stays on disk, read on demand.
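
A capsule can be pictured as the record below. The shape is hypothetical (the real on-disk format is whatever .burnless/ stores); it only illustrates the split between the one-line summary the Brain re-reads and the full output left on disk.

from dataclasses import dataclass
from pathlib import Path

@dataclass
class Capsule:
    turn_id: str       # e.g. "d001" (id scheme assumed)
    summary: str       # the ~80-char line that enters Brain history
    output_path: Path  # full output stays on disk, read on demand

    def history_line(self) -> str:
        # Truncate defensively so history stays one short line per turn.
        return f"{self.turn_id}: {self.summary[:80]}"

    def full_output(self) -> str:
        # Only loaded, and paid for, when a task actually needs it.
        return self.output_path.read_text()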

2. Shared prefix cache

System prompt is byte-identical every turn with cache_control. Read price ($0.15/MTok) instead of write price ($15/MTok). 100× spread.
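
Prompt caching is a documented Anthropic API pattern: mark the byte-identical system prompt with cache_control, and later turns bill that prefix at the cache-read rate. A minimal sketch; the model name and prompt text are placeholders:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are the Brain. Delegate tasks to Workers."  # byte-identical every turn

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: any cache-capable model
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SYSTEM_PROMPT,
        # First turn writes the cache; later turns read it at the cheaper rate.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "next capsule goes here"}],
)
print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens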

3. Tiers are roles, not models

Any model as Brain. Any model as Worker. GPT-4o, Opus, Sonnet, Codex, Ollama — one-line config change.
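
The exact keys below are hypothetical (burnless init defines the real schema), but the role-to-model mapping could look like this, with the advertised one-line swap in the comment:

# Hypothetical config shape; "bronze"/haiku matches the CLI example further down.
ROLES = {
    "brain": "anthropic/claude-sonnet",      # swap this one line: "openai/gpt-4o", "ollama/llama3", ...
    "workers": {
        "bronze": "anthropic/claude-haiku",  # cheap tier for routine delegations
        "gold": "anthropic/claude-opus",     # escalation tier for hard tasks (tier name assumed)
    },
}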

4. Three compression layers

Deterministic minifier (zero cost) → semantic encoder (Haiku, $0.001/turn) → optional LLMLingua-2 (CPU, no API).
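
A sketch of how the three layers could chain. The first two functions are stand-ins (the real minifier and the Haiku call live inside Burnless); the third uses LLMLingua-2's published Python API and runs on CPU with no API key.

import re

def minify(text: str) -> str:
    # Layer 1: deterministic minifier, zero cost. Collapse whitespace.
    return re.sub(r"\s+", " ", text).strip()

def semantic_capsule(text: str) -> str:
    # Layer 2 stand-in: in Burnless this is one cheap Haiku call per turn;
    # truncation keeps the sketch runnable without an API key.
    return text[:80]

def lingua_compress(text: str, rate: float = 0.33) -> str:
    # Layer 3 (optional): LLMLingua-2, local inference only.
    from llmlingua import PromptCompressor  # pip install llmlingua
    compressor = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
    )
    return compressor.compress_prompt(text, rate=rate)["compressed_prompt"]

capsule = semantic_capsule(minify("pages   of raw\n conversation   history"))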

5. Zero data retention

State lives in .burnless/ on your machine. No dashboard sees your prompts, your keys, or your conversations. Nothing leaves the box.

We don't ask you to trust the table. Reproduce it: python bench/run.py --turns 8 with your own API key.

How it looks.

A small CLI. A folder of compact state under .burnless/. No hosted backend.

# in any project
$ pip install burnless
$ burnless init
$ burnless brain
   # interactive Brain: Sonnet by default, escalates to Opus on hard tasks

$ burnless delegate "summarize the failing tests"
   → d001 routed to bronze/haiku  (matched: summarize)
$ burnless run d001
   OK:d001 — cache_read=23,000 tokens (warm)

All state lives under .burnless/. Brain history, exec_log, capsules — local files you can grep.

Editions.

Run it locally for free. Or join the waitlist for hosted features that need a backend.

Burnless

Free · MIT

  • Full CLI, runs locally
  • Brain + Worker + capsule history
  • Shared prefix cache, all 3 compression layers
  • Provider-agnostic — any LLM as Brain or Worker
  • Reproducible benchmark
View on GitHub

Burnless Cloud

Soon · waitlist

  • Shared cache across machines and teammates
  • Hosted dashboard for token telemetry
  • Centralized audit log of every delegation
  • SSO + team permissions
  • Priority support
Join Cloud waitlist

Don't trust the table. Run python bench/run.py --turns 8 with your own API key.

Burnless Cloud — waitlist.

Hosted features for teams: shared cache, dashboards, audit logs, SSO. The CLI is free and open source — Cloud is for orgs that want a backend.