Pruner runs silently in the background while you use Claude Code, automatically reducing what you spend on every API call — without changing how Claude behaves.
macOS (Apple Silicon & Intel) · Linux x64 · requires Claude Code CLI
No config files. No API keys to manage. No code changes.
One command. No Node.js or npm required. Self-contained binary under 20 MB.
All Claude flags work identically: --resume, -p, everything.
After each Claude response, Pruner prints exactly how much it saved — verified by Anthropic's own tokenizer.
Three optimization strategies applied in real-time, before each request reaches Anthropic.
Long conversations balloon with redundant history. Pruner trims old messages and oversized tool outputs while preserving the context Claude actually needs.
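The trimming described above can be sketched roughly as follows. This is a simplified illustration, not Pruner's actual code; the parameter names mirror the maxMessages and maxToolOutputChars config keys.

```python
# Simplified sketch of context pruning plus tool-output truncation.
# Illustrative only; parameter names mirror Pruner's config keys.

def prune_context(messages, max_messages=20, max_tool_output_chars=3000):
    # Keep only the most recent N messages.
    kept = messages[-max_messages:]
    pruned = []
    for msg in kept:
        content = msg["content"]
        # Cap oversized string tool results; leave ordinary text alone.
        if isinstance(content, list):
            content = [
                {**block, "content": block["content"][:max_tool_output_chars]}
                if block.get("type") == "tool_result"
                and isinstance(block.get("content"), str)
                and len(block["content"]) > max_tool_output_chars
                else block
                for block in content
            ]
        pruned.append({**msg, "content": content})
    return pruned
```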
Anthropic's prompt cache cuts repeated input costs by 90%. Pruner automatically injects cache_control on large system prompts so you get cache hits without any code changes.
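Mechanically, the injection amounts to tagging the system prompt with a cache_control block (the real Anthropic API field for prompt caching). A minimal sketch, assuming a character-count threshold; the cutoff below is an assumption, not Pruner's actual heuristic:

```python
# Sketch of cache_control injection on an outbound request body.
# Illustrative only; the min_chars threshold is an assumption.

def inject_cache_control(body, min_chars=4096):
    system = body.get("system")
    if not system:
        return body
    # Normalize a plain-string system prompt into block form.
    if isinstance(system, str):
        system = [{"type": "text", "text": system}]
    # Only large prompts are worth caching.
    if sum(len(b.get("text", "")) for b in system) >= min_chars:
        # A marker on the last block caches the entire prefix up to it.
        system[-1]["cache_control"] = {"type": "ephemeral"}
    body["system"] = system
    return body
```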
Savings figures use Anthropic's own count_tokens API and actual usage.input_tokens from each response — not estimates. What Pruner shows matches your bill.
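Given those two numbers, the per-request savings figure reduces to simple arithmetic. A sketch (this helper is hypothetical; the default rate matches the inputPerMillion config value):

```python
# Sketch: deriving a savings figure from the two token counts above.
# Hypothetical helper; the default rate mirrors the config's inputPerMillion.

def savings_usd(tokens_before, input_tokens_after, input_per_million=3.0):
    """tokens_before: count_tokens on the unoptimized request.
    input_tokens_after: usage.input_tokens reported by the API."""
    saved_tokens = max(tokens_before - input_tokens_after, 0)
    return saved_tokens * input_per_million / 1_000_000
```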
Pruner is a local-only proxy. Your prompts, API key, and codebase flow exactly one place: directly to api.anthropic.com — the same destination as without Pruner.
The proxy listens exclusively on 127.0.0.1. It is not accessible from your local network, your router, or the internet.
Zero telemetry. Zero analytics. No Pruner backend exists. Every outbound byte goes to api.anthropic.com:443 — nothing else.
Your Anthropic API key is forwarded in-memory, transparently — identical to how Claude CLI handles it. Pruner never writes it to disk or logs it.
Every line of code is on GitHub under the MIT license. Read it, audit it, or compile the binary yourself — the output is bit-for-bit identical.
Run pruner --debug to see a live log of every outbound connection, or verify independently with your OS's own network tools such as lsof or netstat.
Every Claude flag works. A few extras.
pruner
Start Claude with cost optimization
pruner --resume
Resume last session (all Claude flags pass through)
pruner --debug
Show every outbound connection — verify Pruner only talks to Anthropic
pruner config
Open ~/.pruner/config.json in your editor
pruner reset
Reset session statistics
No Node.js, no npm, no dependencies. Single binary.
Works on macOS and Linux. Detects your architecture automatically.
macOS only. Easier to update later with brew upgrade pruner.
Run pruner config to open ~/.pruner/config.json.
Changes take effect immediately — no restart required.
{
"proxyPort": 7777,
"optimizer": {
"enablePromptCache": true, // inject cache_control on large system prompts
"enableContextPruning": true, // trim old messages
"enableTruncation": true, // cap oversized tool outputs
"accurateTokenCounting": true, // use Anthropic count_tokens API
"maxMessages": 20, // keep last N messages
"maxToolOutputChars": 3000 // chars per tool result
},
"pricing": {
"inputPerMillion": 3.0,
"outputPerMillion": 15.0,
"cacheWritePerMillion": 3.75,
"cacheReadPerMillion": 0.3
}
}
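The pricing block feeds a straightforward cost formula. A sketch of the implied arithmetic, assuming the standard token-usage fields Anthropic returns on each response (the function itself is illustrative, not Pruner's code):

```python
# Sketch of the cost arithmetic implied by the pricing block above.
# Key names match the config; the function is illustrative only.

PRICING = {
    "inputPerMillion": 3.0,
    "outputPerMillion": 15.0,
    "cacheWritePerMillion": 3.75,
    "cacheReadPerMillion": 0.3,
}

def request_cost(usage, pricing=PRICING):
    # usage mirrors the fields in an Anthropic API response.
    return (
        usage.get("input_tokens", 0) * pricing["inputPerMillion"]
        + usage.get("output_tokens", 0) * pricing["outputPerMillion"]
        + usage.get("cache_creation_input_tokens", 0) * pricing["cacheWritePerMillion"]
        + usage.get("cache_read_input_tokens", 0) * pricing["cacheReadPerMillion"]
    ) / 1_000_000
```

Note how much cheaper a cache read is than a fresh input token (0.3 vs 3.0 per million), which is why the cache_control injection pays off.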
Pruner is a local-only proxy — it only listens on 127.0.0.1 and only connects to api.anthropic.com. Your API key is forwarded transparently and never stored.
You can verify this yourself by running pruner --debug, which prints every outbound connection, or by inspecting the process with standard OS tools such as lsof or netstat.
Claude's responses are never touched — Pruner only modifies what you send to Anthropic, not what comes back.
If Claude's context window feels different after aggressive pruning, you can tune maxMessages up in the config, or disable context pruning entirely while keeping prompt cache injection active.
Numbers marked ✓ verified come directly from Anthropic:
/v1/messages/count_tokens API, called in parallel (zero latency impact)
usage.input_tokens in every API response
cache_read_input_tokens in every API response
If the count_tokens call fails (network timeout, etc.), Pruner falls back to a tiktoken estimate and marks it ~estimated.
Practically zero. The proxy overhead is <1ms. The count_tokens API call runs in parallel with the main request — Claude's generation (typically 3–30 seconds) takes far longer than the token count call.
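The parallel-call-with-fallback pattern described above can be sketched like this. Both helpers are hypothetical stand-ins: count_tokens_api for the real API call, and rough_estimate (a common chars-per-token rule of thumb) for the tiktoken fallback.

```python
# Sketch of a token count that runs alongside the main request and
# degrades to an estimate on failure. Both callables are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def rough_estimate(text):
    # Crude fallback: roughly 4 characters per token.
    return len(text) // 4

def count_with_fallback(text, count_tokens_api, timeout=5):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submitted immediately, so it overlaps with the main request.
        future = pool.submit(count_tokens_api, text)
        try:
            return future.result(timeout=timeout), "verified"
        except Exception:
            return rough_estimate(text), "estimated"
```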
One command. Zero config. Real savings.
Find it useful? A ⭐ helps others discover Pruner.
Star on GitHub