Pruner runs silently in the background while you use Claude Code, automatically reducing what you spend on every API call — without changing how Claude behaves.
macOS (Apple Silicon & Intel) · Linux x64 · requires Claude Code CLI
No config files. No API keys to manage. No code changes.
One command. No Node.js or npm required. Self-contained binary under 20 MB.
All Claude flags work identically: --resume, -p, everything.
After each Claude response, Pruner prints exactly how much it saved — verified by Anthropic's own tokenizer.
Three optimization strategies applied in real-time, before each request reaches Anthropic.
Long conversations balloon with redundant history. Pruner trims old messages and oversized tool outputs while preserving the context Claude actually needs.
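The trimming described above can be sketched roughly as follows. This is a simplified illustration, not Pruner's actual code; the parameter names mirror the maxMessages and maxToolOutputChars config keys.

```python
# Simplified sketch of context pruning plus tool-output truncation.
# Illustrative only; parameter names mirror Pruner's config keys.

def prune_context(messages, max_messages=20, max_tool_output_chars=3000):
    # Keep only the most recent N messages.
    kept = messages[-max_messages:]
    pruned = []
    for msg in kept:
        content = msg["content"]
        # Cap oversized string tool results; leave ordinary text alone.
        if isinstance(content, list):
            content = [
                {**block, "content": block["content"][:max_tool_output_chars]}
                if block.get("type") == "tool_result"
                and isinstance(block.get("content"), str)
                and len(block["content"]) > max_tool_output_chars
                else block
                for block in content
            ]
        pruned.append({**msg, "content": content})
    return pruned
```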
Anthropic's prompt cache cuts repeated input costs by 90%. Pruner automatically injects cache_control on large system prompts so you get cache hits without any code changes.
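Mechanically, the injection amounts to tagging the system prompt with a cache_control block (the real Anthropic API field for prompt caching). A minimal sketch, assuming a character-count threshold; the cutoff below is an assumption, not Pruner's actual heuristic:

```python
# Sketch of cache_control injection on an outbound request body.
# Illustrative only; the min_chars threshold is an assumption.

def inject_cache_control(body, min_chars=4096):
    system = body.get("system")
    if not system:
        return body
    # Normalize a plain-string system prompt into block form.
    if isinstance(system, str):
        system = [{"type": "text", "text": system}]
    # Only large prompts are worth caching.
    if sum(len(b.get("text", "")) for b in system) >= min_chars:
        # A marker on the last block caches the entire prefix up to it.
        system[-1]["cache_control"] = {"type": "ephemeral"}
    body["system"] = system
    return body
```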
Savings figures use Anthropic's own count_tokens API and actual usage.input_tokens from each response — not estimates. What Pruner shows matches your bill.
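Given those two numbers, the per-request savings figure reduces to simple arithmetic. A sketch (this helper is hypothetical; the default rate matches the inputPerMillion config value):

```python
# Sketch: deriving a savings figure from the two token counts above.
# Hypothetical helper; the default rate mirrors the config's inputPerMillion.

def savings_usd(tokens_before, input_tokens_after, input_per_million=3.0):
    """tokens_before: count_tokens on the unoptimized request.
    input_tokens_after: usage.input_tokens reported by the API."""
    saved_tokens = max(tokens_before - input_tokens_after, 0)
    return saved_tokens * input_per_million / 1_000_000
```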
Pruner is a local-only proxy. Your prompts, API key, and codebase flow exactly one place: directly to api.anthropic.com — the same destination as without Pruner.
The proxy listens exclusively on 127.0.0.1. It is not accessible from your local network, your router, or the internet.
Zero telemetry. Zero analytics. No Pruner backend exists. Every outbound byte goes to api.anthropic.com:443 — nothing else.
Your Anthropic API key is forwarded in-memory, transparently — identical to how Claude CLI handles it. Pruner never writes it to disk or logs it.
Every line of code is on GitHub under the MIT license. Read it, audit it, or compile the binary yourself — the output is bit-for-bit identical.
Run pruner --debug to see a live log of every outbound connection, or verify independently with your OS's own network tools such as lsof or netstat.
Every Claude flag works. A few extras.
pruner
Start Claude with cost optimization
pruner --resume
Resume last session (all Claude flags pass through)
pruner --debug
Show every outbound connection — verify Pruner only talks to Anthropic
pruner config
Open ~/.pruner/config.json in your editor
pruner reset
Reset session statistics
No Node.js, no npm, no dependencies. Single binary.
Works on macOS and Linux. Detects your architecture automatically.
macOS only. Easier to update later with brew upgrade pruner.
Run pruner config to open ~/.pruner/config.json.
Changes take effect immediately — no restart required.
{
"proxyPort": 7777,
"optimizer": {
"enablePromptCache": true, // inject cache_control on large system prompts
"enableContextPruning": true, // trim old messages
"enableTruncation": true, // cap oversized tool outputs
"accurateTokenCounting": true, // use Anthropic count_tokens API
"maxMessages": 20, // keep last N messages
"maxToolOutputChars": 3000 // chars per tool result
},
"pricing": {
"inputPerMillion": 3.0,
"outputPerMillion": 15.0,
"cacheWritePerMillion": 3.75,
"cacheReadPerMillion": 0.3
}
}
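The pricing block feeds a straightforward cost formula. A sketch of the implied arithmetic, assuming the standard token-usage fields Anthropic returns on each response (the function itself is illustrative, not Pruner's code):

```python
# Sketch of the cost arithmetic implied by the pricing block above.
# Key names match the config; the function is illustrative only.

PRICING = {
    "inputPerMillion": 3.0,
    "outputPerMillion": 15.0,
    "cacheWritePerMillion": 3.75,
    "cacheReadPerMillion": 0.3,
}

def request_cost(usage, pricing=PRICING):
    # usage mirrors the fields in an Anthropic API response.
    return (
        usage.get("input_tokens", 0) * pricing["inputPerMillion"]
        + usage.get("output_tokens", 0) * pricing["outputPerMillion"]
        + usage.get("cache_creation_input_tokens", 0) * pricing["cacheWritePerMillion"]
        + usage.get("cache_read_input_tokens", 0) * pricing["cacheReadPerMillion"]
    ) / 1_000_000
```

Note how much cheaper a cache read is than a fresh input token (0.3 vs 3.0 per million), which is why the cache_control injection pays off.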
Pruner is a local-only proxy — it only listens on 127.0.0.1 and only connects to api.anthropic.com. Your API key is forwarded transparently and never stored.
You can verify this yourself by running pruner --debug, which prints every outbound connection, or by inspecting the process with standard OS tools such as lsof or netstat.
Claude's responses are never touched — Pruner only modifies what you send to Anthropic, not what comes back.
If Claude's context window feels different after aggressive pruning, you can tune maxMessages up in the config, or disable context pruning entirely while keeping prompt cache injection active.
Numbers marked ✓ verified come directly from Anthropic:
/v1/messages/count_tokens API, called in parallel (zero latency impact)
usage.input_tokens in every API response
cache_read_input_tokens in every API response
If the count_tokens call fails (network timeout, etc.), Pruner falls back to a tiktoken estimate and marks it ~estimated.
Practically zero. The proxy overhead is <1ms. The count_tokens API call runs in parallel with the main request — Claude's generation (typically 3–30 seconds) takes far longer than the token count call.
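The parallel-call-with-fallback pattern described above can be sketched like this. Both helpers are hypothetical stand-ins: count_tokens_api for the real API call, and rough_estimate (a common chars-per-token rule of thumb) for the tiktoken fallback.

```python
# Sketch of a token count that runs alongside the main request and
# degrades to an estimate on failure. Both callables are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def rough_estimate(text):
    # Crude fallback: roughly 4 characters per token.
    return len(text) // 4

def count_with_fallback(text, count_tokens_api, timeout=5):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submitted immediately, so it overlaps with the main request.
        future = pool.submit(count_tokens_api, text)
        try:
            return future.result(timeout=timeout), "verified"
        except Exception:
            return rough_estimate(text), "estimated"
```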
One command. Zero config. Real savings.
Find it useful? A ⭐ helps others discover Pruner.
Star on GitHub