Any AI client. Any model. Watches every token.
Stop burning tokens on output nobody reads.
token-saver monitors AI model outputs and fires warnings, errors, and alerts when usage is wasteful. Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client, any model. Logs, verbose history, and repetitive noise — suppressed automatically to keep your context lean.
{
  "alertLevel": "warning",
  "tokens": 1842,
  "outputType": "log",
  "shouldSuppress": true,
  "reason": "Output matches log/noise patterns",
  "detectedPatterns": [...]
}
How token-saver works
Every AI model output passes through token-saver's analyzer — regardless of which model or client produced it. It estimates token count, detects noise patterns (logs, stack traces, verbose history), and fires an alert if usage exceeds your thresholds. Suppressed outputs are never sent back into the context window.
Threshold Alerts
Set warning (1000), error (5000), and alert (10000) token thresholds. Any output that exceeds a threshold fires the appropriate level immediately.
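The decision is a straightforward comparison against the three thresholds. A minimal sketch — the function name and structure are illustrative, not the plugin's actual code:

```python
def check_level(tokens, warning=1000, error=5000, alert=10000):
    """Map a token count to an alert level using the default thresholds above."""
    if tokens > alert:
        return "alert"
    if tokens > error:
        return "error"
    if tokens > warning:
        return "warning"
    return "info"

print(check_level(342))    # info
print(check_level(1842))   # warning
print(check_level(12000))  # alert
```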
Log Suppression
Detects log-pattern outputs — INFO, DEBUG, TRACE, timestamps, stack traces, ANSI escape codes — and marks them for suppression automatically.
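In spirit, this is a bank of regexes run over the output. A hypothetical sketch — the pattern list below is assumed from the categories named above, not taken from the plugin's source:

```python
import re

# Illustrative pattern bank; the real plugin's list may differ.
LOG_PATTERNS = [
    r"\[INFO\]",
    r"\[DEBUG\]",
    r"\[TRACE\]",
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",  # ISO timestamps
    r"\x1b\[[0-9;]*m",                       # ANSI escape codes
    r"^\s+at .+\(.+:\d+\)",                  # stack-trace frames
]

def detect_patterns(text):
    """Return {pattern, matchCount} entries for every pattern that hits."""
    hits = []
    for pat in LOG_PATTERNS:
        count = len(re.findall(pat, text, flags=re.MULTILINE))
        if count:
            hits.append({"pattern": pat, "matchCount": count})
    return hits

sample = "[INFO] 2024-01-15T09:12:33Z Server starting\n[DEBUG] 2024-01-15T09:12:33Z Loading config"
print(detect_patterns(sample))
```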
Repetitive History Detection
Scans your conversation history for near-duplicate messages and large assistant outputs the user likely ignored. Reports estimated token savings from truncation.
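A near-duplicate scan can be sketched with Python's stdlib difflib. The similarity threshold and function name here are assumptions, not token-saver's implementation:

```python
from difflib import SequenceMatcher

def find_near_duplicates(messages, threshold=0.9):
    """Flag each message whose content is >= threshold similar to an earlier one."""
    dupes = []
    for i, msg in enumerate(messages):
        for j in range(i):
            ratio = SequenceMatcher(None, messages[j]["content"], msg["content"]).ratio()
            if ratio >= threshold:
                dupes.append({"index": i, "reason": f"Near-duplicate of message {j}"})
                break  # one match is enough to flag this message
    return dupes

history = [
    {"role": "user", "content": "fix the auth bug"},
    {"role": "assistant", "content": "<full file contents>"},
    {"role": "user", "content": "fix the auth bug"},
    {"role": "assistant", "content": "<full file contents>"},
]
print(find_near_duplicates(history))
# flags indexes 2 and 3 as duplicates of messages 0 and 1
```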
Session Statistics
Accumulates total tokens analyzed, suppressed, and saved across every turn. Get a full breakdown at any point with get_session_stats.
MCP Tools
Six focused tools. Each does exactly one thing.
| Tool | Description | Alert level |
|---|---|---|
| check_output | Analyze a text output. Returns alert level, token count, suppression flag, and detected patterns. | info / warning / error / alert |
| get_session_stats | Return cumulative session statistics: tokens analyzed, suppressed, saved, and alert counts. | — |
| reset_session_stats | Reset all session statistics to zero. | — |
| analyze_history | Scan a messages array for near-duplicates and ignored log outputs. Returns suggested truncation and savings estimate. | info / warning / error / alert |
| set_thresholds | Override warning / error / alert token thresholds and suppression flags for the current session. | — |
| set_mode | Switch between off (default, silent), monitor (analyze only), and active (full suppression). Start here. | — |
Example usage
Real scenarios. Real inputs. Real responses. See exactly what you get.
Scenario 1 — How to enable the plugin
You don't call set_mode yourself. Claude does. After installing, just tell your AI client in plain English:
Claude reads the available MCP tools, understands what token-saver does, and calls set_mode on your behalf. That's it. You never touch JSON directly. Under the hood, the MCP call looks like this:
// Claude calls this automatically after you ask it to activate
{ "name": "set_mode", "arguments": { "mode": "active" } }
// response:
{ "mode": "active" }
There are three modes. Claude picks the right one based on what you ask for:
off — Plugin is installed but completely silent. Nothing is analyzed. Zero overhead. Good for when you just want it ready but not running.
monitor — Analyzes every output and reports waste, but never suppresses. Great for a first session to see what's burning tokens before committing to full suppression.
active — Full mode. Log-pattern outputs get shouldSuppress: true. Claude acts on that signal and skips feeding the noise back into context on the next turn.
Scenario 2 — Your app spits out a wall of server logs
You run a command and Claude gets back a 90-line Node.js startup log full of [INFO] and [DEBUG] lines. Nobody asked for that. Here's what token-saver tells you depending on the active mode:
[INFO] 2024-01-15T09:12:33Z Server starting on port 3000
[DEBUG] 2024-01-15T09:12:33Z Loading config from /etc/app/config.json
[INFO] 2024-01-15T09:12:33Z Database connection pool initialized (size: 10)
[DEBUG] 2024-01-15T09:12:34Z Route /api/health registered
[DEBUG] 2024-01-15T09:12:34Z Route /api/users registered
... (85 more lines like this)
{
  "alertLevel": "warning",
  "tokens": 342,
  "outputType": "log",
  "shouldSuppress": false,
  "reason": "Output matches log/noise patterns",
  "detectedPatterns": [
    { "pattern": "\\[INFO\\]", "matchCount": 47 },
    { "pattern": "\\[DEBUG\\]", "matchCount": 41 }
  ]
}
✅ Detected the waste, told you about it. Did not suppress — you're in monitor mode.
{
  "alertLevel": "warning",
  "tokens": 342,
  "outputType": "log",
  "shouldSuppress": true,
  "reason": "Output matches log/noise patterns and will be suppressed",
  "detectedPatterns": [
    { "pattern": "\\[INFO\\]", "matchCount": 47 },
    { "pattern": "\\[DEBUG\\]", "matchCount": 41 }
  ]
}
🔕 Same detection, but now shouldSuppress: true. Your client skips feeding those 342 tokens back into context. Every. Single. Turn.
Scenario 3 — Claude keeps repeating itself in history
You've been debugging for 30 turns. Claude has sent the same file contents 4 times. The same error message appears in 6 different messages. analyze_history finds all of it.
[
{ "role": "user", "content": "fix the auth bug" },
{ "role": "assistant", "content": "<full file contents>" },
{ "role": "user", "content": "fix the auth bug" },
{ "role": "assistant", "content": "<full file contents>" },
{ "role": "user", "content": "fix the auth bug" },
{ "role": "assistant", "content": "<full file contents>" }
]
{
  "totalMessages": 6,
  "totalTokens": 1840,
  "repetitiveMessages": [
    { "index": 2, "role": "user", "tokens": 4, "reason": "Near-duplicate of message 0" },
    { "index": 3, "role": "assistant", "tokens": 610, "reason": "Near-duplicate of message 1" },
    { "index": 4, "role": "user", "tokens": 4 },
    { "index": 5, "role": "assistant", "tokens": 610 }
  ],
  "estimatedTokenSavings": 1228,
  "suggestedTruncation": 4,
  "alertLevel": "alert"
}
👆 1,228 tokens saveable. 4 messages suggested for truncation. That's two 610-token file dumps (plus two duplicate prompts) re-sent on every future turn for zero reason. The plugin found it in one call.
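The savings estimate is just the sum of the flagged messages' token counts:

```python
# Token counts of the four flagged messages (indexes 2, 3, 4, 5 above)
flagged = [4, 610, 4, 610]
print(sum(flagged))  # 1228 — matches estimatedTokenSavings
```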
Proof test — full 9-scenario verification run
Run python3 test_live.py locally. It starts the MCP server, exercises every mode and tool, and reports exactly what was saved. Actual output:
============================================================
TOKEN-SAVER PROOF TEST
============================================================
[1] mode=off → skipped=true ✓ PASS ← silent by default
[2] set_mode(monitor) ✓ PASS
[3] short text → level=info ✓ PASS ← normal output, no action
[4] 1125-token output → warning ✓ PASS ← threshold triggered
[5] log in monitor → patterns=3, suppress=false ✓ PASS ← detected, not suppressed
[6] set_mode(active) ✓ PASS
[7] log in active → suppressed 87 tokens ✓ PASS ← same log, now suppressed
[8] repetitive history → alert, 95 tokens saveable ✓ PASS
[9] session stats → 201 tokens suppressed ✓ PASS
============================================================
PROOF SUMMARY
============================================================
Tokens suppressed : 201
Warnings fired : 2 Alerts fired : 1
Turns analyzed : 5
Overall: ALL CHECKS PASSED ✓
============================================================
Install token-saver
Three ways to install. Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client, any model.
Option 1 — run via npx, no install. Add this to your MCP client config:
{
  "command": "npx",
  "args": ["-y", "token-saver-mcp"]
}
Option 2 — install globally from npm:
npm install -g token-saver-mcp
Then point your client at the binary:
{
  "command": "token-saver-mcp"
}
Option 3 — install globally from GitHub:
npm install -g github:flightlesstux/token-saver
Add to your MCP client settings
Example for Claude Code (~/.claude/settings.json):
{
"mcpServers": {
"token-saver-mcp": {
"command": "npx",
"args": ["-y", "token-saver-mcp"]
}
}
}
(Optional) Configure thresholds
Create .token-saver.json in your project root:
{
"warningThresholdTokens": 1000,
"errorThresholdTokens": 5000,
"alertThresholdTokens": 10000,
"suppressLogs": true,
"suppressRepetitiveHistory": true,
"inactivityTurnsBeforeAlert": 3
}
FAQ
Questions people actually ask. Answered once, here, forever. You're welcome. 🫡
I'm a lawyer / teacher / doctor / kid — do I need to understand any of this?
Absolutely not. You just need the answer, not the 47-page log Claude printed to explain how it got there. token-saver watches the behind-the-scenes machinery so you don't have to. Think of it as the bouncer at the door of your context window — if it's garbage, it doesn't get in. You get results, not receipts.
Bonus: less token waste = less GPU energy burned per query = slightly happier planet. 🌍 You're basically an eco-warrior now.
Wait — does this actually save money? How much?
Yes, real money. Here's the math: Anthropic charges per input token on every turn. If Claude prints a 2000-token stack trace you never read, you're paying for those 2000 tokens again every single turn they stay in context. token-saver catches this, flags it, and marks it for suppression.
In our proof test: 201 tokens suppressed in one short session with 5 turns. Scale that to a real 50-turn debugging session with verbose logs and you're saving thousands of tokens — that's real cents per session, real dollars per month at any meaningful usage volume. The plugin costs $0.00.
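You can sanity-check the claim yourself. The per-token price below is a placeholder assumption — check your provider's current pricing:

```python
# Assumed placeholder price: $3 per million input tokens.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000

wasted_tokens = 2000    # one unread stack trace left in context
turns_remaining = 50    # turns it gets re-sent as input
total_resent = wasted_tokens * turns_remaining

print(total_resent)                                    # 100000 tokens re-sent
print(f"${total_resent * PRICE_PER_INPUT_TOKEN:.2f}")  # $0.30 — for ONE log dump
```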
What is the difference between warning, error, and alert?
info — all good, carry on (<1 000 tokens, no noise patterns). Nothing to see here.
warning — getting spicy. Output exceeds 1 000 tokens or matches log/noise patterns. Worth a look.
error — this is wasteful. Over 5 000 tokens. Should be truncated before it re-enters context.
alert — red siren, full chaos. Over 10 000 tokens, or Claude is stuck in a loop repeating itself like your uncle at Christmas dinner. Immediate action recommended.
Does token-saver actually delete or block anything?
No. token-saver is a polite advisor, not a bodyguard. It sets shouldSuppress: true on noisy outputs and explains why — but it never intercepts, modifies, or blocks any API call. Your client decides what to do with that signal. We tell you the kitchen is on fire; it's still your kitchen.
Why is the default mode "off"? That seems useless.
It's intentional. Dropping a plugin that immediately starts flagging everything in your workflow is rude. Install it, verify it's there, then turn it on when you're ready with set_mode("monitor") to observe first, or set_mode("active") to go full suppression. You're in control. The plugin doesn't assume it knows your workflow better than you do.
What's the difference between monitor and active mode?
monitor — analysis on, suppression off. Great for understanding your token waste before committing to suppression. It tells you "hey, this looks like a log dump" without doing anything about it.
active — full mode. Analyzes AND sets shouldSuppress: true on matching outputs. Your client can act on that signal to skip feeding the noise back into context.
How does token counting work? Is it accurate?
It uses a fast heuristic: ~4 characters per token (the English/code average). It's not the Anthropic tokenizer — that would add a network call and latency, which would be ironic for a plugin designed to save resources. The heuristic is accurate enough to catch waste at scale. If you're 10% off on a 5 000-token log dump, you still know it's a 5 000-token log dump.
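The heuristic described above is essentially a one-liner. A sketch of the idea, not the plugin's exact function:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token, never less than 1."""
    return max(1, len(text) // 4)

print(estimate_tokens("hello world"))  # 2
print(estimate_tokens("x" * 20000))    # 5000 — a 5,000-token log dump
```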
Which models and clients does it support?
All of them. token-saver has zero dependency on any AI provider or API — it analyzes plain text and doesn't care who produced it. Claude, GPT-4, Gemini, Mistral, Llama, whatever ships next month — model-agnostic by design.
On the client side: Claude Code, Cursor, Windsurf, Zed, Continue.dev, and any other tool that speaks the Model Context Protocol. If it supports MCP, token-saver works with it.
Can I add custom log patterns?
Yes. Add a logPatterns array to .token-saver.json with regex strings; they are merged with the built-in patterns. Your custom Spring Boot timestamp format that looks nothing like standard logs? Add a pattern. Your proprietary monitoring system's output? Same. Done.
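For example, a .token-saver.json with two custom patterns might look like this — the regexes themselves are illustrative:

```json
{
  "suppressLogs": true,
  "logPatterns": [
    "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\s+INFO",
    "^MYMON\\|[A-Z]+\\|"
  ]
}
```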
Does this send my data anywhere?
No. Everything runs locally, in memory, in your process. No telemetry, no analytics, no "anonymous usage data" that's actually not anonymous. When the MCP server stops, the session stats evaporate. See the Privacy Policy for the full (very short) story.
I care about the environment. Does this actually help?
Every token you don't send is a tiny bit of GPU compute you don't burn. LLM inference consumes real energy — data centers, cooling, the works. By suppressing 2 000-token log dumps that re-enter context on every turn, you're not just saving money: you're shaving off compute that would have run on a server somewhere. It's not going to save the ice caps by itself, but hey — every watt counts. Use token-saver. Hug a tree. Both. 🌱
Is this free? What's the catch?
MIT license. Free forever. No SaaS, no subscription, no "free tier with 1 000 checks/month." The catch is that it's open source, so if it breaks you get to help fix it. That's the deal. Fair? Fair.
Open source · MIT · Zero lock-in