MCP Plugin · Open Source · MIT
🛒 Submitted to Claude Marketplace · Pending Approval

Any AI client. Any model. Watches every token.

Stop burning tokens
on output
nobody reads.

token-saver monitors AI model outputs and fires warnings, errors, and alerts when usage is wasteful. Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client, any model. Logs, verbose history, and repetitive noise — suppressed automatically to keep your context lean.

check_output response
{
  "alertLevel": "warning",
  "tokens": 1842,
  "outputType": "log",
  "shouldSuppress": true,
  "reason": "Output matches log/noise patterns",
  "detectedPatterns": [...]
}
4 alert levels
6 MCP tools
0 config required
MIT open source

How token-saver works

Every AI model output passes through token-saver's analyzer — regardless of which model or client produced it. It estimates token count, detects noise patterns (logs, stack traces, verbose history), and fires an alert if usage exceeds your thresholds. Suppressed outputs are never sent back into the context window.
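As a rough mental model, the analysis step can be sketched like this (an illustrative TypeScript sketch; the type names, pattern list, and exact rounding are assumptions, not the plugin's actual internals):

```typescript
// Illustrative sketch of the analyzer: estimate tokens, match noise
// patterns, pick an alert level. Names and patterns are assumptions.
type AlertLevel = "info" | "warning" | "error" | "alert";

interface AnalysisResult {
  alertLevel: AlertLevel;
  tokens: number;
  shouldSuppress: boolean;
}

// A few representative noise patterns (the real built-in list is larger).
const LOG_PATTERNS: RegExp[] = [
  /\[INFO\]/, /\[DEBUG\]/, /\bTRACE\b/,
  /\x1b\[[0-9;]*m/, // ANSI escape codes
];

function analyze(output: string, activeMode: boolean): AnalysisResult {
  const tokens = Math.ceil(output.length / 4); // ~4 chars/token heuristic
  const isNoise = LOG_PATTERNS.some((p) => p.test(output));
  let alertLevel: AlertLevel = "info";
  if (tokens > 10000) alertLevel = "alert";
  else if (tokens > 5000) alertLevel = "error";
  else if (tokens > 1000 || isNoise) alertLevel = "warning";
  // Suppression only happens in active mode; monitor mode just reports.
  return { alertLevel, tokens, shouldSuppress: activeMode && isNoise };
}
```

Run a log line through this sketch and you get the same shape of result the real plugin returns: a warning level, a token estimate, and a suppression flag that depends on the mode.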

⚠️

Threshold Alerts

Set warning (1000), error (5000), and alert (10000) token thresholds. Any output that exceeds a threshold fires the appropriate level immediately.

configurable per session
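Under the hood, the session override is just another MCP tool call (the argument names here are assumed to mirror the `.token-saver.json` config keys; treat the exact shape as illustrative):

```json
{ "name": "set_thresholds", "arguments": { "warningThresholdTokens": 2000, "suppressLogs": true } }
```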
🔕

Log Suppression

Detects log-pattern outputs — INFO, DEBUG, TRACE, timestamps, stack traces, ANSI escape codes — and marks them for suppression automatically.

suppressLogs: true (default)
🔁

Repetitive History Detection

Scans your conversation history for near-duplicate messages and large assistant outputs the user likely ignored. Reports estimated token savings from truncation.

analyze_history tool
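Conceptually, the near-duplicate pass can be as simple as normalizing each message and looking for repeats (an illustrative sketch; the plugin's real similarity logic is likely fuzzier than exact-match-after-normalization):

```typescript
// Sketch of near-duplicate detection over a messages array.
// Normalization here is whitespace/case folding; names are illustrative.
interface Message {
  role: string;
  content: string;
}

function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns the indices of messages that repeat an earlier message.
function findNearDuplicates(messages: Message[]): number[] {
  const seen = new Map<string, number>();
  const duplicates: number[] = [];
  messages.forEach((message, index) => {
    const key = normalize(message.content);
    if (seen.has(key)) duplicates.push(index);
    else seen.set(key, index);
  });
  return duplicates;
}
```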
📊

Session Statistics

Accumulates total tokens analyzed, suppressed, and saved across every turn. Get a full breakdown at any point with get_session_stats.

always on
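The bookkeeping behind get_session_stats is plausibly just a running in-memory accumulator (an illustrative sketch; the field names are assumptions):

```typescript
// Sketch of per-session accounting. Stats live in memory and reset
// when the server process stops, matching the plugin's privacy story.
interface SessionStats {
  tokensAnalyzed: number;
  tokensSuppressed: number;
  warningsFired: number;
}

const stats: SessionStats = { tokensAnalyzed: 0, tokensSuppressed: 0, warningsFired: 0 };

function record(tokens: number, suppressed: boolean, warned: boolean): void {
  stats.tokensAnalyzed += tokens;
  if (suppressed) stats.tokensSuppressed += tokens;
  if (warned) stats.warningsFired += 1;
}
```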

MCP Tools

Six focused tools. Each does exactly one thing.

Tool · Description · Alert level
check_output · Analyze a text output. Returns alert level, token count, suppression flag, and detected patterns. · info / warning / error / alert
get_session_stats · Return cumulative session statistics: tokens analyzed, suppressed, saved, and alert counts.
reset_session_stats · Reset all session statistics to zero.
analyze_history · Scan a messages array for near-duplicates and ignored log outputs. Returns a suggested truncation point and a savings estimate. · info / warning / error / alert
set_thresholds · Override the warning / error / alert token thresholds and suppression flags for the current session.
set_mode · Switch between off (default, silent), monitor (analyze only), and active (full suppression). Start here.
Alert levels: info (normal) · warning (>1000 tokens) · error (>5000 tokens) · alert (>10000 tokens or repetitive ignored output)  ·  Default mode: off — call set_mode to activate

Example usage

Real scenarios. Real inputs. Real responses. See exactly what you get.

🚦

Scenario 1 — How to enable the plugin

You don't call set_mode yourself. Claude does. After installing, just tell your AI client in plain English:

"Enable token-saver in active mode"

Claude reads the available MCP tools, understands what token-saver does, and calls set_mode on your behalf. That's it. You never touch JSON directly. Under the hood, the MCP call looks like this:

// Claude calls this automatically after you ask it to activate
{ "name": "set_mode", "arguments": { "mode": "active" } }
// response:
{ "mode": "active" }

There are three modes. Claude picks the right one based on what you ask for:

OFF (default)

Plugin is installed but completely silent. Nothing is analyzed. Zero overhead. Good for when you just want it ready but not running.

MONITOR

Analyzes every output and reports waste — but never suppresses. Great for a first session to see what's burning tokens before committing to full suppression.

ACTIVE

Full mode. Log-pattern outputs get shouldSuppress: true. Claude acts on that signal and skips feeding the noise back into context on the next turn.

🔍

Scenario 2 — Your app spits out a wall of server logs

You run a command and Claude gets back a 90-line Node.js startup log full of [INFO] and [DEBUG] lines. Nobody asked for that. Here's what token-saver tells you depending on the active mode:

INPUT passed to check_output
[INFO]  2024-01-15T09:12:33Z  Server starting on port 3000
[DEBUG] 2024-01-15T09:12:33Z  Loading config from /etc/app/config.json
[INFO]  2024-01-15T09:12:33Z  Database connection pool initialized (size: 10)
[DEBUG] 2024-01-15T09:12:34Z  Route /api/health registered
[DEBUG] 2024-01-15T09:12:34Z  Route /api/users registered
... (85 more lines like this)
monitor mode response
{
  "alertLevel": "warning",
  "tokens": 342,
  "outputType": "log",
  "shouldSuppress": false,
  "reason": "Output matches log/noise patterns",
  "detectedPatterns": [
    { "pattern": "\\[INFO\\]",
      "matchCount": 47 },
    { "pattern": "\\[DEBUG\\]",
      "matchCount": 41 }
  ]
}

✅ Detected the waste, told you about it. Did not suppress — you're in monitor mode.

active mode response
{
  "alertLevel": "warning",
  "tokens": 342,
  "outputType": "log",
  "shouldSuppress": true,
  "reason": "Output matches log/noise patterns and will be suppressed",
  "detectedPatterns": [
    { "pattern": "\\[INFO\\]",
      "matchCount": 47 },
    { "pattern": "\\[DEBUG\\]",
      "matchCount": 41 }
  ]
}

🔕 Same detection, but now shouldSuppress: true. Your client skips feeding those 342 tokens back into context. Every. Single. Turn.

🔁

Scenario 3 — Claude keeps repeating itself in history

You've been debugging for 30 turns. Claude has sent the same file contents 4 times. The same error message appears in 6 different messages. analyze_history finds all of it.

INPUT — your messages array
[
  { "role": "user",      "content": "fix the auth bug" },
  { "role": "assistant", "content": "<full file contents>" },
  { "role": "user",      "content": "fix the auth bug" },
  { "role": "assistant", "content": "<full file contents>" },
  { "role": "user",      "content": "fix the auth bug" },
  { "role": "assistant", "content": "<full file contents>" }
]
analyze_history response
{
  "totalMessages": 6,
  "totalTokens": 1840,
  "repetitiveMessages": [
    { "index": 2, "role": "user",
      "tokens": 4,
      "reason": "Near-duplicate of message 0" },
    { "index": 3, "role": "assistant",
      "tokens": 610,
      "reason": "Near-duplicate of message 1" },
    { "index": 4, "role": "user", "tokens": 4 },
    { "index": 5, "role": "assistant", "tokens": 610 }
  ],
  "estimatedTokenSavings": 1228,
  "suggestedTruncation": 4,
  "alertLevel": "alert"
}

👆 1228 tokens saveable: two duplicate 610-token file dumps plus two 4-token duplicate prompts, re-sent on every future turn for zero reason. 4 messages suggested for truncation. The plugin found it in one call.

📊

Proof test — full 9-scenario verification run

Run python3 test_live.py locally. It starts the MCP server, exercises every mode and tool, and reports exactly what was saved. Actual output:

============================================================
TOKEN-SAVER PROOF TEST
============================================================

[1] mode=off → skipped=true          ✓ PASS  ← silent by default
[2] set_mode(monitor)                 ✓ PASS
[3] short text → level=info           ✓ PASS  ← normal output, no action
[4] 1125-token output → warning       ✓ PASS  ← threshold triggered
[5] log in monitor → patterns=3, suppress=false  ✓ PASS  ← detected, not suppressed
[6] set_mode(active)                  ✓ PASS
[7] log in active → suppressed 87 tokens  ✓ PASS  ← same log, now suppressed
[8] repetitive history → alert, 95 tokens saveable  ✓ PASS
[9] session stats → 201 tokens suppressed  ✓ PASS

============================================================
PROOF SUMMARY
============================================================
  Tokens suppressed : 201
  Warnings fired    : 2    Alerts fired : 1
  Turns analyzed    : 5

  Overall: ALL CHECKS PASSED ✓
============================================================

Install token-saver

Three ways to install. Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client, any model.

OPTION A — npx
No install needed. Always pulls the latest version. Just add to your MCP config:
{
  "command": "npx",
  "args": ["-y", "token-saver-mcp"]
}
OPTION B — npm global
Install once, use everywhere:
npm install -g token-saver-mcp
Then in your MCP config:
{
  "command": "token-saver-mcp"
}
OPTION C — from GitHub
Install directly from the repo in 1 command:
npm install -g github:flightlesstux/token-saver
No build step needed — compiled output is already in the repo. Same MCP config as Option B.
Step 1: Add to your MCP client settings

Works with Claude Code, Cursor, Windsurf, Zed, Continue.dev — any MCP-compatible client.

Example for Claude Code (~/.claude/settings.json):

{
  "mcpServers": {
    "token-saver-mcp": {
      "command": "npx",
      "args": ["-y", "token-saver-mcp"]
    }
  }
}
Step 2 (optional): Configure thresholds

Create .token-saver.json in your project root:

{
  "warningThresholdTokens": 1000,
  "errorThresholdTokens": 5000,
  "alertThresholdTokens": 10000,
  "suppressLogs": true,
  "suppressRepetitiveHistory": true,
  "inactivityTurnsBeforeAlert": 3
}

FAQ

Questions people actually ask. Answered once, here, forever. You're welcome. 🫡

I'm a lawyer / teacher / doctor / kid — do I need to understand any of this?

Absolutely not. You just need the answer, not the 47-page log Claude printed to explain how it got there. token-saver watches the behind-the-scenes machinery so you don't have to. Think of it as the bouncer at the door of your context window — if it's garbage, it doesn't get in. You get results, not receipts.

Bonus: less token waste = less GPU energy burned per query = slightly happier planet. 🌍 You're basically an eco-warrior now.

Wait — does this actually save money? How much?

Yes, real money. Here's the math: Anthropic charges per input token on every turn. If Claude prints a 2000-token stack trace you never read, you're paying for those 2000 tokens again every single turn they stay in context. token-saver catches this, flags it, and marks it for suppression.

In our proof test: 201 tokens suppressed in one short session with 5 turns. Scale that to a real 50-turn debugging session with verbose logs and you're saving thousands of tokens — that's real cents per session, real dollars per month at any meaningful usage volume. The plugin costs $0.00.

What is the difference between warning, error, and alert?

info — all good, carry on (under 1000 tokens, no noise patterns). Nothing to see here.

warning — getting spicy. Output exceeds 1000 tokens or matches log/noise patterns. Worth a look.

error — this is wasteful. Over 5000 tokens. Should be truncated before it re-enters context.

alert — red siren, full chaos. Over 10000 tokens, or Claude is stuck in a loop repeating itself like your uncle at Christmas dinner. Immediate action recommended.

Does token-saver actually delete or block anything?

No. token-saver is a polite advisor, not a bodyguard. It sets shouldSuppress: true on noisy outputs and explains why — but it never intercepts, modifies, or blocks any API call. Your client decides what to do with that signal. We tell you the kitchen is on fire; it's still your kitchen.
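On the client side, acting on the signal can be as small as this (a hypothetical sketch of a consuming client; the placeholder format and the CheckResult shape are assumptions, not plugin API):

```typescript
// Hypothetical client-side handling: if token-saver flagged the output,
// swap it for a short placeholder before it re-enters context.
interface CheckResult {
  shouldSuppress: boolean;
  tokens: number;
  reason: string;
}

function prepareForContext(output: string, result: CheckResult): string {
  if (!result.shouldSuppress) return output;
  return `[suppressed ${result.tokens} tokens: ${result.reason}]`;
}
```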

Why is the default mode "off"? That seems useless.

It's intentional. Dropping a plugin that immediately starts flagging everything in your workflow is rude. Install it, verify it's there, then turn it on when you're ready with set_mode("monitor") to observe first, or set_mode("active") to go full suppression. You're in control. The plugin doesn't assume it knows your workflow better than you do.

What's the difference between monitor and active mode?

monitor — analysis on, suppression off. Great for understanding your token waste before committing to suppression. It tells you "hey, this looks like a log dump" without doing anything about it.

active — full mode. Analyzes AND sets shouldSuppress: true on matching outputs. Your client can act on that signal to skip feeding the noise back into context.

How does token counting work? Is it accurate?

It uses a fast heuristic: ~4 characters per token (the English/code average). It's not the Anthropic tokenizer — that would add a network call and latency, which would be ironic for a plugin designed to save resources. The heuristic is accurate enough to catch waste at scale. If you're 10% off on a 5000-token log dump, you still know it's a 5000-token log dump.
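In code, the estimate is a one-liner (a sketch of the ~4-chars-per-token heuristic; the plugin's exact rounding may differ):

```typescript
// ~4 characters per token: the rough English/code average.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```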

Which models and clients does it support?

All of them. token-saver has zero dependency on any AI provider or API — it analyzes plain text and doesn't care who produced it. Claude, GPT-4, Gemini, Mistral, Llama, whatever ships next month — model-agnostic by design.

On the client side: Claude Code, Cursor, Windsurf, Zed, Continue.dev, and any other tool that speaks the Model Context Protocol. If it supports MCP, token-saver works with it.

Can I add custom log patterns?

Yes. Add a logPatterns array to .token-saver.json with regex strings. They are merged with the built-in patterns. Your custom Java Spring Boot timestamp format that looks nothing like anything normal? Covered. Your proprietary monitoring system's output? Add a pattern. Done.
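For example, a config with two custom patterns might look like this (both pattern strings are purely illustrative: a Spring-Boot-style timestamp prefix and a made-up proprietary monitor prefix):

```json
{
  "suppressLogs": true,
  "logPatterns": [
    "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\s+(INFO|WARN|ERROR)",
    "^MYMON\\|"
  ]
}
```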

Does this send my data anywhere?

No. Everything runs locally, in memory, in your process. No telemetry, no analytics, no "anonymous usage data" that's actually not anonymous. When the MCP server stops, the session stats evaporate. See the Privacy Policy for the full (very short) story.

I care about the environment. Does this actually help?

Every token you don't send is a tiny bit of GPU compute you don't burn. LLM inference consumes real energy — data centers, cooling, the works. By suppressing 2000-token log dumps that re-enter context on every turn, you're not just saving money: you're shaving off compute that would have run on a server somewhere. It's not going to save the ice caps by itself, but hey — every watt counts. Use token-saver. Hug a tree. Both. 🌱

Is this free? What's the catch?

MIT license. Free forever. No SaaS, no subscription, no "free tier with 1000 checks/month." The catch is that it's open source, so if it breaks you get to help fix it. That's the deal. Fair? Fair.

Open source · MIT · Zero lock-in

Ready to stop burning tokens
on output nobody reads?