Feb 25, 2026


Save Money on AI Tokens

Thousands of developers spend money on AI tokens every day, and most of them don't realize where that money actually goes. I found a ratio of 156:1 from a single Saturday of AI-assisted coding with models on AWS Bedrock. For every token of code the AI generated, I fed it 156 tokens of context, conversation history, and codebase references. The numbers tell a story the AI industry doesn't talk about much: we've been optimizing for the wrong thing.

The Receipt Says It All

Let me show you exactly what I paid for that Saturday of coding:

2026-02-21 (Sat)
Daily Total: 30,087,809 input + 191,881 output = 30,279,690 tokens
Cost: $18.62 across 793 API calls

The model (Moonshot AI's Kimi K2.5) generated just 191,881 output tokens, roughly 143,000 words of actual code and explanations. But to produce that, it had to ingest 30 million tokens of context. That's like hiring a consultant who charges you for every document they have to read before they can give you advice, and they have to re-read everything every single time they answer a question.
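The ratio falls straight out of the receipt. A quick sketch of the arithmetic (the words-per-token factor is a common heuristic, not billing data):

```python
# Token counts taken directly from the daily receipt above.
input_tokens = 30_087_809
output_tokens = 191_881

ratio = input_tokens / output_tokens
print(f"input:output ratio = {int(ratio)}:1")  # prints "input:output ratio = 156:1"

# Approximate word count of the generated output, assuming the common
# heuristic of ~0.75 words per token (an assumption, not billing data).
words = output_tokens * 0.75
print(f"~{words:,.0f} words of output")
```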

The Economics Don't Add Up

The AI industry prices output tokens 2 to 4 times higher than input tokens. The reasoning is that output tokens require autoregressive generation, producing one token at a time, while input tokens process in parallel.

But this pricing model assumes a roughly balanced ratio of input to output. My data tells a different story.

On Kimi K2.5 that day I was charged $0.60 per million input tokens and $3.00 per million output tokens, a 5x ratio. Run the receipt through those rates: 30.1M input tokens cost about $18.05, while 192K output tokens cost about $0.58.

So input tokens still accounted for 97% of the cost, even with output priced five times higher per token. The industry's focus on expensive output tokens misses the real economic drain: the context explosion happening on the input side.
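A minimal sketch of that cost split, using the receipt's token counts and the quoted per-million rates:

```python
# Cost breakdown at the quoted Bedrock rates for Kimi K2.5:
# $0.60 per million input tokens, $3.00 per million output tokens.
input_tokens = 30_087_809
output_tokens = 191_881

input_cost = input_tokens / 1_000_000 * 0.60    # ~ $18.05
output_cost = output_tokens / 1_000_000 * 3.00  # ~ $0.58
total = input_cost + output_cost

print(f"input:  ${input_cost:.2f}")
print(f"output: ${output_cost:.2f}")
print(f"input share of total: {input_cost / total:.0%}")  # 97%
```

Even at a 5x per-token markup, the output side barely registers on the bill.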

Why Coding Agents Are Context Gluttons

Recent research from JetBrains, along with several academic papers, reveals what's happening under the hood. AI coding agents operate in continuous loops: read file, reason about it, make changes, encounter errors, read more files, fix the errors, and repeat.

Each iteration adds to the trajectory, a growing record of every tool call, error message, and generated output. A study found that 99% of tokens consumed by coding agents are input tokens from these trajectories. The average agent accumulates 48,400 tokens per task, and for complex multi-turn problems, this balloons to over 1 million input tokens.

The problem isn't just volume; it's waste. Research on AgentDiet showed that agent trajectories contain 40 to 60% redundant, expired, or useless information that can be removed without hurting performance. Agents are like hoarders, stuffing their context window with every scrap of data just in case.
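As an illustration of the kind of pruning this research points at (a hypothetical sketch, not AgentDiet's actual algorithm; the message shapes and the "keep the last N tool results" rule are assumptions):

```python
# Hypothetical sketch: prune stale tool output from an agent trajectory
# before each model call. Older tool results are usually expired once
# the agent has already acted on them.
def prune_trajectory(messages, keep_last_n_tool_results=2):
    """Keep all user/assistant turns; elide all but the newest tool results."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last_n_tool_results])
    return [
        m if i not in stale
        else {"role": "tool", "content": "[output elided: superseded]"}
        for i, m in enumerate(messages)
    ]

trajectory = [
    {"role": "user", "content": "Fix the failing test"},
    {"role": "tool", "content": "3,000 lines of file contents"},
    {"role": "assistant", "content": "I see the bug in parse()"},
    {"role": "tool", "content": "old stack trace"},
    {"role": "tool", "content": "current test output"},
]
pruned = prune_trajectory(trajectory)  # the oldest tool result is elided
```

Real systems would make the staleness rule smarter, but even this crude cut targets exactly the 40 to 60% of trajectory tokens the research flags as removable.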

The Quadratic Trap

Cumulative token costs grow quadratically with the number of turns in an agent loop. If turn one costs you N tokens, turn two costs roughly 2N, because the agent now has N tokens of history to process along with the new request. By turn k you're paying roughly kN per call, and the session total grows with k squared.

This creates a strange incentive structure. The longer your coding session, the more expensive each subsequent interaction becomes, not because the task is harder, but because the context baggage grows heavier.
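A toy model makes the trap concrete. The per-turn token figure here is arbitrary, but the shape of the curve is not:

```python
# Toy model of the quadratic trap: each turn adds new_tokens of fresh
# content, but every API call must reprocess the entire accumulated
# history. The 5,000-token figure is an arbitrary illustration.
def cumulative_input_tokens(turns, new_tokens=5_000):
    history = 0
    total_billed = 0
    for _ in range(turns):
        history += new_tokens    # context grows linearly per turn
        total_billed += history  # each call reprocesses everything so far
    return total_billed

# Doubling the session length roughly quadruples the billed input tokens:
print(cumulative_input_tokens(10))  # 275,000 billed input tokens
print(cumulative_input_tokens(20))  # 1,050,000 -- about 4x, not 2x
```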

My 793 API calls in a single day were not 793 independent operations. They were a continuous conversation where each call had to reprocess everything that came before.

What This Means for AI Coding Today

The industry is racing toward larger context windows, 1 million tokens, 2 million tokens, even 10 million tokens. But this is solving the wrong problem. A bigger bucket does not help if you are paying by the gallon for water you do not need.

The real opportunity lies in context management: pruning redundant trajectory data, compacting conversation history, and starting fresh sessions before the context baggage gets heavy.

Quick Tips You Can Use Today

If you have been chatting with an agent in Cursor for a while, start a new agent to give your conversation a fresh start and drop the accumulated context.

If you are using Claude Code, type /compact to shrink the context window and save on tokens.

The Bottom Line

My $18.62 Saturday was not expensive because of what the AI generated. It was expensive because of everything I made it read to get there.

As AI coding assistants become more common, we are approaching an inflection point. The companies that do well will not be those with the biggest context windows. They will be those that figure out how to stop paying for context their agents do not actually need.

The hidden cost of AI coding is not the code. It is the conversation. Start paying attention to your input token usage today, and you will be surprised at how quickly the savings add up.