AI has given rise to a unit of measurement that most business leaders don't know about yet. It already accounts for significant budgets, it structures a multi-billion-dollar market, and its price is evolving at a pace unprecedented in the history of technology. But it also introduces a new form of fragility, because predicting what you will actually consume is far from straightforward.
When you turn on the lights, you consume kilowatt-hours. When you store files in the cloud, you consume gigabytes. When you use an artificial intelligence — whether ChatGPT, Claude, Copilot, Gemini or another — you consume tokens.
The token is the basic unit of the AI economy. Every question asked, every answer generated, every document analyzed, every email rewritten by an AI is measured in tokens. And like the kWh or the GB before it, the token is becoming a cost line in its own right — one that most businesses aren't tracking yet.
What exactly is a token?
A token is neither a word nor a character. It's a fragment of text: typically a short word, a syllable, or a punctuation mark. In English, a common word corresponds to roughly 1 to 1.5 tokens. A ten-word sentence represents about a dozen. A standard A4 page of text comes to roughly 400 to 500 tokens.
To give a concrete sense of scale:
- Asking ChatGPT a simple question and receiving a short answer consumes roughly 500 to 1,000 tokens.
- Having a 10-page document summarized can consume 5,000 to 8,000 tokens.
- Having a contract analyzed, a commercial proposal drafted, or a structured report produced can exceed 20,000 to 50,000 tokens per operation.
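These rules of thumb are easy to turn into a back-of-envelope estimator. The sketch below assumes a flat 1.3 tokens per word, a midpoint of the 1-to-1.5 range above; real tokenizers (such as OpenAI's open-source tiktoken library) give exact, model-specific counts.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate from word count.

    The 1.3 multiplier is an assumption drawn from the article's
    "1 to 1.5 tokens per word" rule of thumb, not an exact tokenizer.
    """
    return round(len(text.split()) * tokens_per_word)

# A ten-word sentence lands at about a dozen tokens:
sentence = "The token is the basic unit of the AI economy."
print(estimate_tokens(sentence))  # 13
```

Multiply such estimates by operations per day and by headcount, and the budget-line arithmetic of the next paragraph follows directly.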
These numbers seem abstract — until you multiply them by the number of employees and the number of operations per day. That's when the token stops being a technical concept and becomes a budget line item.
A market already worth billions
The token is not an engineering detail. It's the core of AI's business model.
All major providers charge per use, per token. OpenAI, Anthropic, Google, xAI — each has its pricing grid, expressed in dollars per million tokens. And the volumes involved are staggering.
According to Precedence Research, global spending on generative AI reached $37 billion in 2025 — 3.2 times more than in 2024. Gartner estimates that total global AI spending will exceed $2.5 trillion in 2026. Corporate AI budgets grew by 36% in one year, rising from an average of $63,000 to $85,500 per month per organization. And 72% of companies plan to further increase their spending on language models.
A CIO interviewed by Andreessen Horowitz summed up the phenomenon in one sentence: "What I was spending in 2023, I now spend in a week."
This isn't just a big-company issue. As soon as an SME uses AI regularly — even through consumer subscriptions — it consumes tokens. And it pays for them, directly or indirectly.
Token prices are falling faster than any technology before them
What makes the token strategically fascinating isn't just its existence. It's the dynamics of its price.
According to Epoch AI data, the cost to achieve the same level of AI performance has dropped at a median rate of 10x per year for top-performing models. And since early 2024, this acceleration has intensified further — with drops reaching up to 200x per year at certain performance levels.
In concrete terms: an intelligence capability that cost $20 per million tokens in late 2022 now costs less than $0.40. That's a 50x reduction in less than three years. The most economical models, such as GPT-4o Mini or Gemini Flash, go as low as $0.15 per million input tokens.
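The 50x figure implies a steady annual decline factor that can be backed out directly. The three-year window below is the article's own approximation (late 2022 to 2025):

```python
# Back out the annual decline factor implied by the numbers above:
# $20 per million tokens in late 2022, under $0.40 about three years later.
start, end, years = 20.0, 0.40, 3
overall = start / end            # 50x overall reduction
annual = overall ** (1 / years)  # roughly 3.7x cheaper each year
print(f"{overall:.0f}x overall, about {annual:.1f}x per year")
```

Note that this works out well below Epoch AI's 10x-per-year median, which applies to matched performance levels rather than to any single price point.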
To put the scale of this drop in perspective: it's faster than the decline in computing costs during the microprocessor revolution, and faster than the decline in bandwidth costs during the broadband era.
And all signs point to the trend continuing. Competition among providers is fierce, algorithmic efficiency gains are accumulating (roughly 3x per year according to researchers), and hardware improves with each new generation of chips.
Why this price drop changes everything — and doesn't solve everything
One might think: "If AI is getting cheaper, that's good news, and the issue resolves itself." That would be missing the point entirely.
What's happening here is a phenomenon economists know well: the Jevons paradox. When a resource becomes cheaper per unit, we don't consume less of it — we consume much more.
This is exactly what's happening with tokens. The unit price is falling, but volumes are exploding. Models are becoming more sophisticated and consuming more tokens per operation. AI agents, which chain dozens of calls to complete a complex task, multiply consumption. And use cases are diversifying as the cost makes profitable operations that weren't viable six months earlier.
The result: despite the spectacular drop in unit price, total AI spending in enterprises is growing 30 to 40% per year. In 2026, according to the FinOps report, 98% of organizations actively manage their AI spending — compared to only 31% two years earlier.
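The arithmetic of the paradox fits in a few lines. Both multipliers below are purely illustrative assumptions, chosen so that a 10x unit-price drop paired with a 13.5x volume surge yields total-spend growth in the 30-40% range the article cites:

```python
# Illustrative Jevons-paradox arithmetic: unit price falls 10x,
# volume grows 13.5x, yet total spend still rises ~35%.
price_y1, price_y2 = 2.00, 0.20        # $ per million tokens (assumed)
volume_y1 = 100.0                      # millions of tokens per month (assumed)
volume_y2 = volume_y1 * 13.5           # usage explodes as price drops

spend_y1 = price_y1 * volume_y1        # $200
spend_y2 = price_y2 * volume_y2        # $270
print(f"spend grew {spend_y2 / spend_y1 - 1:.0%}")
```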
This paradox has a major strategic implication for leaders: what costs too much to automate today will become profitable in six months. Companies that understand this price curve have a considerable timing advantage over those waiting for things to be "ready."
Pooling, optimizing: the token can be managed
Faced with this reality, the most advanced companies aren't just monitoring their bills. They're starting to manage their token consumption the way they manage their other resources — with architectural choices, pooling, and optimization.
Choosing the right model for the right use is the first lever. Today, a premium model (GPT-5.2, Claude Opus) costs between $15 and $75 per million output tokens. A mid-range model (Claude Sonnet, GPT-4o) costs between $3 and $15. An economy model (GPT-4o Mini, Gemini Flash, Haiku) costs less than $1. For the same task — summarizing an email, classifying a ticket, rephrasing text — the result can be identical with a model ten times cheaper. Conversely, using a premium model for a simple operation is like taking a plane to cross the street.
Large companies are already deploying intelligent routing systems that automatically direct each request to the most appropriate model. This principle can be applied at SME scale in a simpler way: knowing which uses justify a powerful model and which are fine with a lightweight one. It's a strategic trade-off, not a technical choice.
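A minimal sketch of this kind of routing follows. Tier names, task labels, and prices are illustrative assumptions loosely based on the ranges quoted above, not any provider's real rate card; a production router would classify requests far more carefully.

```python
# Illustrative per-tier pricing, $ per million output tokens (assumed).
PRICING = {
    "economy": 0.60,   # e.g. a Mini/Flash/Haiku-class model
    "midrange": 10.0,  # e.g. a Sonnet/GPT-4o-class model
    "premium": 50.0,   # e.g. an Opus-class model
}

# Tasks where a cheap model typically matches the premium result.
SIMPLE_TASKS = {"summarize_email", "classify_ticket", "rephrase_text"}

def route(task: str, high_stakes: bool = False) -> str:
    """Pick the cheapest tier plausibly adequate for the task."""
    if task in SIMPLE_TASKS and not high_stakes:
        return "economy"
    return "premium" if high_stakes else "midrange"

print(route("classify_ticket"))       # economy
print(route("draft_contract", True))  # premium
```

Even this crude rule captures the strategic point: the trade-off is decided by use case, and the price gap between tiers can approach two orders of magnitude.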
Pooling is a second lever. Rather than each employee using their own individual subscription in isolation, a company can centralize and pool AI access through a shared interface or API. This allows for volume aggregation, better rate negotiation (providers offer volume discounts), usage control, and above all, measurement of actual consumption. Without pooling, each employee pays the full individual price, and the company has no visibility over the whole.
Prompt and workflow optimization is a third lever, less visible but significant. How a request is formulated, the volume of context sent to the model, the number of round-trips needed: all of this directly affects token consumption. A well-designed prompt can cut the number of tokens needed for the same result in half, or even to a third. Conversely, a poorly configured system that sends too much context, makes unnecessary calls, or fails to cache reusable results can multiply the bill by 3 or 4 without the final result being any better. This is where much of the real economics of enterprise AI plays out.
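Caching is the simplest of these workflow fixes to show in code. The sketch below uses Python's standard `functools.lru_cache`; `call_model` is a hypothetical stand-in for a real provider API call.

```python
import functools

@functools.lru_cache(maxsize=1024)
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a paid API call.

    With the cache, identical prompts are answered from memory
    instead of consuming tokens a second time.
    """
    # In a real system this line would hit a provider API and cost tokens.
    return f"answer to: {prompt}"

call_model("Summarize our refund policy.")  # first call: pays for tokens
call_model("Summarize our refund policy.")  # repeat: served from cache, free
print(call_model.cache_info().hits)         # 1
```

Real deployments need cache invalidation and near-duplicate matching, but the principle stands: never pay twice for the same answer.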
In other words, token spend is not something a company simply endures. It's a resource, and like any resource, it can be managed, pooled, and optimized.
The forecasting problem: when consumption escapes the budget
But there's another aspect of the topic, less comfortable, that many companies are beginning to discover the hard way: predicting actual token consumption is extremely difficult.
With traditional software, the expense is known in advance. You pay a license or flat-rate subscription, and the cost stays stable regardless of usage. With tokens, it's the opposite: consumption depends on actual usage, and that usage is inherently unpredictable.
Let's consider three concrete situations.
Internal use first. When a team of 20 people starts using AI daily, their consumption climbs almost automatically as employees gain proficiency and discover new use cases. What started at a few hundred requests per week can turn into several thousand within months, without any budget having been planned for this ramp-up. And recent "reasoning" models — those that "think" at length before responding — worsen the phenomenon: they generate thousands of internal reasoning tokens for a single response, multiplying consumption invisibly.
External-facing use next. If a company integrates AI into a client-facing service — a chatbot on its website, a support assistant, a recommendation tool — consumption becomes a function of traffic. A spike in activity, a successful marketing campaign, a viral article driving traffic: all situations where token consumption can explode without the company anticipating it. Some platforms have discovered what are called "inference whales" — individual users who consume tens of thousands of dollars worth of tokens under a flat-rate subscription. The gap between what the customer pays and the actual token cost can become a chasm.
Autonomous agents last. As AI evolves toward systems capable of acting independently — the so-called "agents" — consumption becomes even harder to predict. An agent solving a complex problem can chain dozens of model calls, analyze documents, query databases, and generate lengthy intermediate reasoning. The bill for a single operation can vary a hundredfold depending on case complexity. A technology investor revealed that his AI agents cost him $300 per day — the equivalent of $100,000 per year — while running at only 10 to 20% of their capacity.
The risk isn't paying too much. The risk is not knowing how much you'll pay — and discovering the real cost after the fact.
This unpredictability raises two strategic questions.
The profitability question first. If a company builds an offering or service whose production cost depends on token volume — and that volume is hard to predict — how do you guarantee the margin? How do you set a selling price when the unit production cost fluctuates? This is a paradigm shift from traditional software, where the marginal cost of an additional use is close to zero. With tokens, every use has a cost — and that cost can vary considerably.
The fragility question next. A business model built on poorly controlled token consumption is structurally fragile. If prices drop, margins improve. But if volumes spiral, if a model is replaced by a more expensive one, if a provider changes its rates, or if internal usage explodes without anticipation — financial balances can shift quickly. This is a risk that SME leaders must factor into their thinking, just like dependency on a supplier or a distribution channel.
The good news is that this risk can be managed — provided it has been identified. Setting up alert thresholds, capped budgets per use case, real-time monitoring mechanisms: these are simple practices, more about management discipline than technical prowess. But they first require knowing the problem exists. And for many SMEs, that awareness hasn't happened yet.
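As a sketch, the alert-threshold discipline described above amounts to very little code. Budget figures, use-case names, and the 80% alert level are illustrative assumptions:

```python
# Illustrative monthly token budgets per use case, in dollars (assumed).
MONTHLY_BUDGETS = {"support_chatbot": 500.0, "internal_assistant": 200.0}
ALERT_AT = 0.8  # warn once 80% of a budget is consumed (assumed threshold)

def check_budget(use_case: str, spend_to_date: float) -> str:
    """Classify month-to-date spend against the use case's budget."""
    budget = MONTHLY_BUDGETS[use_case]
    if spend_to_date >= budget:
        return "cap reached: block or review"
    if spend_to_date >= ALERT_AT * budget:
        return "alert: nearing budget"
    return "ok"

print(check_budget("support_chatbot", 420.0))  # alert: nearing budget
```

The hard part is not the code; it's deciding that each use case gets a budget and an owner in the first place.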
"Shadow consumption": your teams are already using tokens
There's yet another blind spot that many leaders underestimate.
As we discussed in a previous article, 75% of knowledge workers already use AI at work — often without their management knowing. This is what's called "shadow AI."
This phenomenon has a direct corollary in terms of tokens: your employees are already consuming artificial intelligence, and therefore tokens, without anyone measuring, managing, or budgeting for it. Taken in isolation, each of these costs seems trivial. A ChatGPT subscription at $20 per month. A few Copilot queries built into the Office suite.
But multiply that by 15, 30, or 100 employees — and add the usage that doesn't go through the company's official channels — and you get a real cost item, unmanaged, unoptimized. And above all, un-pooled: everyone pays the full individual price, without benefiting from collective volumes.
The token makes this phenomenon measurable. It's both the problem and the solution: when you can count tokens, you can start managing your AI consumption the way you manage your other resources.
What this means for an SME leader
Most SME leaders haven't heard of the token yet. And yet, several strategic decisions already depend on it.
Intelligence becomes a variable cost. Unlike traditional software billed as a flat rate or per license, AI is billed per use. The more you use it, the more you pay. This changes the budgeting logic: it's no longer about buying a tool, but managing consumption — with everything that implies in terms of forecasting, steering, and risk of cost overruns.
The price curve creates a timing advantage. The price drop makes things possible that weren't yesterday. Automating the processing of 100 emails per day, having contracts analyzed in series, generating personalized reports for each client — these use cases were economically unrealistic a year ago. They no longer are. And in six months, others will become viable in turn. The leader who understands this dynamic can anticipate — and get ahead of competitors.
Optimization and pooling are strategic levers. Choosing the right model for the right use, centralizing access, structuring prompts, pooling volumes: these aren't technical topics. They're management decisions that can cut the bill by a factor of 5 or 10, for the same result.
Consumption forecasting becomes a leadership issue. When spending depends on usage, and usage is inherently hard to predict, the risk of cost overruns is real — especially if client-facing services or autonomous agents enter the equation. The token introduces a form of fragility that leaders must learn to manage, just as they learned to manage cloud dependency or variable online advertising costs.
The token will become a management indicator. Like customer acquisition cost, revenue per employee, or gross margin, the token cost per operation will gradually establish itself as a metric to track. The most advanced companies are already implementing "AI FinOps" practices — governance of their artificial intelligence consumption.
After the kWh, after the GB: the token
The history of technology teaches us that each major wave has produced a unit of measurement that ends up structuring an entire sector's economy.
The kilowatt-hour structured the energy economy. The gigabyte structured the storage and cloud economy. Bandwidth structured the streaming and telecommunications economy.
The token is structuring the on-demand intelligence economy.
And as with each of these units before it, those who understood them first profited from them. Those who discovered them too late suffered them — often by paying more, falling behind, or finding themselves dependent on choices they hadn't anticipated.
The token is not a technical topic reserved for developers. It's the new unit of measurement for the value AI produces — and what it costs. And like any unit of measurement, it reveals as much as it demands: it makes visible what we consume, what we spend, and what we haven't yet mastered.
See what's coming, prepare, act
Today, most SMEs consume tokens without knowing it, without measuring them, and without managing them. This isn't a criticism — it's an observation. The topic is recent, the vocabulary is new, and benchmarks are still missing.
But this observation also outlines an opportunity. Understanding what a token is, knowing how it's billed, following its price dynamics, anticipating the risks of unforeseen consumption, and structuring pooling — it's about giving yourself the means to make better-informed decisions, at the right time.
This is leadership work, not a topic to delegate to IT. And it's exactly the kind of strategic thinking we support at IMPAICT: understanding what's at play, assessing your situation, and structuring an adapted response — before the topic imposes itself on you.
Sources
- Epoch AI, "LLM inference prices have fallen rapidly but unequally across tasks", 2024
- Epoch AI, "How persistent is the inference cost burden?", 2025
- Andreessen Horowitz, "How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025", June 2025
- Ramp, "The cost of AI is decreasing", March 2025
- Precedence Research / The AI Enterprise, "The New Enterprise Currency: Why Your AI Strategy Lives or Dies by the Token", March 2026
- Gartner, "Worldwide AI Spending Will Total $1.5 Trillion in 2025", September 2025
- FinOps Foundation, "2026 State of FinOps Report", 2026
- Kong Inc., "Enterprise GenAI Spending Going Up in 2025", 2025
- Menlo Ventures, "2025: The State of Generative AI in the Enterprise", December 2025
- OpsLyft, "Hidden AI Costs: Why Falling Token Prices Increase Spend", 2025