Microsoft canceled Claude Code licenses this week. Uber burned through its entire AI budget in four months. The era of cheap cloud AI is ending, and every enterprise that built workflows on per-token pricing is about to learn why cost predictability matters.
On 21 May 2026, Microsoft canceled its internal Claude Code licenses. Token-based billing made the cost untenable — even for a company that put $13 billion into OpenAI and runs the Azure infrastructure that powers most of Anthropic’s compute.
The same week, Uber’s CTO sent an internal memo warning the company had burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37% across the board. And GitHub, which Microsoft owns, is dropping flat-rate plans for usage-based billing across its entire product line.
None of this is a coincidence. These are all the same event, viewed from different angles.
Microsoft looked at the bill from a competitor’s coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic’s end. Claude Code works. The engineers liked it. The problem is simpler and more structural: token-based pricing forces every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested.
During the early adoption phase, most enterprises assumed AI costs would keep falling. The narrative was clear — models get cheaper, inference gets faster, the marginal cost of intelligence approaches zero. Workflows were designed around that assumption. Budgets were approved with that curve in mind.
The curve has not been falling. Anthropic, OpenAI, and Google have all raised effective prices in the last six months. The pattern is consistent: every major provider is moving toward per-token, per-seat, or usage-based billing, so the bill scales directly with how many people use the system and how heavily they use it.
That is fine for a demo. It is catastrophic for an organisation running AI agents that process thousands of queries a day.
Enterprises now face two options. Both of them lead to the same destination.
Option one: scale back AI usage to fit budgets. Cut the number of queries, reduce agent autonomy, throttle background processing. This immediately slows the revenue ramp that the labs need to justify their valuations ahead of upcoming IPOs. The models become less useful, adoption stalls, and the whole cycle resets.
Option two: the labs cut prices and absorb the losses. Unit economics deteriorate further at exactly the wrong moment — when the hardware arms race is intensifying, when the memory wall is making inference hardware more expensive, and when companies like Anthropic are scrambling to lock in 220,000 GPUs through SpaceX.
Both paths land in the same place. The numbers stop working, and somebody has to take the writedown.
This is what happens when you build a business on a resource that someone else controls and prices. You get the capability, but you never get the predictability.
For UK businesses regulated by the SRA or FCA, subject to GDPR, or operating under any other regulatory framework, the implications are immediate. Any organisation that built AI workflows on cloud APIs is now exposed to two risks simultaneously: cost volatility and compliance uncertainty.
Cloud AI gives you access to powerful models, but you have no idea what your monthly bill looks like until it arrives. Usage scales with demand, and demand scales with adoption — which means your costs grow precisely when you think adoption is working. It is the worst kind of cost structure for a business.
On-premises AI deployed on open-weight models like Qwen3.6-27B eliminates that variable entirely. You buy the hardware once. You run the model. Your marginal inference cost is electricity. There are no tokens, no per-seat fees, no usage multipliers, no surprise rate hikes from a provider you did not choose and cannot negotiate with.
This is not just a security argument. It is an economic one.
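To make the economic argument concrete, here is a minimal break-even sketch. Every figure in it is a hypothetical placeholder, not a quote from any provider or hardware vendor: the token price, hardware cost, depreciation period, power draw, and electricity tariff are all assumptions you would replace with your own numbers. It also leaves out staffing, maintenance, and the throughput ceiling of the hardware, so treat it as a starting point for a spreadsheet rather than a finished business case.

```python
from dataclasses import dataclass


@dataclass
class CloudPlan:
    """Usage-based billing: cost grows linearly with tokens processed."""
    price_per_million_tokens: float  # blended input/output price in GBP (assumed)

    def monthly_cost(self, tokens_per_month: float) -> float:
        return tokens_per_month / 1_000_000 * self.price_per_million_tokens


@dataclass
class OnPremPlan:
    """Owned hardware: cost is amortised purchase price plus electricity."""
    hardware_cost: float            # one-off purchase in GBP (assumed)
    lifetime_months: int            # depreciation period (assumed)
    power_kw: float                 # average draw while running (assumed)
    price_per_kwh: float            # electricity tariff in GBP (assumed)
    hours_per_month: float = 730.0  # roughly 24 * 365 / 12

    def monthly_cost(self) -> float:
        amortised = self.hardware_cost / self.lifetime_months
        electricity = self.power_kw * self.hours_per_month * self.price_per_kwh
        # Independent of usage, up to the capacity of the hardware.
        return amortised + electricity


def break_even_tokens(cloud: CloudPlan, on_prem: OnPremPlan) -> float:
    """Monthly token volume at which the two plans cost the same."""
    return on_prem.monthly_cost() / cloud.price_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    cloud = CloudPlan(price_per_million_tokens=10.0)
    on_prem = OnPremPlan(
        hardware_cost=24_000.0,
        lifetime_months=48,
        power_kw=1.0,
        price_per_kwh=0.25,
    )

    print(f"On-premises, any usage: {on_prem.monthly_cost():,.2f} per month")
    for tokens in (10e6, 50e6, 100e6, 500e6):
        print(f"Cloud at {tokens / 1e6:>5.0f}M tokens: "
              f"{cloud.monthly_cost(tokens):,.2f} per month")
    print(f"Break-even: {break_even_tokens(cloud, on_prem) / 1e6:,.2f}M tokens per month")
```

The shape matters more than the specific numbers: the cloud line rises with adoption, while the on-premises line stays flat until the hardware runs out of capacity. Above the break-even volume, each additional query widens the gap.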
We have written about why high street law firms can’t afford cloud AI from a compliance perspective, and about capability sovereignty as a parallel to data sovereignty. But the cost argument is now the most urgent one. The subsidy era is over, and the companies that own their own infrastructure are the only ones that get to plan.
This is not a prediction about what will happen. It is happening right now. Microsoft canceled Claude Code this week. Uber’s budget evaporated in four months. GitHub is shifting to usage-based billing. The pricing increases are already baked into every provider’s roadmap.
The organisations that move to on-premises AI now lock in their costs for the life of the hardware. The ones that wait keep paying per token, at prices that have nowhere to go but up.
UK enterprises are already making this calculation — quietly, without press releases, through risk register decisions that account for both compliance and cost. The AI subsidy era was never going to last. The question is whether you noticed before your bill arrived.
JD Fortress AI builds secure, on-premises RAG and agent solutions for UK businesses in regulated sectors. If you’re exploring always-on, private AI teammates with predictable costs — no tokens, no usage surprises — get in touch for a confidential discussion.