What Uber and Microsoft Learned About AI Costs the Hard Way

Two giants, same mistake

It is tempting to read the Uber and Microsoft headlines as separate stories. One is a ride-hailing company that overspent. The other is a trillion-dollar software company reshuffling its internal tooling. Different companies, different scale, different details.

But look past the details and they are the same failure. Both handed a genuinely great AI tool to a large group of people. Both watched adoption climb faster than anyone modeled. And in both cases, the cost was usage-based and invisible per person until the invoice arrived. That is not two problems. It is one problem, told twice.

The reassuring part, if you run a smaller company, is that the fix does not require Uber’s or Microsoft’s budget. It requires seeing the cost per user in time to do something about it. Let’s walk through both stories, then the pattern underneath them.

Uber: an entire year’s AI budget gone in four months

Uber rolled out Anthropic’s Claude Code, a command-line coding agent, across its engineering organization of roughly 5,000 engineers. Engineers loved it. According to Forbes, adoption jumped from about 32% to 84% of the engineering org in a short window. That is exactly what you want from a tool: people reach for it because it makes them faster.

The trouble is what that adoption did to the bill. Per-engineer cost averaged about $150 to $250 a month, and power users ran anywhere from $500 to $2,000 a month. Multiply that across thousands of engineers, with usage still climbing, and the math moves fast. Uber burned through its entire planned 2026 AI budget in roughly four months, by about April 2026.

The response was a cap. Uber limited employee AI spend to around $1,500 a month per tool. Its COO/CTO publicly questioned whether the spend was even worth it, a striking thing to say out loud about a tool your engineers had clearly voted for with their keyboards. As Fortune reported, the conversation inside Uber shifted from “how do we roll this out” to “how do we contain this.”

Notice what did not happen. Uber did not fail to see the value. It failed to see the cost building up per engineer in time to steer it. By the time the total made the problem obvious, a full year of budget had been spent in four months.

Microsoft: when your best tool is too good to afford

Microsoft’s version played out inside its Experiences & Devices group, the division behind Windows, Microsoft 365, Outlook, Teams and Surface. Same setup: Claude Code got popular, and the constant use broke the unit economics at current token prices.

So Microsoft cancelled most direct Claude Code licenses in that division and told thousands of engineers, PMs and designers to move to GitHub Copilot CLI by June 30, 2026. Per Forbes, the official internal reason cited was “toolchain unification.” That is a real consideration for a company Microsoft’s size. But as The Next Web and others reported, cost reduction was widely understood to be a real driver behind the switch.

There is a certain irony worth sitting with. The problem was not that the tool underperformed. The problem was that it was so useful that people wouldn’t stop using it, and at prevailing token prices that level of use was more than the org wanted to pay. Your best tool becoming too expensive to keep is a very specific kind of failure, and it is the same one Uber hit.

The pattern: usage-based cost + no per-user visibility + no cap

Strip both stories down and the shape is identical. Three ingredients, every time:

First, the cost is usage-based. You do not pay a flat license fee. You pay for tokens, and tokens track how much people actually use the thing. Second, adoption is high, and it climbs, because the tool is good. Third, and this is the fatal one, nobody can see the cost broken down per person in real time, so nothing caps it. There is one big number, and it only resolves into detail at the end of the billing period.

Put those three together and you get a budget blowout that surprises you at invoice time. Not because anyone did anything wrong. The tool worked. People used it. The bill just arrived faster and larger than the model predicted, and by then the money was spent.

The important thing is what the fix is not. The fix is not “stop using AI.” Both companies still want the productivity. The fix is to make the invisible visible: attribute cost per user (or per seat, or per feature), watch it in real time, and cap the outliers before they run up the bill. Uber got to the cap eventually. The expensive part was getting there four months late.
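To make “attribute and cap” concrete, here is a minimal sketch of the idea, not how Uber or any particular tool implements it. It assumes you already log one record per model call with a user ID and a dollar cost. The UsageEvent shape, the function names, and the sample numbers are all hypothetical; the $1,500 threshold simply echoes the cap Uber chose.

```typescript
// Hypothetical shape of one logged model call; field names are illustrative.
interface UsageEvent {
  userId: string;
  costUsd: number; // cost of this single call, derived from token counts and your provider's rates
}

// Sum cost per user so the single provider total resolves into per-person numbers.
function attributeCost(events: UsageEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.userId, (totals.get(e.userId) ?? 0) + e.costUsd);
  }
  return totals;
}

// Return the users whose running total exceeds the cap, highest spender first.
function usersOverCap(totals: Map<string, number>, capUsd: number): string[] {
  return Array.from(totals.entries())
    .filter(([, cost]) => cost > capUsd)
    .sort((a, b) => b[1] - a[1])
    .map(([userId]) => userId);
}

// Made-up sample data for one billing period.
const sampleEvents: UsageEvent[] = [
  { userId: "ana", costUsd: 40 },
  { userId: "ben", costUsd: 900 },
  { userId: "ana", costUsd: 35 },
  { userId: "ben", costUsd: 800 },
  { userId: "cy", costUsd: 1600 },
];

const perUserTotals = attributeCost(sampleEvents);
const overCap = usersOverCap(perUserTotals, 1500);
console.log(overCap);
```

The useful property is that the check runs against a running total, so it can fire mid-month instead of waiting for the invoice.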

Why this is worse for a SaaS that resells AI

Here is where it stops being a story about tech giants and starts being a story about you.

If you run a SaaS that wraps OpenAI, Anthropic or Gemini and sells the result to customers on a flat monthly plan, you have Uber’s problem with one crucial difference: your “engineers” are your users. And unlike Uber, you cannot email your users and ask them to spend less. They are paying you a fixed price and they will use the feature as much as they like.

That flips the economics against you in a way traditional SaaS never had to worry about. Revenue is flat: the subscription price. Cost is variable: whatever the model provider charges for the tokens your users consume. A handful of heavy users can do to your plan exactly what Claude Code did to Uber’s budget. One customer on your $29 plan hammering an AI feature can quietly cost you $80 a month, and the aggregate provider bill will never point them out. You will just watch your margin erode and not know why.
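A small worked example shows why the aggregate hides the problem. This is a sketch with made-up numbers: the $29 plan and the $80 heavy user come from the paragraph above, while the ten light users at $3 each and the Customer type are assumptions added for illustration.

```typescript
// All figures are illustrative; the $3 light-user cost is a made-up number.
interface Customer {
  id: string;
  planPriceUsd: number; // flat monthly revenue from this customer
  modelCostUsd: number; // provider cost attributed to this customer for the same month
}

// Margin for one customer: negative means they cost more than they pay.
function customerMargin(c: Customer): number {
  return c.planPriceUsd - c.modelCostUsd;
}

// Customers running at a loss, worst first.
function underwater(list: Customer[]): Customer[] {
  return list
    .filter((c) => customerMargin(c) < 0)
    .sort((a, b) => customerMargin(a) - customerMargin(b));
}

// Ten light users plus one heavy user, all on the same $29 plan.
const customerList: Customer[] = [];
for (let i = 1; i <= 10; i++) {
  customerList.push({ id: `light-${i}`, planPriceUsd: 29, modelCostUsd: 3 });
}
customerList.push({ id: "heavy-1", planPriceUsd: 29, modelCostUsd: 80 });

const aggregateMargin = customerList.reduce((sum, c) => sum + customerMargin(c), 0);
const losingCustomers = underwater(customerList);

console.log(aggregateMargin);                    // 209: the total still looks healthy
console.log(losingCustomers.map((c) => c.id));   // [ 'heavy-1' ]: one customer loses $51 a month
```

The total stays positive, which is exactly why the single provider bill never raises the alarm. Only the per-customer breakdown shows who is underwater.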

For a company Uber’s size, four months of overspend is a bad quarter. For a bootstrapped SaaS, a few undetected power users can be the difference between profitable and underwater. The stakes are proportionally higher, and the visibility is usually far worse, because most small teams are just staring at the same single provider total the giants were.

What to do instead: attribute per user, watch in real time, cap the outliers

The lesson from both companies compresses into three moves, and none of them is “use less AI.”

Attribute cost to the user, feature or seat that generated it. Watch it live instead of discovering it at month end. And put a cap on the heaviest users so no single outlier can blow the budget. This is precisely what Weckr is built to do. You wrap your existing OpenAI, Anthropic or Gemini client with the SDK, and it logs cost and margin per user in real time:

import { Weckr } from '@weckr/sdk'

const wk = new Weckr({
  apiKey: process.env.WECKR_API_KEY,
  plans: { free: 0, starter: 9, pro: 29, business: 99 },
})

const response = await wk.chat(openai, {
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: prompt }],
  userId: user.id,
  feature: 'ai-summary',
  plan: user.plan,
})

From there, the two failures above become things you catch instead of things that catch you. Weckr’s velocity and loop detection fires a Slack or email alert the moment a single user burns unusual token volume in a short window, so a runaway user surfaces in hours rather than at invoice time. And it enforces per-plan spending caps, blocking or downgrading a user once they cross the limit you set, so an outlier gets contained automatically instead of manually four months later.
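If you want a feel for what velocity detection means mechanically, here is a generic sliding-window sketch. It is not Weckr’s implementation, which the article does not describe. The VelocityDetector class, the window length, and the token threshold are all assumptions chosen for illustration.

```typescript
// One recorded call: when it happened and how many tokens it used.
interface Sample {
  at: number; // timestamp in milliseconds
  tokens: number;
}

// Generic sliding-window detector; thresholds are hypothetical and should be tuned per plan.
class VelocityDetector {
  private readonly history = new Map<string, Sample[]>();
  private readonly windowMs: number;
  private readonly maxTokens: number;

  constructor(windowMs: number, maxTokens: number) {
    this.windowMs = windowMs;
    this.maxTokens = maxTokens;
  }

  // Record a call and return true if the user's tokens inside the window exceed the limit.
  record(userId: string, tokens: number, at: number): boolean {
    const cutoff = at - this.windowMs;
    // Keep only samples still inside the window, then add the new one.
    const recent = (this.history.get(userId) ?? []).filter((s) => s.at > cutoff);
    recent.push({ at, tokens });
    this.history.set(userId, recent);
    const total = recent.reduce((sum, s) => sum + s.tokens, 0);
    return total > this.maxTokens;
  }
}

// Example: alert when a user exceeds 100,000 tokens within any 60-second window.
const detector = new VelocityDetector(60_000, 100_000);
const first = detector.record("user-1", 40_000, 0);       // 40k in window
const second = detector.record("user-1", 40_000, 10_000); // 80k in window
const third = detector.record("user-1", 40_000, 20_000);  // 120k in window, over the limit
const later = detector.record("user-1", 40_000, 200_000); // earlier calls have aged out
console.log(first, second, third, later);
```

A stuck retry loop or an agent calling itself shows up here as a burst inside the window, long before it is visible in a monthly total.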

Weckr’s free tier covers up to 50,000 requests a month, and the Pro plan is $49 a month. The point is not the pricing, though. It is that Uber and Microsoft had every resource in the world and still got surprised, purely because they could not see the cost per user in time. That part is fixable, and it does not take a hyperscaler budget to fix it.

FAQ

How did Uber burn through its AI budget so fast?

Uber gave its roughly 5,000 engineers access to Anthropic's Claude Code, a command-line coding agent, and adoption spread fast. Reports cite Claude Code going from about 32% to 84% of the engineering org in a short window. Per-engineer cost averaged $150 to $250 a month, with power users running $500 to $2,000 a month. That combination burned through the entire planned 2026 AI budget in about four months.

Why did Microsoft cancel Claude Code for its engineers?

Inside its Experiences & Devices group (the division behind Windows, Microsoft 365, Outlook, Teams and Surface), Claude Code became extremely popular and the constant use broke the unit economics at current token prices. Microsoft cancelled most direct Claude Code licenses and told thousands of engineers, PMs and designers to move to GitHub Copilot CLI by June 30, 2026. The stated internal reason was toolchain unification, but cost reduction was widely reported as a real driver.

What can a small SaaS learn from Uber and Microsoft's AI cost problems?

Both are the same failure: usage-based cost, high adoption, and no per-user visibility or cap add up to a budget blowout that only shows up at invoice time. The fix is not to stop using AI. It is to attribute cost per user or per seat, watch it in real time, and cap the outliers before they run up the bill. A small SaaS that resells AI on a flat plan faces the exact same math, just with customers instead of engineers.

How do I stop AI costs from scaling out of control?

Attribute cost to the user, feature or seat that generated it, watch it in real time rather than at month end, and set caps on the heaviest users so a few outliers cannot blow the budget. Uber responded to its overrun by capping employee AI spend at around $1,500 a month per tool. The same three moves, attribution plus real-time monitoring plus caps, work at any scale.

What is a token budget and should my company set one?

A token budget is a cap on how much a user, team or tool can spend on AI inference over a period, expressed in tokens or dollars. It matters because AI cost is usage-based and a handful of heavy users can consume most of the spend. If you resell AI or hand it to a large team, a per-user or per-tool budget with alerts is the cheapest insurance against a surprise invoice.
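As a rough sketch, a dollar-denominated budget check can be a few lines that run before each model call. The Budget type, the checkBudget function, and the 80% warning threshold are hypothetical choices for illustration, not a standard.

```typescript
type BudgetDecision = "allow" | "warn" | "block";

// Hypothetical budget settings for one user, team or tool over one period.
interface Budget {
  limitUsd: number;     // hard cap for the period
  warnFraction: number; // e.g. 0.8 means alert once 80% of the limit is reached
}

// Decide what to do with the next call given what has already been spent.
function checkBudget(spentUsd: number, nextCallUsd: number, budget: Budget): BudgetDecision {
  const projected = spentUsd + nextCallUsd;
  if (projected > budget.limitUsd) return "block";
  if (projected >= budget.limitUsd * budget.warnFraction) return "warn";
  return "allow";
}

const monthlyBudget: Budget = { limitUsd: 100, warnFraction: 0.8 };
console.log(checkBudget(10, 5, monthlyBudget)); // "allow"
console.log(checkBudget(85, 5, monthlyBudget)); // "warn"
console.log(checkBudget(98, 5, monthlyBudget)); // "block"
```

Whether “block” means a hard stop, a downgrade to a cheaper model, or a message to the user is a product decision. The check itself stays the same.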

See the cost per user before it becomes a headline

Uber and Microsoft did not lack talent or tooling. They lacked a live, per-user view of where the AI spend was going, and by the time the total told the story, the budget was already gone. Your SaaS runs on the same math, just with customers in place of engineers, and usually with far less room to absorb a surprise.

Weckr gives you that per-user view: cost and margin per customer in real time, an alert when someone spikes, and a cap so the outliers can’t run away with your margin. You can wire it into your existing client in a couple of lines, or just see it running on seeded data first, no signup needed, at useweckr.com/demo.