How to Get OpenAI API Cost Per User (Not Just Total Bill)

Why OpenAI only shows you total costs

Here is the thing people miss: OpenAI is not being stingy with the data. It literally does not have it. When your backend calls the API, it sends one request with one API key. There is no field that says “this token spend belongs to Alice on the starter plan.” From OpenAI’s side, every request in your account looks the same. It is all just your app.

So the dashboard and the usage API report at the only two levels OpenAI actually knows about: your account, and (if you split things up) your projects. You can see spend per model, spend per day, spend per project key. You cannot see spend per customer, because OpenAI has never heard of your customers.

That is not a bug you can file. It is structural. The moment you multiplex thousands of end users through a single credential, the per-user dimension has to live on your side of the wire. If you want it, you build it.

How to attribute costs to users manually from usage logs

The good news is that OpenAI hands you exactly what you need on every response. A normal (non-streamed) chat completion comes back with a usage object containing the token counts for that specific call:

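import OpenAI from 'openai'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment by default

const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
})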

console.log(response.usage)
// {
//   prompt_tokens: 812,
//   completion_tokens: 143,
//   total_tokens: 955,
// }

Those two numbers, prompt_tokens and completion_tokens, are the raw material for per-user cost. OpenAI does not attribute them to a user for you, but you can, because you know who made the request. Capture the counts right after the call returns and write them to a log with the user id attached:

const response = await openai.chat.completions.create({ model, messages })

await db.aiUsage.insert({
  userId: req.user.id,          // you have this. OpenAI does not.
  model,
  promptTokens: response.usage.prompt_tokens,
  completionTokens: response.usage.completion_tokens,
  createdAt: new Date(),
})
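The usage object above is what a non-streamed call returns. On a streamed call the chunks carry no usage unless you ask for it: passing stream_options: { include_usage: true } alongside stream: true makes the API send one extra final chunk with usage populated and an empty choices array. A minimal sketch of capturing it follows; collectStream and the trimmed-down chunk types are illustrative names, not SDK exports:

```typescript
// Only the fields this sketch reads from each streamed chunk.
interface Usage {
  prompt_tokens: number
  completion_tokens: number
  total_tokens: number
}
interface StreamChunk {
  choices: { delta: { content?: string | null } }[]
  usage?: Usage | null
}

// Hypothetical helper: drains a stream and returns the full text plus the
// usage reported on the final chunk (null if the stream never reported one).
async function collectStream(stream: AsyncIterable<StreamChunk>) {
  let text = ''
  let usage: Usage | null = null
  for await (const chunk of stream) {
    // The usage-only final chunk has an empty choices array, hence the ?.
    text += chunk.choices[0]?.delta?.content ?? ''
    if (chunk.usage) usage = chunk.usage
  }
  return { text, usage }
}

// With the real client, the stream would come from something like:
//   const stream = await openai.chat.completions.create({
//     model, messages, stream: true,
//     stream_options: { include_usage: true },
//   })
//   const { text, usage } = await collectStream(stream)
```

If usage comes back null, the stream was most likely opened without include_usage or was cut off before the final chunk, so log that case rather than recording zero tokens.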

Do that on every call and you now have something OpenAI cannot give you: a per-user record of token consumption. Turning it into dollars is the next step.

Grab the current rate from the OpenAI pricing page. At the time of writing, gpt-4o is $2.50 per million input tokens and $10 per million output tokens; gpt-4o-mini is far cheaper at $0.15 and $0.60 respectively. The cost of one call is:

const PRICES = {
  'gpt-4o':      { in: 2.5,  out: 10 },   // USD per 1M tokens
  'gpt-4o-mini': { in: 0.15, out: 0.6 },
}

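function costUsd(model, promptTokens, completionTokens) {
  const p = PRICES[model]
  // Fail with a clear message when a model is missing from the price table
  if (!p) throw new Error(`No price configured for model: ${model}`)
  return (promptTokens * p.in + completionTokens * p.out) / 1_000_000
}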
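Getting from per-call cost to a per-user total is then a sum over the logged rows. The following is a sketch under a few assumptions: rows are shaped like the aiUsage insert earlier, UsageRow and costByUser are made-up names, and the prices are copied from the table above and should be checked against the pricing page.

```typescript
// USD per 1M tokens, copied from the price table above (may be out of date).
const PRICES: Record<string, { in: number; out: number }> = {
  'gpt-4o':      { in: 2.5,  out: 10 },
  'gpt-4o-mini': { in: 0.15, out: 0.6 },
}

// Hypothetical row shape, mirroring the fields logged per call.
interface UsageRow {
  userId: string
  model: string
  promptTokens: number
  completionTokens: number
}

// Sums cost per user and returns [userId, usd] pairs, most expensive first.
function costByUser(rows: UsageRow[]): [string, number][] {
  const totals = new Map<string, number>()
  for (const row of rows) {
    const price = PRICES[row.model]
    if (!price) throw new Error(`No price configured for model: ${row.model}`)
    const usd =
      (row.promptTokens * price.in + row.completionTokens * price.out) / 1_000_000
    totals.set(row.userId, (totals.get(row.userId) ?? 0) + usd)
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1])
}
```

For the single call shown earlier (812 prompt tokens and 143 completion tokens on gpt-4o-mini), the arithmetic is (812 × 0.15 + 143 × 0.60) / 1,000,000, or roughly $0.0002. Individual calls are tiny; the per-user sum over a month is the number that matters.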

The limits of OpenAI’s built-in usage dashboard

Once you have manual attribution working, the OpenAI dashboard starts to feel like a toy. A few reasons it will never be enough for a real SaaS:

It is delayed. Usage data lags, sometimes by hours, so it is useless for catching a runaway user in real time. It is aggregate only. There is no user dimension, no feature dimension, nothing you can slice by. It has no concept of margin, because OpenAI does not know what you charge your customers. And most insidiously, it quietly resets your mental model back to “total bill” thinking. You look at one big scary number, feel a vague dread, and change nothing, because a single total is not something you can act on.

A per-user view flips that. Instead of “we spent $6,180,” you get “these 11 users on the $9 plan each cost us over $30, and this one enterprise account costs us $4 against the $499 they pay.” One of those is a number you can do something with.
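Once per-user cost exists, that comparison is a subtraction. A rough sketch, assuming you already have each user's monthly LLM cost and plan; PLAN_PRICE_USD, UserCost, marginByUser and unprofitable are illustrative names, and the plan prices are placeholders:

```typescript
// Hypothetical monthly plan prices in USD; substitute your own.
const PLAN_PRICE_USD: Record<string, number> = {
  free: 0,
  starter: 9,
  pro: 29,
  business: 499,
}

interface UserCost {
  userId: string
  plan: string
  costUsd: number // this user's LLM spend for the month
}

// Margin = plan price minus LLM cost, sorted worst margin first.
// A plan missing from the table is treated as paying $0.
function marginByUser(users: UserCost[]) {
  return users
    .map((u) => ({ ...u, marginUsd: (PLAN_PRICE_USD[u.plan] ?? 0) - u.costUsd }))
    .sort((a, b) => a.marginUsd - b.marginUsd)
}

// Users who cost more than they pay.
function unprofitable(users: UserCost[]) {
  return marginByUser(users).filter((u) => u.marginUsd < 0)
}
```

With the figures above, a $9-plan user who costs $30 has a margin of -$21, while the enterprise account costing $4 against $499 has a margin of $495. Only the first shows up in unprofitable.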

How to tag LLM calls with user IDs

The core move is almost embarrassingly simple. You already have the user id sitting in your request context on every call. The whole game is making sure it gets logged next to the token counts, every time, without anyone forgetting. Here is the pattern in TypeScript:

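import OpenAI from 'openai'

export async function callLLM(
  userId: string,
  feature: string,
  messages: OpenAI.Chat.ChatCompletionMessageParam[],
) {
  const model = 'gpt-4o-mini'
  const response = await openai.chat.completions.create({ model, messages })

  // fire-and-forget: do not block the response on logging, but catch
  // failures so a logging error never becomes an unhandled rejection
  // (this assumes logUsage returns a Promise)
  void logUsage({
    userId,
    feature,
    model,
    // usage is optional in the SDK's types, so fall back to 0 if it is absent
    promptTokens: response.usage?.prompt_tokens ?? 0,
    completionTokens: response.usage?.completion_tokens ?? 0,
  }).catch((err) => console.error('logUsage failed', err))

  return response
}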

And the same idea in Python, reading the usage fields off the response object:

def call_llm(user_id: str, feature: str, messages):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    log_usage(
        user_id=user_id,               # from your auth context
        feature=feature,
        model="gpt-4o-mini",
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens,
    )

    return response

That is the entire trick. The hard part is not the code, it is the discipline: doing it on every call site, keeping the price table current, and building the aggregation and dashboard on top. That is where the manual approach starts eating your Fridays.

How Weckr tracks this automatically

Weckr is the “log the user id on every call” pattern, done for you. You wrap your existing OpenAI client once, pass a userId and a plan, and it handles the token capture, the cost math, and the per-user dashboard:

import { Weckr } from '@weckr/sdk'

const wk = new Weckr({
  apiKey: process.env.WECKR_API_KEY,
  plans: { free: 0, starter: 9, pro: 29, business: 499 },
})

const response = await wk.chat(openai, {
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: prompt }],
  userId: user.id,
  feature: 'ai-summary',
  plan: user.plan,
})

The return value is identical to calling OpenAI directly, so you can drop this in without changing anything downstream. Cost is computed server-side from the token counts and current pricing, which means it never goes stale when OpenAI changes rates. And because Weckr logs asynchronously (fire-and-forget after the call returns), it adds zero latency to your LLM request. Two lines to set up, and you get the per-user breakdown OpenAI cannot give you.

Python is the same shape, one install and one wrapper:

pip install weckr-sdk

From there you get a live per-user view: who spent the most tokens, what each user cost you, and (because you gave it the plan prices) the margin on each customer. You can poke at it on seeded data without signing up at useweckr.com/demo.

Supported providers: OpenAI, Anthropic, Gemini

Most apps do not stay single-provider for long. You start on OpenAI, add Anthropic for one feature because it handles long context better, then try Gemini somewhere for cost reasons. Now you have three dashboards, three pricing tables, and three different ways of counting tokens to reconcile.

Weckr normalizes all three. OpenAI, Anthropic and Gemini all report into the same dashboard under the same userId dimension, with token counts normalized so a user’s total cost is one honest number regardless of which model served each request. If Alice hit gpt-4o twice and Claude once today, you see her combined cost, not three fragments you have to add up by hand.
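If you are doing this by hand, the normalization step amounts to mapping each provider's usage fields onto one shape. The sketch below is not Weckr's implementation; NormalizedUsage, RawUsage and normalizeUsage are illustrative names, and the raw field names are the ones each provider's response uses at the time of writing, so check them against the SDK versions you run:

```typescript
interface NormalizedUsage {
  inputTokens: number
  outputTokens: number
}

// The raw usage payload of each provider, tagged with where it came from.
type RawUsage =
  | { provider: 'openai'; usage: { prompt_tokens: number; completion_tokens: number } }
  | { provider: 'anthropic'; usage: { input_tokens: number; output_tokens: number } }
  | { provider: 'gemini'; usage: { promptTokenCount?: number; candidatesTokenCount?: number } }

// Maps provider-specific usage fields onto one common shape.
function normalizeUsage(raw: RawUsage): NormalizedUsage {
  switch (raw.provider) {
    case 'openai':
      return {
        inputTokens: raw.usage.prompt_tokens,
        outputTokens: raw.usage.completion_tokens,
      }
    case 'anthropic':
      return {
        inputTokens: raw.usage.input_tokens,
        outputTokens: raw.usage.output_tokens,
      }
    case 'gemini':
      // Gemini's usageMetadata fields are optional, so default to 0.
      return {
        inputTokens: raw.usage.promptTokenCount ?? 0,
        outputTokens: raw.usage.candidatesTokenCount ?? 0,
      }
  }
}
```

Each provider tokenizes text differently, so raw token counts are not comparable across providers. Price each call with its own model's rates first, then sum the dollar amounts per user.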

FAQ

Does OpenAI show cost per user?

No. The OpenAI dashboard and usage API report spend at the account and project level, not per end user. OpenAI has no idea who your users are because you send requests with one API key on their behalf. To get cost per user you have to attribute it yourself from token counts.

How do I track API costs per customer in my app?

Capture response.usage after every OpenAI call, store prompt_tokens and completion_tokens alongside the customer id from your request context, then multiply by the model price. The easiest path is to wrap your OpenAI client with the Weckr SDK, which does the logging and cost math for you. Install it with npm install @weckr/sdk.

How do I add user attribution to OpenAI API calls?

You already have the user id in your backend request context. The trick is logging it next to the token counts on every single call, not just at billing time. With Weckr you pass userId to wk.chat() and the attribution happens automatically, server side.

What is the cheapest way to monitor OpenAI usage per user?

Weckr has a free tier that covers 50,000 requests per month with no per-request fee, which is enough for most early-stage apps. The Pro plan is $49 per month. Building your own logging pipeline is technically free but costs you engineering time and goes stale every time OpenAI changes prices.

How do I see which users are using the most tokens?

Group your logged token counts by user id and sort descending. If you are logging manually, that is a SQL query against your call log. With Weckr the per-user leaderboard is already rendered in the dashboard, ranked by cost and token volume across OpenAI, Anthropic and Gemini.

Turn the total bill into a per-user breakdown

The OpenAI total will always just be a total. It cannot see your users, so it cannot tell you which one to upsell, which one to cap, or which one is quietly eating a month of margin on a $9 plan. That answer only exists on your side, built from the token counts OpenAI already gives you on every response. And once you can see who costs the most, the next move is reducing those OpenAI costs without degrading output quality.

You can wire up the logging, pricing table, and aggregation yourself in a week or two, or you can wrap your client with Weckr and have the per-user view tonight. See it running on real-looking data, no signup required, at useweckr.com/demo.