Documentation
Weckr is AI cost & margin intelligence for SaaS founders. Wrap any LLM client, get per-user and per-feature cost data, set spending caps that the SDK enforces before the LLM call, and receive recommendations on pricing & cheaper models.
This page covers the Python SDK weckr-sdk@0.1.3 on PyPI. TypeScript users see /docs. The two SDKs are wire-compatible — same backend, same dashboard, same caps.
Quick start
Drop the SDK in front of your existing LLM client. The call returns unchanged; Weckr logs cost + margin asynchronously and enforces caps before each call.
pip install weckr-sdk openai
from openai import OpenAI
from weckr import Weckr
openai_client = OpenAI() # reads OPENAI_API_KEY
wk = Weckr(
api_key="wk_...", # from app.useweckr.com signup
plans={"free": 0, "pro": 29}, # your plan prices in USD
)
result = wk.chat(
openai_client,
{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Summarize this doc."}],
"user_id": user.id, # your app's user id
"feature": "doc-summary", # anything you'd group cost by
"plan": user.plan, # matches one of plans above
},
)
print(result.choices[0].message.content)
The call shape is wk.chat(client, params_dict). Per-call metadata (user_id, feature, plan) lives inside the params dict alongside model / messages, the same shape as the TypeScript SDK.
Install & configure
1. Sign up & create a project
Sign up at app.useweckr.com/auth/signup. Right after, you'll create a project and we'll show you its wk_ key once. Copy it — the dashboard masks it from then on.
2. Install the SDK
Zero runtime dependencies. Bring your own LLM SDK:
pip install weckr-sdk openai # OpenAI
pip install weckr-sdk anthropic # Anthropic
pip install weckr-sdk google-genai # Gemini (new SDK)
# or all at once:
pip install "weckr-sdk[all]"
Requires Python 3.9+. The SDK uses urllib internally, so it has no runtime deps of its own.
3. Initialise once at boot
# app/weckr.py
import os
from weckr import Weckr
wk = Weckr(
api_key=os.environ["WECKR_API_KEY"],
plans={
"free": 0,
"starter": 9,
"pro": 29,
"business": 99,
},
# optional:
# on_error=lambda e: print("Weckr async error:", e),
# on_downgrade=lambda info: analytics.track("ai_downgrade", info),
)
The plans map is how Weckr knows what each user pays you. It powers the margin column on the dashboard and the recommendation engine.
Logging LLM calls
Wrap your existing client — OpenAI, Anthropic, or Gemini — via wk.chat(client, params). The SDK detects the provider, makes the call, returns the original result, and after the call resolves, fires a fire-and-forget log on a daemon thread.
OpenAI
from openai import OpenAI
openai_client = OpenAI()
result = wk.chat(openai_client, {
"model": "gpt-4o",
"messages": [{"role": "user", "content": prompt}],
"user_id": user.id,
"feature": "ai-summary",
"plan": user.plan,
})
Anthropic
from anthropic import Anthropic
anthropic = Anthropic()
result = wk.chat(anthropic, {
"model": "claude-sonnet-4",
"max_tokens": 1024,
"messages": [{"role": "user", "content": prompt}],
"user_id": user.id,
"feature": "ai-essay",
"plan": user.plan,
})
Gemini
# New SDK (recommended)
from google import genai
gemini = genai.Client()
result = wk.chat(gemini, {
"model": "gemini-2.5-flash",
"messages": [
{"role": "system", "content": "You are a translator."}, # mapped to system_instruction
{"role": "user", "content": prompt},
],
"user_id": user.id,
"feature": "ai-translate",
"plan": user.plan,
})
The legacy google-generativeai SDK also works: the Python SDK detects either client shape. Message roles are preserved as multi-turn contents; system messages are mapped to Gemini's system_instruction.
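The role mapping described above can be pictured with a small sketch. This is not the SDK's internal code: `split_messages_for_gemini` is a hypothetical helper, and the assumption that assistant turns become Gemini's "model" role comes from Gemini's own content format rather than from this page.

```python
from typing import Dict, List, Optional, Tuple


def split_messages_for_gemini(
    messages: List[Dict[str, str]],
) -> Tuple[Optional[str], List[Dict[str, object]]]:
    """Hypothetical helper: separate system text from multi-turn contents."""
    system_parts: List[str] = []
    contents: List[Dict[str, object]] = []
    for msg in messages:
        role, text = msg["role"], msg["content"]
        if role == "system":
            # System messages are collected into one system_instruction string.
            system_parts.append(text)
            continue
        # Assumption: Gemini names the assistant role "model"; user stays "user".
        gemini_role = "model" if role == "assistant" else "user"
        contents.append({"role": gemini_role, "parts": [{"text": text}]})
    system_instruction = "\n".join(system_parts) if system_parts else None
    return system_instruction, contents


system_instruction, contents = split_messages_for_gemini([
    {"role": "system", "content": "You are a translator."},
    {"role": "user", "content": "Bonjour"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Merci"},
])
```

The system message ends up outside the turn list, while the remaining turns keep their original order.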
Fields you should pass
- user_id: the end user of your app, stable across requests. Use your auth user id. Required for cap checks and per-user margin. Omit it (or pass None) for anonymous calls: the row still lands, and cap checks are simply skipped.
- feature: a label like 'ai-summary'. Groups requests on the Features page and powers model recommendations.
- plan: matches one of the keys in your plans config. Passing a value not in plans raises WeckrConfigError at call time.
Anonymous calls
For calls that don't belong to a logged-in user (marketing tools, anonymous demos, internal scripts), omit user_id, feature, and plan. The row lands with NULL in those columns and shows up grouped under “None” on the Users page.
wk.chat(openai_client, {
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": prompt}],
})
Streaming
If the LLM client returns a streaming response, the SDK detects it after the call and emits an on_error warning: token usage isn't available on streams by default, so the row logs 0 tokens. Don't pass streaming params to wk.chat() today. Wrapped streaming with full usage capture lands in a future release.
Spending caps
Set a max monthly AI spend per user per plan. The SDK checks the cap before every LLM call (with a 60-second in-memory cache per (user_id, plan_name, model), so at most one extra request per minute per cache key).
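To make the caching behaviour concrete, here is a minimal sketch of a 60-second in-memory cache keyed by (user_id, plan_name, model). `CapCheckCache` and its injectable clock are hypothetical names for illustration; the SDK's real cache may be structured differently.

```python
import time
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (user_id, plan_name, model)


class CapCheckCache:
    """Hypothetical in-memory TTL cache for cap decisions."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, dict]] = {}

    def get_or_fetch(self, key: CacheKey, fetch: Callable[[], dict]) -> dict:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]      # fresh entry: no network round-trip
        decision = fetch()     # missing or stale: ask the server once
        self._entries[key] = (now, decision)
        return decision


# Usage with a fake clock so the expiry is visible without sleeping.
fake_now = [0.0]
calls = []


def fake_fetch() -> dict:
    calls.append(1)
    return {"allowed": True}


cache = CapCheckCache(clock=lambda: fake_now[0])
key = ("u_42", "pro", "gpt-4o")
cache.get_or_fetch(key, fake_fetch)   # first call fetches
cache.get_or_fetch(key, fake_fetch)   # served from cache
fake_now[0] = 61.0
cache.get_or_fetch(key, fake_fetch)   # entry expired, fetches again
```

Three lookups over 61 seconds trigger two fetches, which is the "at most one extra request per minute per cache key" behaviour described above.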
1. Configure in the dashboard
On /dashboard/settings set a monthly cap (USD) for each plan and pick the action:
- Block call: the SDK raises WeckrCapError before reaching the LLM. You catch it and show your user an upgrade prompt.
- Downgrade model: the SDK silently swaps to a cheaper model in the same provider (e.g. gpt-4o → gpt-4o-mini). The call still completes and logs against the new model.
2. Handle the block case
from weckr import WeckrCapError, is_weckr_cap_error
try:
    result = wk.chat(openai_client, params)
    return result
except WeckrCapError as err:
    return JSONResponse(
        status_code=402,
        content={
            "error": "AI spend cap reached for this user",
            "cap": err.cap,
            "spent": err.current_spend,
            "plan": err.plan_name,
        },
    )
Downgrade is transparent
When the SDK downgrades, it emits a one-time WeckrDowngradeWarning per (user_id, model) pair via Python's warnings module. No raise. The log row records the actually-used model.
Override with an on_downgrade callback for analytics:
wk = Weckr(
api_key=os.environ["WECKR_API_KEY"],
plans={...},
on_downgrade=lambda info: analytics.track("ai_downgrade", info),
# info = {"userId": str, "from": str, "to": str}
)
WeckrDowngradeWarning is its own subclass of UserWarning, so apps that run warnings.filterwarnings("error") don't accidentally turn a downgrade into an exception. To silence: warnings.filterwarnings("ignore", category=WeckrDowngradeWarning).
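As an illustration of the "one-time warning per (user_id, model) pair" idea, the sketch below uses only the standard warnings module. `DowngradeWarning` and `warn_downgrade_once` are stand-ins invented for this example, not SDK exports.

```python
import warnings
from typing import Set, Tuple


class DowngradeWarning(UserWarning):
    """Stand-in for the SDK's WeckrDowngradeWarning."""


_already_warned: Set[Tuple[str, str]] = set()


def warn_downgrade_once(user_id: str, from_model: str, to_model: str) -> bool:
    """Warn the first time a (user_id, model) pair is downgraded."""
    key = (user_id, from_model)
    if key in _already_warned:
        return False
    _already_warned.add(key)
    warnings.warn(
        f"user {user_id}: {from_model} downgraded to {to_model}",
        DowngradeWarning,
        stacklevel=2,
    )
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = warn_downgrade_once("u_42", "gpt-4o", "gpt-4o-mini")
    second = warn_downgrade_once("u_42", "gpt-4o", "gpt-4o-mini")
```

Because the warning has its own category, a filter can target it without touching other UserWarning subclasses.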
Fail-open / fail-closed
- 5xx, 429, network error, timeout: fail OPEN (LLM call proceeds). The error goes to on_error if provided.
- 401, 403: fail CLOSED with a WeckrConfigError. A typo'd or revoked API key raises synchronously so you catch the bug at boot, not silently in production. See Error handling.
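The two rules above amount to a small decision table. The sketch below encodes it as a pure function; `cap_check_failure_mode` is a hypothetical name, and how statuses outside the listed ones are treated is an assumption of this example.

```python
from typing import Optional

FAIL_CLOSED_STATUSES = {401, 403}


def cap_check_failure_mode(status: Optional[int]) -> str:
    """Return 'closed' to raise, 'open' to let the LLM call proceed.

    status is None for network errors and timeouts (no HTTP response).
    """
    if status in FAIL_CLOSED_STATUSES:
        return "closed"
    if status is None or status == 429 or 500 <= status <= 599:
        return "open"
    # Assumption: statuses the text does not mention are treated as fail-open.
    return "open"


modes = {code: cap_check_failure_mode(code) for code in (401, 403, 429, 503, None)}
```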
To turn cap checks off entirely, pass disable_cap_check=True in the Weckr(...) constructor.
Real-time alerts (agent loop detection)
Every successful log POST triggers a fire-and-forget velocity check on the same (project_id, user_id) over the last few minutes. If cumulative tokens cross your threshold, Weckr writes an alert_log row and dispatches whatever channels you have configured. Designed for runaway CrewAI / LangChain reasoning loops, but works for any user-attributed call.
No code change required
The detection runs server-side after each ingest. You don't add any code in your app — if you're already calling wk.chat() with a user_id, you already have loop detection.
Defaults
- Window: 5 minutes — how far back we sum tokens.
- Threshold: 50,000 tokens — if a single user crosses this in the window, the alert fires.
- Cooldown: 30 minutes — we suppress duplicate alerts for the same (project, user) so an actively-looping agent doesn't spam Slack.
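How the three defaults interact can be sketched in a few lines. The real detection runs server-side and its implementation is not shown on this page; `VelocityDetector` is a hypothetical in-memory model, and firing when the sum reaches the threshold (rather than strictly exceeds it) is an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

WINDOW_SECONDS = 5 * 60       # how far back tokens are summed
THRESHOLD_TOKENS = 50_000     # alert when the window total reaches this
COOLDOWN_SECONDS = 30 * 60    # suppress duplicates for this long

Key = Tuple[str, str]  # (project_id, user_id)


class VelocityDetector:
    def __init__(self) -> None:
        self._events: Dict[Key, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._last_alert: Dict[Key, float] = {}

    def record(self, key: Key, now: float, tokens: int) -> bool:
        """Record one logged call; return True if an alert should fire."""
        events = self._events[key]
        events.append((now, tokens))
        # Drop events that fell out of the look-back window.
        while events and now - events[0][0] > WINDOW_SECONDS:
            events.popleft()
        if sum(count for _, count in events) < THRESHOLD_TOKENS:
            return False
        last = self._last_alert.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False  # still cooling down: suppress the duplicate
        self._last_alert[key] = now
        return True


detector = VelocityDetector()
key = ("proj_1", "u_42")
results = [
    detector.record(key, now=0.0, tokens=30_000),    # 30k in window
    detector.record(key, now=60.0, tokens=30_000),   # 60k in window: alert
    detector.record(key, now=120.0, tokens=30_000),  # 90k, but in cooldown
]
```

The third call is over the threshold but stays silent because the cooldown from the second call is still active.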
Tune in the dashboard
All three knobs live in /dashboard/settings under Agent loop detection. Lower the threshold (e.g. 10k tokens / 2 min) for agent products where a single runaway crew is expensive; raise it for chat products with naturally heavy users.
Notifications: Slack & email
When loop detection trips (or any future alert type), Weckr can ping Slack and email. Both are configured per-project on /dashboard/settings and stored in the project's alert_config jsonb.
Slack
Paste an incoming webhook URL. We POST a red-attachment payload with the user_id, tokens-in-window, cost so far, request count, and a link back to the dashboard. No Slack app install required.
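For orientation, a Slack incoming-webhook body with a red attachment might look roughly like the sketch below. The exact payload Weckr sends is not documented here: `build_slack_alert`, the title text, and the field labels are all illustrative assumptions; only the general attachment layout (color, title, fields) follows Slack's webhook format.

```python
import json


def build_slack_alert(user_id: str, tokens_in_window: int, cost_usd: float,
                      request_count: int, dashboard_url: str) -> dict:
    """Hypothetical payload builder; labels and wording are illustrative."""
    return {
        "attachments": [
            {
                "color": "#d32f2f",  # red bar on the left of the message
                "title": "Agent loop suspected",
                "title_link": dashboard_url,
                "fields": [
                    {"title": "User", "value": user_id, "short": True},
                    {"title": "Tokens in window",
                     "value": f"{tokens_in_window:,}", "short": True},
                    {"title": "Cost so far",
                     "value": f"${cost_usd:.4f}", "short": True},
                    {"title": "Requests",
                     "value": str(request_count), "short": True},
                ],
            }
        ]
    }


payload = build_slack_alert(
    "u_42", 61_500, 0.4312, 37, "https://app.useweckr.com/dashboard"
)
body = json.dumps(payload).encode()  # bytes you would POST to the webhook URL
```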
Email
Toggle email alerts on and enter the destination address. Emails ship via Resend from the Weckr-verified sender domain, so there is no SMTP setup on your end.
Test alert button
Settings has a Test alert button next to the channel inputs. It POSTs /api/v1/projects/[id]/test-alert which dispatches a fake velocity alert through every channel you've configured. Lets you confirm Slack + email work before a real anomaly trips them.
Query from Claude (MCP server)
@weckr/mcp is a Model Context Protocol server. Plug it into Claude Desktop, Cursor, or any MCP-compatible client and ask questions about your cost data in plain language. It is a Node binary you launch via npx, so there is nothing else to install, even on Python projects.
Install
// Claude Desktop config (claude_desktop_config.json)
{
"mcpServers": {
"weckr": {
"command": "npx",
"args": ["-y", "@weckr/mcp"],
"env": {
"WECKR_API_KEY": "wk_..."
}
}
}
}
Cursor uses the same JSON in .cursor/mcp.json. Restart your client and the Weckr tools appear.
Tools exposed
- get_overview: total cost, revenue, margin, request count, unprofitable user count for the month.
- get_users: per-user margin breakdown, filterable.
- get_feature_breakdown: per-feature cost share.
- get_model_recommendations: same-provider cheaper-model swaps with $ saving estimates.
- get_pricing_recommendations: per-plan margin health + recommended price.
- get_spending_cap_url: returns the dashboard URL where caps are edited (does NOT mutate).
Error handling
The SDK raises two kinds of errors. Tell them apart with the type-guard helpers.
from weckr import (
WeckrCapError,
WeckrConfigError,
is_weckr_cap_error,
is_weckr_config_error,
)
try:
    result = wk.chat(openai_client, params)
except WeckrCapError as err:
    # User hit their monthly cap. Show an upgrade prompt.
    # err.user_id / err.plan_name / err.cap / err.current_spend are populated.
    return show_upgrade_prompt(err)
except WeckrConfigError as err:
    # CRITICAL: Weckr is misconfigured. Send to your backend alerts.
    # err.code is one of: 'invalid_api_key' | 'forbidden' | 'unknown_plan'
    return alert_backend(err.code, str(err))
# any other exception is from the LLM provider itself
- WeckrCapError: cap reached. User-visible: show your upgrade UI.
- WeckrConfigError: developer-only. Three codes:
  - invalid_api_key: 401 from /check. Your wk_ key is typo'd, revoked, or wasn't copied correctly.
  - forbidden: 403 from /check. Usually means the project was deleted.
  - unknown_plan: you passed a plan to wk.chat() that isn't in the plans dict. Fail-fast so a typo doesn't silently log $0-revenue rows.
on_error receives async errors (log POST failures, cap-check 5xx). It does NOT receive WeckrCapError or WeckrConfigError: those are synchronous raises. The callback runs on a background thread; make it thread-safe.
Short-lived processes (Lambda, cron, CLI)
wk.chat() returns as soon as the LLM call resolves; the log POST runs on a daemon thread after that. In a long-running web server this is ideal because it adds zero hot-path latency. In a short-lived process (Lambda handler, cron job, CLI), the host can exit before the POST hits the network and you lose the last log row.
Call wk.flush() before exit. It joins every in-flight POST thread (default 5s timeout) and returns.
# Lambda handler
def handler(event, context):
    result = wk.chat(openai_client, {
        "model": "gpt-4o-mini",
        "messages": [...],
        "user_id": event["user_id"],
        "plan": "pro",
    })
    wk.flush()  # wait for the log POST before Lambda freezes the runtime
    return result
Long-running servers (FastAPI, Flask, Django, etc.) don't need this: the daemon thread completes long before the process ever exits.
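The mechanics behind flush can be pictured as "remember every background thread, then join them against a shared deadline". The sketch below is a generic model of that pattern, assuming nothing about the SDK's internals; `BackgroundPoster` is a hypothetical class.

```python
import threading
import time
from typing import Callable, List


class BackgroundPoster:
    """Hypothetical stand-in for a fire-and-forget log sender."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._threads: List[threading.Thread] = []

    def submit(self, post: Callable[[], None]) -> None:
        thread = threading.Thread(target=post, daemon=True)
        with self._lock:
            # Forget threads that already finished so the list stays small.
            self._threads = [t for t in self._threads if t.is_alive()]
            thread.start()
            self._threads.append(thread)

    def flush(self, timeout_seconds: float = 5.0) -> bool:
        """Join in-flight threads; return True if all finished in time."""
        deadline = time.monotonic() + timeout_seconds
        with self._lock:
            pending = list(self._threads)
        for thread in pending:
            # Each join gets only the time left until the shared deadline.
            thread.join(max(0.0, deadline - time.monotonic()))
        return not any(t.is_alive() for t in pending)


sent: List[str] = []


def slow_post() -> None:
    time.sleep(0.05)      # simulate network latency
    sent.append("row")


poster = BackgroundPoster()
poster.submit(slow_post)
all_done = poster.flush(timeout_seconds=2.0)
```

Without the flush call, a process that exits right after submit would kill the daemon thread before the row is appended.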
The dashboard
All pages live under app.useweckr.com/dashboard. Use the top-right project selector to switch between projects (or create a new one). The dashboard is the same whether you log from TypeScript or Python — the backend doesn't care which SDK wrote the row.
- Overview: 4 stat cards (cost, revenue, margin, unprofitable users), the cost-vs-revenue line chart for the last 30 days, an alert banner when any user is losing you money, plus pricing intel cards.
- Users: one row per user_id, sorted by margin (worst first). Anonymous rows (no user_id) appear grouped under “None”.
- Features: one row per feature, sorted by total spend.
- Recommendations: model swaps with projected savings.
- Settings: spending caps per plan.
Margin is derived in the read RPCs as SUM(plan_revenue_usd) - SUM(cost_usd) at full precision, so users whose calls cost sub-cent amounts on $0-revenue plans are still detected as unprofitable.
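A tiny worked example shows why full precision matters. The rows below are made up for illustration (the user id is hypothetical); the column names follow the formula above.

```python
# Two sub-cent calls from a user on a $0 plan (illustrative data).
rows = [
    {"user_id": "u_free", "plan_revenue_usd": 0.0, "cost_usd": 0.000003},
    {"user_id": "u_free", "plan_revenue_usd": 0.0, "cost_usd": 0.000004},
]

revenue = sum(row["plan_revenue_usd"] for row in rows)
cost = sum(row["cost_usd"] for row in rows)

margin = revenue - cost             # full precision, slightly below zero
margin_rounded = round(margin, 2)   # what rounding to cents would report

is_unprofitable = margin < 0
looks_unprofitable_after_rounding = margin_rounded < 0
```

At full precision the margin is negative, so the user is flagged; after rounding to cents the loss disappears and the user would look break-even.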
SDK reference
Weckr(...)
class Weckr:
    def __init__(
        self,
        api_key: str,                               # required, starts with 'wk_'
        plans: Optional[Dict[str, float]] = None,   # plan -> monthly price USD
        endpoint: str = "https://app.useweckr.com/api/v1/log",
        check_endpoint: str = "https://app.useweckr.com/api/v1/check",
        disable_cap_check: bool = False,
        on_error: Optional[Callable[[BaseException], None]] = None,
        on_downgrade: Optional[Callable[[Dict[str, str]], None]] = None,
    ) -> None: ...
wk.chat(client, params)
# params dict keys:
{
"model": str, # e.g. "gpt-4o" or "gpt-4o-2024-08-06"
"messages": list[dict],
"user_id": Optional[str], # your app's user id (omit for anonymous)
"feature": Optional[str],
"plan": Optional[str], # MUST match a key in plans dict (or omit)
# ...any other provider-specific kwargs are forwarded to the underlying client
}
result = wk.chat(openai_client, params)  # returns the LLM result unchanged
wk.flush(timeout_seconds=5.0)
wk.flush() # wait up to 5s for in-flight log POSTs
wk.flush(timeout_seconds=10.0)
Long-running servers don't need this; short-lived processes do. See Short-lived processes.
WeckrCapError
class WeckrCapError(Exception):
    name: str                       # 'WeckrCapError'
    user_id: Optional[str]
    plan_name: Optional[str]
    current_spend: Optional[float]
    cap: Optional[float]
from weckr import is_weckr_cap_error
if is_weckr_cap_error(err): ...  # show upgrade prompt
WeckrConfigError
class WeckrConfigError(Exception):
    name: str  # 'WeckrConfigError'
    code: str  # 'invalid_api_key' | 'forbidden' | 'unknown_plan'
from weckr import is_weckr_config_error
if is_weckr_config_error(err): ...  # alert backend; misconfig
Pure-function exports
from weckr import PRICING, CHEAPER_ALTERNATIVE, calculate_cost, resolve_pricing
PRICING["gpt-4o"] # {'input': 2.5, 'output': 10.0}
resolve_pricing("gpt-4o-2024-08-06") # resolves to gpt-4o family pricing
calculate_cost("gpt-4o-mini", 12, 2)  # 3e-6
Dated variants (gpt-4o-2024-08-06, claude-3-5-sonnet-latest, etc.) resolve via longest-prefix match against PRICING. Unknown models return zero pricing rather than raising.
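Longest-prefix resolution and the cost arithmetic can be sketched independently of the SDK. The helper names `resolve` and `cost_usd` below are invented for this example and are not the SDK's exports; the prices are a subset of the per-million-token figures listed under Supported models.

```python
from typing import Dict

# Per-million-token USD prices (subset of the Supported models table).
PRICES: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}
ZERO_PRICING = {"input": 0.0, "output": 0.0}


def resolve(model: str) -> Dict[str, float]:
    """Pick the longest known model name that prefixes the requested one."""
    matches = [name for name in PRICES if model.startswith(name)]
    if not matches:
        return ZERO_PRICING  # unknown models price at zero rather than raising
    return PRICES[max(matches, key=len)]


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    price = resolve(model)
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


dated = resolve("gpt-4o-2024-08-06")            # gpt-4o family pricing
mini_dated = resolve("gpt-4o-mini-2024-07-18")  # gpt-4o-mini, not gpt-4o
small_call = cost_usd("gpt-4o-mini", 12, 2)     # (12*0.15 + 2*0.60) / 1e6
```

Taking the longest match is what keeps a dated gpt-4o-mini variant from being billed at gpt-4o rates, since both names are prefixes of it.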
What gets logged
{
"userId": "u_42" or None,
"feature": "ai-summary" or None,
"model": "gpt-4o-mini", # the actually-used model (downgraded if applicable)
"provider": "openai", # "openai" | "anthropic" | "gemini"
"inputTokens": 12,
"outputTokens": 2,
"costUsd": 0.000003, # server-recalculated; client value ignored
"latencyMs": 1218,
"planName": "pro" or None,
"planRevenueUsd": 29.0 or None,
"timestamp": "2026-06-15T07:52:18.086515+00:00",
}
Logging is fire-and-forget: failures never raise (they go to on_error if provided). Cost is recomputed server-side from (model, tokens). Margin is derived in the read RPCs at full precision.
HTTP API reference
The SDK is a thin wrapper around a small public API. You can integrate directly without the SDK if you prefer. Base URL: https://app.useweckr.com.
POST /api/v1/log
Records a single LLM call. Auth via x-api-key header. userId, feature, planName, planRevenueUsd may all be null or omitted (the row lands as NULL). model, provider, tokens, and latencyMs are required.
import urllib.request, json
req = urllib.request.Request(
"https://app.useweckr.com/api/v1/log",
data=json.dumps({
"userId": "user_123",
"feature": "summary",
"model": "gpt-4o-mini",
"provider": "openai",
"inputTokens": 120,
"outputTokens": 80,
"costUsd": 0.0001,
"latencyMs": 800,
"planName": "pro",
"planRevenueUsd": 29.0,
}).encode(),
method="POST",
headers={"Content-Type": "application/json", "x-api-key": "wk_..."},
)
with urllib.request.urlopen(req) as r:
    print(json.loads(r.read()))  # {"ok": true}
costUsd is recomputed from (model, tokens); the value you send is discarded. If the model has known pricing AND planName is set, posting inputTokens=outputTokens=0 is rejected (400) as a cap-bypass guard.
GET /api/v1/check
Checks whether a given userId on a given planName is over their cap. Auth via x-api-key.
import json
import urllib.request
url = (
    "https://app.useweckr.com/api/v1/check"
    "?userId=user_123&planName=pro&model=gpt-4o"
)
req = urllib.request.Request(url, headers={"x-api-key": "wk_..."})
with urllib.request.urlopen(req) as r:
    decision = json.loads(r.read())
# -> {"allowed": True, "currentSpend": 4.20, "cap": 20, "remainingBudget": 15.80}
# OR when capped:
# -> {"allowed": False, "action": "downgrade",
#     "alternativeModel": "gpt-4o-mini",
#     "currentSpend": 21.00, "cap": 20}
Full dashboard endpoint reference is on the TypeScript docs page; the endpoints are language-agnostic.
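If you call /api/v1/check directly, you also have to act on the decision yourself. The sketch below shows one way to do that using the two response shapes above. `CapReached` and `model_after_cap_check` are hypothetical names, and treating every denial that is not a downgrade as a block is an assumption, since the block response shape is not shown on this page.

```python
class CapReached(Exception):
    """Stand-in for the SDK's WeckrCapError."""


def model_after_cap_check(decision: dict, requested_model: str) -> str:
    """Return the model to call, or raise if the user is blocked."""
    if decision.get("allowed", True):
        return requested_model
    if decision.get("action") == "downgrade" and decision.get("alternativeModel"):
        return decision["alternativeModel"]
    # Assumption: any other denial means the call should be blocked.
    raise CapReached(
        f"spent {decision.get('currentSpend')} of cap {decision.get('cap')}"
    )


under_cap = {"allowed": True, "currentSpend": 4.20, "cap": 20,
             "remainingBudget": 15.80}
over_cap = {"allowed": False, "action": "downgrade",
            "alternativeModel": "gpt-4o-mini",
            "currentSpend": 21.00, "cap": 20}

model_when_allowed = model_after_cap_check(under_cap, "gpt-4o")
model_when_capped = model_after_cap_check(over_cap, "gpt-4o")
```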
Supported models
Per-million token prices used for cost calculation and recommendations. Dated variants resolve via longest-prefix match: gpt-4o-2024-08-06 resolves to gpt-4o; claude-3-5-sonnet-latest resolves to claude-3-5-sonnet.
| Model | Provider | Input / 1M | Output / 1M | Cheaper alternative |
|---|---|---|---|---|
| gpt-4o | openai | $2.50 | $10.00 | gpt-4o-mini |
| gpt-4o-mini | openai | $0.15 | $0.60 | — |
| gpt-4-turbo | openai | $10.00 | $30.00 | gpt-4o-mini |
| gpt-4 | openai | $30.00 | $60.00 | gpt-4o-mini |
| gpt-3.5-turbo | openai | $0.50 | $1.50 | — |
| o1-preview | openai | $15.00 | $60.00 | — |
| o1-mini | openai | $3.00 | $12.00 | — |
| claude-opus-4 | anthropic | $15.00 | $75.00 | claude-sonnet-4 |
| claude-sonnet-4 | anthropic | $3.00 | $15.00 | claude-haiku-4-5 |
| claude-haiku-4-5 | anthropic | $0.80 | $4.00 | — |
| claude-3-5-sonnet | anthropic | $3.00 | $15.00 | — |
| claude-3-5-haiku | anthropic | $0.80 | $4.00 | — |
| claude-3-opus | anthropic | $15.00 | $75.00 | — |
| gemini-2.5-pro | gemini | $1.25 | $10.00 | gemini-2.5-flash |
| gemini-2.5-flash | gemini | $0.15 | $0.60 | — |
| gemini-1.5-pro | gemini | $1.25 | $5.00 | gemini-2.5-flash |
| gemini-1.5-flash | gemini | $0.075 | $0.30 | — |
Models not listed here log with costUsd: 0, so caps won't fire on them. Open an issue on GitHub.
Token counting across providers
Each provider uses its own tokenizer. The same prompt produces a different token count at OpenAI vs. Anthropic vs. Gemini — they can disagree by 10–25% on the same text. Costs still compare correctly across providers (price-per-token × tokens, in USD) but raw token counts do not.
- Compare across providers: cost_usd, margin_usd, request_count, latency.
- Don't compare across providers: input_tokens, output_tokens. Slice the dashboard by provider first.
PII and user identifiers
The SDK never sends prompt text or LLM responses. But you control user_id, feature, and plan. Common mistakes:
# DON'T — user_id looks like PII
wk.chat(openai_client, {
"user_id": "jane@acme.com", # server rejects with 400
"feature": "ai-summary-jane@acme.com-resume", # same
...
})
# DO — opaque identifier you control
wk.chat(openai_client, {
"user_id": "u_42_abc", # your auth user id
"feature": "ai-summary", # static label
...
})
The ingest endpoint rejects values containing emails or credit-card-shaped digit runs with a clear 400. If you need to attribute a row to an email, pass a hash:
import hashlib
user_id_hash = hashlib.sha256(user.email.lower().encode()).hexdigest()[:16]
wk.chat(openai_client, {..., "user_id": user_id_hash, "plan": user.plan})
FAQ
Does Weckr add latency to my LLM calls?
The log POST is fire-and-forget on a daemon thread after your LLM call returns, so your code doesn't wait on it. The cap check before the call hits a 60-second in-memory cache, so at most one extra round-trip per (user_id, plan, model) per minute. Typical hot path: 0 ms added.
What happens if Weckr is down?
5xx, 429, or network errors on the cap-check endpoint fail open: the LLM call proceeds. 401/403 fail closed with a WeckrConfigError so a misconfig doesn't silently disable cap enforcement. Log POSTs are always best-effort; failures go to on_error if you provided one.
Does Weckr see my prompts or responses?
No. The SDK only sends metadata: model, provider, token counts, latency, your user_id / feature / plan labels. The prompt text and the model's reply never leave your process.
Do I need to call wk.flush() in a web server?
No. Long-running servers like FastAPI, Flask, Django, etc. don't need flush: the daemon thread completes well before the process exits. Only AWS Lambda, cron jobs, and CLI scripts need wk.flush() before exit.
Does the SDK support async?
wk.chat() is synchronous today and works inside any async framework as a plain function call (the LLM call itself blocks the event loop if you use a sync client). If you need real async support, the async SDK is on the roadmap; open an issue if it's blocking you.
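Until then, one common workaround for a blocking call inside async code is to push it onto a worker thread with asyncio.to_thread (Python 3.9+). The sketch below uses a stand-in function in place of wk.chat(...); whether the SDK is safe to call from multiple threads is an assumption you should verify for your setup.

```python
import asyncio
import time
from typing import List


def blocking_llm_call(prompt: str) -> str:
    """Stand-in for a synchronous wk.chat(...) call."""
    time.sleep(0.05)  # simulate the provider round-trip
    return f"summary of: {prompt}"


async def handle_request(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_llm_call, prompt)


async def main() -> List[str]:
    # Both requests overlap instead of running back to back.
    return list(await asyncio.gather(
        handle_request("doc A"),
        handle_request("doc B"),
    ))


results = asyncio.run(main())
```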
How is cost calculated?
From the model's public per-token pricing (see the table above) multiplied by the input/output token counts reported by the provider. Cost is recomputed server-side so a client can't forge a fake value. Margin is plan_revenue_usd - cost_usd at read time.
Can I self-host Weckr?
The repo at github.com/Ghiles3232/weckr-sdks is open source: a Next.js app + Supabase migrations. Construct Weckr(endpoint="...", check_endpoint="...") pointing at your own deployment.
Stuck?
File an issue on GitHub.