Documentation


Weckr is AI cost & margin intelligence for SaaS founders. Wrap any LLM client, get per-user and per-feature cost data, set spending caps that the SDK enforces before the LLM call, and receive recommendations on pricing & cheaper models.

This page covers the Python SDK, weckr-sdk 0.1.3 on PyPI. TypeScript users: see /docs. The two SDKs are wire-compatible: same backend, same dashboard, same caps.

Quick start

Drop the SDK in front of your existing LLM client. The call returns unchanged; Weckr logs cost + margin asynchronously and enforces caps before each call.

bash

pip install weckr-sdk openai

python

from openai import OpenAI
from weckr import Weckr

openai_client = OpenAI()                       # reads OPENAI_API_KEY
wk = Weckr(
    api_key="wk_...",                          # from app.useweckr.com signup
    plans={"free": 0, "pro": 29},              # your plan prices in USD
)

result = wk.chat(
    openai_client,
    {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize this doc."}],
        "user_id": user.id,                    # your app's user id
        "feature": "doc-summary",              # anything you'd group cost by
        "plan": user.plan,                     # matches one of plans above
    },
)
print(result.choices[0].message.content)

That's the whole integration. Your existing call works exactly as before; the dashboard at app.useweckr.com fills with data on the next refresh.

Note the call shape: wk.chat(client, params_dict). Per-call metadata (user_id, feature, plan) lives INSIDE the params dict alongside model / messages, the same shape as the TypeScript SDK.

Install & configure

1. Sign up & create a project

Sign up at app.useweckr.com/auth/signup. Right after, you'll create a project and we'll show you its wk_ key once. Copy it — the dashboard masks it from then on.

2. Install the SDK

Zero runtime dependencies. Bring your own LLM SDK:

bash

pip install weckr-sdk openai            # OpenAI
pip install weckr-sdk anthropic         # Anthropic
pip install weckr-sdk google-genai      # Gemini (new SDK)

# or all at once:
pip install "weckr-sdk[all]"

Python 3.9+. Uses urllib internally so the SDK itself has no runtime deps.

3. Initialise once at boot

python

# app/weckr.py
import os
from weckr import Weckr

wk = Weckr(
    api_key=os.environ["WECKR_API_KEY"],
    plans={
        "free":     0,
        "starter":  9,
        "pro":      29,
        "business": 99,
    },
    # optional:
    # on_error=lambda e: print("Weckr async error:", e),
    # on_downgrade=lambda info: analytics.track("ai_downgrade", info),
)

The plans map is how Weckr knows what each user pays you. It's what powers the margin column on the dashboard and the recommendation engine.

Logging LLM calls

Wrap your existing client (OpenAI, Anthropic, or Gemini) via wk.chat(client, params). The SDK detects the provider, makes the call, and returns the original result. Once the call resolves, it sends a fire-and-forget log on a daemon thread.

OpenAI

python

from openai import OpenAI
openai_client = OpenAI()

result = wk.chat(openai_client, {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": prompt}],
    "user_id": user.id,
    "feature": "ai-summary",
    "plan": user.plan,
})

Anthropic

python

from anthropic import Anthropic
anthropic = Anthropic()

result = wk.chat(anthropic, {
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": prompt}],
    "user_id": user.id,
    "feature": "ai-essay",
    "plan": user.plan,
})

Gemini

python

# New SDK (recommended)
from google import genai
gemini = genai.Client()

result = wk.chat(gemini, {
    "model": "gemini-2.5-flash",
    "messages": [
        {"role": "system",    "content": "You are a translator."},  # mapped to system_instruction
        {"role": "user",      "content": prompt},
    ],
    "user_id": user.id,
    "feature": "ai-translate",
    "plan": user.plan,
})

The legacy google-generativeai SDK also works — the Python SDK detects either client shape. Message roles are preserved as multi-turn contents; system messages are mapped to Gemini's system_instruction.

Fields you should pass

user_id — the end-user of your app, stable across requests. Use your auth user id. Required for cap checks and per-user margin. Omit (or pass None) for anonymous calls — the row still lands; cap checks are simply skipped.
feature — a label like 'ai-summary'. Groups requests on the Features page and powers model recommendations.
plan — matches one of the keys in your plans config. Passing a value not in plans raises WeckrConfigError at call time.

Anonymous calls

For calls that don't belong to a logged-in user (marketing tools, anonymous demos, internal scripts), omit user_id, feature, and plan. The row lands with NULL in those columns and shows up grouped under “None” on the Users page.

python

wk.chat(openai_client, {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": prompt}],
})

Streaming

If the LLM client returns a streaming response, the SDK detects it after the call and emits an on_error warning: token usage isn't available on streams by default, so the row logs 0 tokens. Don't pass streaming params to wk.chat() today. Wrapped streaming with full usage capture is planned for a future release.

Spending caps

Set a max monthly AI spend per user per plan. The SDK checks the cap before every LLM call (with a 60-second in-memory cache per (user_id, plan_name, model), so at most one extra request per minute per cache key).
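To make the "at most one extra request per minute per cache key" claim concrete, here is a minimal sketch of a 60-second in-memory cache keyed by (user_id, plan_name, model). The class name CapDecisionCache, the fetch callback, and the injectable clock are all hypothetical; this is not the SDK's actual implementation, only one way such a cache could behave.

```python
import time
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (user_id, plan_name, model)


class CapDecisionCache:
    """Hypothetical TTL cache for cap decisions (not the SDK's real code)."""

    def __init__(
        self,
        fetch: Callable[[str, str, str], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # performs the real cap-check request
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the sketch is testable
        self._entries: Dict[CacheKey, Tuple[float, dict]] = {}

    def get(self, user_id: str, plan_name: str, model: str) -> dict:
        key = (user_id, plan_name, model)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh: no network round-trip
        decision = self._fetch(user_id, plan_name, model)
        self._entries[key] = (now, decision)
        return decision
```

With this shape, repeated calls for the same user, plan, and model inside one minute reuse the cached decision, while a different model for the same user is a separate key and triggers its own check.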

1. Configure in the dashboard

On /dashboard/settings set a monthly cap (USD) for each plan and pick the action:

Block call — the SDK raises WeckrCapError before reaching the LLM. You catch it and show your user an upgrade prompt.
Downgrade model — the SDK silently swaps to a cheaper model in the same provider (e.g. gpt-4o → gpt-4o-mini). The call still completes and logs against the new model.

2. Handle the block case

python

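from fastapi.responses import JSONResponse  # assumption: a FastAPI app; swap in your framework's response
from weckr import WeckrCapError


def run_chat(params):  # hypothetical handler name
    try:
        return wk.chat(openai_client, params)
    except WeckrCapError as err:
        # Cap reached: return 402 so the client can show an upgrade prompt.
        return JSONResponse(
            status_code=402,
            content={
                "error": "AI spend cap reached for this user",
                "cap":   err.cap,
                "spent": err.current_spend,
                "plan":  err.plan_name,
            },
        )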

Downgrade is transparent

When the SDK downgrades, it emits a one-time WeckrDowngradeWarning per (user_id, model) pair via Python's warnings module. No raise. The log row records the actually-used model.

Override with an on_downgrade callback for analytics:

python

wk = Weckr(
    api_key=os.environ["WECKR_API_KEY"],
    plans={...},
    on_downgrade=lambda info: analytics.track("ai_downgrade", info),
    # info = {"userId": str, "from": str, "to": str}
)

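WeckrDowngradeWarning is its own subclass of UserWarning, so you can filter it independently of every other warning. A blanket warnings.filterwarnings("error") matches all warning categories, so apps that use one should register a more specific filter after it, for example warnings.filterwarnings("default", category=WeckrDowngradeWarning), to keep a downgrade from turning into an exception. To silence: warnings.filterwarnings("ignore", category=WeckrDowngradeWarning).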

Fail-open / fail-closed

5xx, 429, network error, timeout — fail OPEN (LLM call proceeds). The error goes to on_error if provided.
401, 403 — fail CLOSED with a WeckrConfigError. A typo'd or revoked API key raises synchronously so you catch the bug at boot, not silently in production. See Error handling.

You can opt out of cap checks entirely with disable_cap_check=True in the Weckr(...) constructor.
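The fail-open / fail-closed rules above can be read as a small decision function. The sketch below is illustrative only: classify_cap_check and ConfigErrorSketch are hypothetical names (the latter stands in for WeckrConfigError so the example runs without the SDK), and the handling of statuses the text does not mention is an assumption.

```python
from typing import Optional


class ConfigErrorSketch(Exception):
    """Stand-in for WeckrConfigError so the sketch runs without the SDK."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def classify_cap_check(status: Optional[int]) -> str:
    """Map a cap-check outcome to what happens next.

    `status` is the HTTP status of the cap check, or None when the request
    failed with a network error or timeout.
    """
    if status == 401:
        raise ConfigErrorSketch("invalid_api_key")   # fail closed
    if status == 403:
        raise ConfigErrorSketch("forbidden")         # fail closed
    if status is not None and 200 <= status < 300:
        return "use_response"                        # apply the returned decision
    # 5xx, 429, network errors and timeouts fail open per the rules above.
    # Treating any other status the same way is an assumption of this sketch.
    return "fail_open"
```

The asymmetry is deliberate: transient backend trouble should never take your LLM feature down, while a bad key should never silently disable cap enforcement.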

Real-time alerts (agent loop detection)

Every successful log POST triggers a fire-and-forget velocity check on the same (project_id, user_id) over the last few minutes. If cumulative tokens cross your threshold, Weckr writes an alert_log row and dispatches whatever channels you have configured. Designed for runaway CrewAI / LangChain reasoning loops, but works for any user-attributed call.

No code change required

The detection runs server-side after each ingest. You don't add any code in your app — if you're already calling wk.chat() with a user_id, you already have loop detection.

Defaults

Window: 5 minutes — how far back we sum tokens.
Threshold: 50,000 tokens — if a single user crosses this in the window, the alert fires.
Cooldown: 30 minutes — we suppress duplicate alerts for the same (project, user) so an actively-looping agent doesn't spam Slack.
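As an illustration of how the three defaults interact, here is a minimal in-memory sketch of a sliding-window velocity check with a cooldown. The real detection runs server-side and its implementation is not shown in this page; VelocityDetectorSketch, its method names, and the choice to fire when the total reaches the threshold (rather than strictly exceeds it) are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

WINDOW_SECONDS = 5 * 60      # default window
TOKEN_THRESHOLD = 50_000     # default threshold
COOLDOWN_SECONDS = 30 * 60   # default cooldown

Key = Tuple[str, str]        # (project_id, user_id)


class VelocityDetectorSketch:
    """Hypothetical in-memory version of the server-side check."""

    def __init__(self) -> None:
        self._events: Dict[Key, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._last_alert: Dict[Key, float] = {}

    def record(self, key: Key, tokens: int, now: float) -> bool:
        """Record one logged call; return True if an alert should fire."""
        events = self._events[key]
        events.append((now, tokens))
        # Drop calls that have fallen out of the window.
        while events and now - events[0][0] > WINDOW_SECONDS:
            events.popleft()
        if sum(t for _, t in events) < TOKEN_THRESHOLD:
            return False
        last = self._last_alert.get(key)
        if last is not None and now - last < COOLDOWN_SECONDS:
            return False     # over threshold, but still cooling down
        self._last_alert[key] = now
        return True
```

A looping agent that keeps burning tokens produces one alert, then stays quiet until the cooldown has elapsed, which is the behaviour the cooldown setting describes.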

Tune in the dashboard

All three knobs live in /dashboard/settings under Agent loop detection. Lower the threshold (e.g. 10k tokens / 2 min) for agent products where a single runaway crew is expensive; raise it for chat products with naturally heavy users.

The CrewAI guide shows the pattern in practice. Loop detection is the single biggest reason agent founders adopt Weckr.

Notifications: Slack & email

When loop detection trips (or any future alert type), Weckr can ping Slack and email. Both are configured per-project on /dashboard/settings and stored in the project's alert_config jsonb.

Slack

Paste an incoming webhook URL. We POST a red-attachment payload with the user_id, tokens-in-window, cost so far, request count, and a link back to the dashboard. No Slack app install required.
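For a sense of what such a payload might look like, the sketch below builds a Slack incoming-webhook body with a single red attachment. The exact title, field labels, and colour Weckr sends are not documented here, so every string in this example is an assumption; only the attachment structure (color, title, title_link, fields) is standard Slack webhook formatting.

```python
import json
from typing import Any, Dict


def build_slack_alert(user_id: str, tokens_in_window: int, cost_usd: float,
                      request_count: int, dashboard_url: str) -> Dict[str, Any]:
    """Build an incoming-webhook body with a single red attachment."""
    return {
        "attachments": [
            {
                "color": "#d00000",  # red sidebar on the attachment
                "title": "Possible agent loop detected",   # hypothetical wording
                "title_link": dashboard_url,
                "fields": [
                    {"title": "User", "value": user_id, "short": True},
                    {"title": "Tokens in window", "value": f"{tokens_in_window:,}", "short": True},
                    {"title": "Cost so far", "value": f"${cost_usd:.4f}", "short": True},
                    {"title": "Requests", "value": str(request_count), "short": True},
                ],
            }
        ]
    }


def encode_body(payload: Dict[str, Any]) -> bytes:
    """Serialise the payload as the JSON bytes a webhook POST would carry."""
    return json.dumps(payload).encode("utf-8")
```

Posting those bytes to the webhook URL with a Content-Type of application/json is all an incoming webhook needs, which is why no Slack app install is required.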

Email

Toggle email alerts on, enter the destination address. Emails ship via Resend from the Weckr-verified sender domain — no SMTP setup on your end.

Test alert button

Settings has a Test alert button next to the channel inputs. It POSTs /api/v1/projects/[id]/test-alert which dispatches a fake velocity alert through every channel you've configured. Lets you confirm Slack + email work before a real anomaly trips them.

The cooldown applies to real alerts only — the test button bypasses it so you can re-test without waiting.

Query from Claude (MCP server)

@weckr/mcp is a Model Context Protocol server. Plug it into Claude Desktop, Cursor, or any MCP-compatible client and ask questions about your cost data in plain language. It is a Node binary you launch via npx, so there is nothing else to install, even on Python projects.

Install

Add the following to your Claude Desktop config (claude_desktop_config.json):

json

{
  "mcpServers": {
    "weckr": {
      "command": "npx",
      "args": ["-y", "@weckr/mcp"],
      "env": {
        "WECKR_API_KEY": "wk_..."
      }
    }
  }
}

Cursor uses the same JSON in .cursor/mcp.json. Restart your client and the Weckr tools appear.

Tools exposed

get_overview — total cost, revenue, margin, request count, unprofitable user count for the month.
get_users — per-user margin breakdown, filterable.
get_feature_breakdown — per-feature cost share.
get_model_recommendations — same-provider cheaper-model swaps with $ saving estimates.
get_pricing_recommendations — per-plan margin health + recommended price.
get_spending_cap_url — returns the dashboard URL where caps are edited (does NOT mutate).

Sample prompts that work well: “Which users are unprofitable this month?” · “Where can I cut AI cost?” · “Is my Pro pricing sustainable?” Source code lives at github.com/Ghiles3232/weckr-sdks/tree/main/mcp.

Error handling

The SDK raises two kinds of errors. Tell them apart with separate except clauses, as below, or with the type-guard helpers.

python

from weckr import (
    WeckrCapError,
    WeckrConfigError,
    is_weckr_cap_error,
    is_weckr_config_error,
)

try:
    result = wk.chat(openai_client, params)
except WeckrCapError as err:
    # User hit their monthly cap. Show an upgrade prompt.
    # err.user_id / err.plan_name / err.cap / err.current_spend are populated.
    return show_upgrade_prompt(err)
except WeckrConfigError as err:
    # CRITICAL — Weckr is misconfigured. Send to your backend alerts.
    # err.code is one of: 'invalid_api_key' | 'forbidden' | 'unknown_plan'
    return alert_backend(err.code, str(err))
# any other exception is from the LLM provider itself

WeckrCapError — cap reached. User-visible: show your upgrade UI.
WeckrConfigError — developer-only. Three codes:
- invalid_api_key: 401 from /check. Your wk_ key is typo'd, revoked, or wasn't copied correctly.
- forbidden: 403 from /check. Usually means the project was deleted.
- unknown_plan: you passed a plan to wk.chat() that isn't in the plans dict. Fail-fast so a typo doesn't silently log $0-revenue rows.

on_error receives async errors (log POST failures, cap-check 5xx). It does NOT receive WeckrCapError or WeckrConfigError — those are synchronous raises. The callback runs on a background thread; make it thread-safe.

Short-lived processes (Lambda, cron, CLI)

wk.chat() returns as soon as the LLM call resolves; the log POST runs on a daemon thread after that. In a long-running web server this is perfect: it adds zero hot-path latency. In a short-lived process (Lambda handler, cron job, CLI), the host can exit before the POST hits the network and you lose the last log row.

Call wk.flush() before exit. It joins every in-flight POST thread (default 5s timeout) and returns.

python

# Lambda handler
def handler(event, context):
    result = wk.chat(openai_client, {
        "model": "gpt-4o-mini",
        "messages": [...],
        "user_id": event["user_id"],
        "plan": "pro",
    })
    wk.flush()              # wait for the log POST before Lambda freezes the runtime
    return result

Long-running servers (FastAPI, Flask, Django, etc.) don't need this — the daemon thread completes long before the process ever exits.
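To show why a flush step matters for daemon threads, here is a minimal sketch of a background sender whose flush joins in-flight threads under a shared deadline. BackgroundPoster and its methods are hypothetical stand-ins, not the SDK's internals; the sketch only mirrors the behaviour described above (join every in-flight POST thread, 5-second default timeout).

```python
import threading
import time
from typing import Callable, List


class BackgroundPoster:
    """Hypothetical stand-in for a fire-and-forget log sender."""

    def __init__(self) -> None:
        self._threads: List[threading.Thread] = []
        self._lock = threading.Lock()

    def submit(self, send: Callable[[], None]) -> None:
        thread = threading.Thread(target=send, daemon=True)
        with self._lock:
            # Start under the lock so flush() never sees an unstarted thread.
            self._threads.append(thread)
            thread.start()

    def flush(self, timeout_seconds: float = 5.0) -> bool:
        """Join in-flight threads; return True if all finished in time."""
        deadline = time.monotonic() + timeout_seconds
        with self._lock:
            pending = list(self._threads)
        for thread in pending:
            remaining = deadline - time.monotonic()
            thread.join(timeout=max(0.0, remaining))
        with self._lock:
            self._threads = [t for t in self._threads if t.is_alive()]
            return not self._threads
```

Because the threads are daemons, the interpreter will not wait for them at exit; an explicit join is the only thing standing between a short-lived process and a lost log row.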

The dashboard

All pages live under app.useweckr.com/dashboard. Use the top-right project selector to switch between projects (or create a new one). The dashboard is the same whether you log from TypeScript or Python — the backend doesn't care which SDK wrote the row.

Overview — 4 stat cards (cost, revenue, margin, unprofitable users), the cost-vs-revenue line chart for the last 30 days, an alert banner when any user is losing you money, plus pricing intel cards.
Users — one row per user_id, sorted by margin (worst first). Anonymous rows (no user_id) appear grouped under “None”.
Features — one row per feature, sorted by total spend.
Recommendations — model swaps with projected savings.
Settings — spending caps per plan.

Margin is derived in the read RPCs as SUM(plan_revenue_usd) - SUM(cost_usd) at full precision, so users whose calls cost sub-cent on $0-revenue plans are still detected as unprofitable.

SDK reference

Weckr(...)

python

class Weckr:
    def __init__(
        self,
        api_key: str,                                  # required, starts with 'wk_'
        plans: Optional[Dict[str, float]] = None,      # plan -> monthly price USD
        endpoint: str = "https://app.useweckr.com/api/v1/log",
        check_endpoint: str = "https://app.useweckr.com/api/v1/check",
        disable_cap_check: bool = False,
        on_error: Optional[Callable[[BaseException], None]] = None,
        on_downgrade: Optional[Callable[[Dict[str, str]], None]] = None,
    ) -> None: ...

wk.chat(client, params)

python

# params dict keys:
{
    "model":    str,                       # e.g. "gpt-4o" or "gpt-4o-2024-08-06"
    "messages": list[dict],
    "user_id":  Optional[str],             # your app's user id (omit for anonymous)
    "feature":  Optional[str],
    "plan":     Optional[str],             # MUST match a key in plans dict (or omit)
    # ...any other provider-specific kwargs are forwarded to the underlying client
}

result = wk.chat(openai_client, params)    # returns the LLM result unchanged

wk.flush(timeout_seconds=5.0)

python

wk.flush()              # wait up to 5s for in-flight log POSTs
wk.flush(timeout_seconds=10.0)

Long-running servers don't need this; short-lived processes do. See Short-lived processes.

WeckrCapError

python

class WeckrCapError(Exception):
    name: str  # 'WeckrCapError'
    user_id: Optional[str]
    plan_name: Optional[str]
    current_spend: Optional[float]
    cap: Optional[float]

from weckr import is_weckr_cap_error
if is_weckr_cap_error(err): ...   # show upgrade prompt

WeckrConfigError

python

class WeckrConfigError(Exception):
    name: str   # 'WeckrConfigError'
    code: str   # 'invalid_api_key' | 'forbidden' | 'unknown_plan'

from weckr import is_weckr_config_error
if is_weckr_config_error(err): ...   # alert backend; misconfig

Pure-function exports

python

from weckr import PRICING, CHEAPER_ALTERNATIVE, calculate_cost, resolve_pricing

PRICING["gpt-4o"]                       # {'input': 2.5, 'output': 10.0}
resolve_pricing("gpt-4o-2024-08-06")    # resolves to gpt-4o family pricing
calculate_cost("gpt-4o-mini", 12, 2)    # 3e-6

Dated variants (gpt-4o-2024-08-06, claude-3-5-sonnet-latest, etc.) resolve via longest-prefix match against PRICING. Unknown models return zero pricing rather than raising.
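The longest-prefix rule can be sketched in a few lines. This is an illustration rather than the SDK's source: the _sketch names are hypothetical, and the price table is a small subset copied from the Supported models table on this page.

```python
from typing import Dict

# Per-million-token prices (USD), a subset of the Supported models table.
PRICING_SKETCH: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}


def resolve_pricing_sketch(model: str) -> Dict[str, float]:
    """Return pricing for the longest table key that prefixes `model`."""
    matches = [name for name in PRICING_SKETCH if model.startswith(name)]
    if not matches:
        return {"input": 0.0, "output": 0.0}   # unknown model: zero pricing
    return PRICING_SKETCH[max(matches, key=len)]


def calculate_cost_sketch(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD from per-million-token prices."""
    price = resolve_pricing_sketch(model)
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
```

Taking the longest match is what keeps gpt-4o-mini-2024-07-18 from being priced as gpt-4o, since both keys are prefixes of that name.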

What gets logged

python

{
    "userId":         "u_42" or None,
    "feature":        "ai-summary" or None,
    "model":          "gpt-4o-mini",     # the actually-used model (downgraded if applicable)
    "provider":       "openai",           # "openai" | "anthropic" | "gemini"
    "inputTokens":    12,
    "outputTokens":   2,
    "costUsd":        0.000003,           # server-recalculated; client value ignored
    "latencyMs":      1218,
    "planName":       "pro" or None,
    "planRevenueUsd": 29.0 or None,
    "timestamp":      "2026-06-15T07:52:18.086515+00:00",
}

Logging is fire-and-forget — failures never raise (they go to on_error if provided). Cost is recomputed server-side from (model, tokens). Margin is derived in the read RPCs at full precision.

HTTP API reference

The SDK is a thin wrapper around a small public API. You can integrate directly without the SDK if you prefer. Base URL: https://app.useweckr.com.

POST /api/v1/log

Records a single LLM call. Auth via x-api-key header. userId, feature, planName, planRevenueUsd may all be null or omitted (the row lands as NULL). model, provider, tokens, and latencyMs are required.

python

import urllib.request, json

req = urllib.request.Request(
    "https://app.useweckr.com/api/v1/log",
    data=json.dumps({
        "userId": "user_123",
        "feature": "summary",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "inputTokens": 120,
        "outputTokens": 80,
        "costUsd": 0.0001,
        "latencyMs": 800,
        "planName": "pro",
        "planRevenueUsd": 29.0,
    }).encode(),
    method="POST",
    headers={"Content-Type": "application/json", "x-api-key": "wk_..."},
)
with urllib.request.urlopen(req) as r:
    print(json.loads(r.read()))  # {"ok": true}

Server-side guarantees: costUsd is recomputed from (model, tokens) — the value you send is discarded. If the model has known pricing AND planName is set, posting inputTokens=outputTokens=0 is rejected (400) — a cap-bypass guard.
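The zero-token guard described above amounts to a single condition. The sketch below restates it as a function; ingest_status and the KNOWN_PRICED_MODELS subset are hypothetical, and the real endpoint may apply further validation not covered here.

```python
from typing import Optional

# Illustrative subset of models with known pricing.
KNOWN_PRICED_MODELS = {"gpt-4o", "gpt-4o-mini"}


def ingest_status(model: str, plan_name: Optional[str],
                  input_tokens: int, output_tokens: int) -> int:
    """Return the HTTP status this sketch would answer for a log POST."""
    zero_tokens = input_tokens == 0 and output_tokens == 0
    if model in KNOWN_PRICED_MODELS and plan_name is not None and zero_tokens:
        return 400   # cap-bypass guard: a priced call on a plan cannot cost nothing
    return 200
```

Without the guard, a client could keep a capped user under budget by reporting zero tokens for every call, since cost is derived from the token counts.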

GET /api/v1/check

Checks whether a given userId on a given planName is over their cap. Auth via x-api-key.

python

import json
import urllib.request

url = (
    "https://app.useweckr.com/api/v1/check"
    "?userId=user_123&planName=pro&model=gpt-4o"
)
req = urllib.request.Request(url, headers={"x-api-key": "wk_..."})
with urllib.request.urlopen(req) as r:
    decision = json.loads(r.read())
# -> {"allowed": True, "currentSpend": 4.20, "cap": 20, "remainingBudget": 15.80}
# OR when capped:
# -> {"allowed": False, "action": "downgrade",
#     "alternativeModel": "gpt-4o-mini",
#     "currentSpend": 21.00, "cap": 20}

Full dashboard endpoint reference is on the TypeScript docs page — they're language-agnostic.
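If you integrate against /api/v1/check directly, the two response shapes shown above can be turned into a model choice roughly as follows. This is a sketch: choose_model and CapReachedSketch are hypothetical names, and the shape of a block response (allowed set to False without a downgrade action) is an assumption, since only the downgrade case is shown.

```python
from typing import Any, Dict, Optional


class CapReachedSketch(Exception):
    """Stand-in for WeckrCapError so the sketch runs without the SDK."""

    def __init__(self, current_spend: Optional[float], cap: Optional[float]) -> None:
        super().__init__("AI spend cap reached")
        self.current_spend = current_spend
        self.cap = cap


def choose_model(requested_model: str, decision: Dict[str, Any]) -> str:
    """Pick the model to call based on a /api/v1/check response body."""
    if decision.get("allowed", True):
        return requested_model
    if decision.get("action") == "downgrade" and decision.get("alternativeModel"):
        return decision["alternativeModel"]
    # Assumed block case: not allowed and no usable downgrade target.
    raise CapReachedSketch(decision.get("currentSpend"), decision.get("cap"))
```

Remember to log the model that was actually called, not the one originally requested, so the row reflects the real cost.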

Supported models

Per-million token prices used for cost calculation and recommendations. Dated variants resolve via longest-prefix match: gpt-4o-2024-08-06 resolves to gpt-4o; claude-3-5-sonnet-latest resolves to claude-3-5-sonnet.

Model	Provider	Input / 1M	Output / 1M	Cheaper alternative
gpt-4o	openai	$2.50	$10.00	gpt-4o-mini
gpt-4o-mini	openai	$0.15	$0.60	—
gpt-4-turbo	openai	$10.00	$30.00	gpt-4o-mini
gpt-4	openai	$30.00	$60.00	gpt-4o-mini
gpt-3.5-turbo	openai	$0.50	$1.50	—
o1-preview	openai	$15.00	$60.00	—
o1-mini	openai	$3.00	$12.00	—
claude-opus-4	anthropic	$15.00	$75.00	claude-sonnet-4
claude-sonnet-4	anthropic	$3.00	$15.00	claude-haiku-4-5
claude-haiku-4-5	anthropic	$0.80	$4.00	—
claude-3-5-sonnet	anthropic	$3.00	$15.00	—
claude-3-5-haiku	anthropic	$0.80	$4.00	—
claude-3-opus	anthropic	$15.00	$75.00	—
gemini-2.5-pro	gemini	$1.25	$10.00	gemini-2.5-flash
gemini-2.5-flash	gemini	$0.15	$0.60	—
gemini-1.5-pro	gemini	$1.25	$5.00	gemini-2.5-flash
gemini-1.5-flash	gemini	$0.075	$0.30	—

Don't see your model? Unknown models still log a row but with costUsd: 0, so caps won't fire on them. Open an issue on GitHub.

Token counting across providers

Each provider uses its own tokenizer. The same prompt produces a different token count at OpenAI vs. Anthropic vs. Gemini — they can disagree by 10–25% on the same text. Costs still compare correctly across providers (price-per-token × tokens, in USD) but raw token counts do not.

Compare across providers: cost_usd, margin_usd, request_count, latency.
Don't compare across providers: input_tokens, output_tokens. Slice the dashboard by provider first.

PII and user identifiers

The SDK never sends prompt text or LLM responses. But you control user_id, feature, and plan. Common mistakes:

python

# DON'T — user_id looks like PII
wk.chat(openai_client, {
    "user_id": "jane@acme.com",                       # server rejects with 400
    "feature": "ai-summary-jane@acme.com-resume",     # same
    ...
})

# DO — opaque identifier you control
wk.chat(openai_client, {
    "user_id": "u_42_abc",                             # your auth user id
    "feature": "ai-summary",                           # static label
    ...
})

The ingest endpoint rejects values containing emails or credit-card-shaped digit runs with a clear 400. If you need to attribute a row to an email, pass a hash:

python

import hashlib

user_id_hash = hashlib.sha256(user.email.lower().encode()).hexdigest()[:16]

wk.chat(openai_client, {..., "user_id": user_id_hash, "plan": user.plan})
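To catch these mistakes before a request is rejected, a client-side pre-check along the following lines may help. The server's actual matching rules are not documented on this page, so both regular expressions and the looks_like_pii name are assumptions that only approximate "emails or credit-card-shaped digit runs".

```python
import re

# Rough email shape: something@domain.tld
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"(?:\d[ -]?){13,19}")


def looks_like_pii(value: str) -> bool:
    """Return True if the value contains an email or a card-like digit run."""
    return bool(EMAIL_RE.search(value) or CARD_RE.search(value))
```

Running user_id and feature through a check like this in your own test suite surfaces accidental PII early, instead of as dropped rows in production.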

FAQ

Does Weckr add latency to my LLM calls?

The log POST is fire-and-forget on a daemon thread after your LLM call returns — it doesn't wait. The cap check before the call hits a 60-second in-memory cache, so at most one extra round-trip per (user_id, plan, model) per minute. Typical hot path: 0 ms added.

What happens if your API is down?

5xx, 429, or network errors on the cap-check endpoint fail open: the LLM call proceeds. 401/403 fail closed with a WeckrConfigError so a misconfig doesn't silently disable cap enforcement. Log POSTs are always best-effort; failures go to on_error if you provided one.

Does the SDK send my prompts or completions to Weckr?

No. The SDK only sends metadata: model, provider, token counts, latency, your user_id / feature / plan labels. The prompt text and the model's reply never leave your process.

Do I need to call wk.flush() in my web server?

No. Long-running servers like FastAPI, Flask, Django, etc. don't need flush — the daemon thread completes well before the process exits. Only AWS Lambda, cron jobs, and CLI scripts need wk.flush() before exit.

Is the SDK async/await-compatible?

wk.chat() is synchronous today and works inside any async framework as a plain function call (the LLM call itself blocks the event loop if you use a sync client). If you need real async support, the async SDK is on the roadmap; open an issue if it's blocking you.
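Until async support exists, one common way to keep a blocking call off the event loop is asyncio.to_thread (Python 3.9+). The sketch below uses a stand-in function, blocking_chat, in place of wk.chat(...) so it runs without the SDK; whether a shared Weckr instance is safe to call from several worker threads at once is not stated on this page, so treat that as an assumption to verify.

```python
import asyncio
import time
from typing import List


def blocking_chat(prompt: str) -> str:
    """Stand-in for a synchronous wk.chat(client, params) call."""
    time.sleep(0.05)   # simulates the blocking LLM round-trip
    return f"echo: {prompt}"


async def handle_request(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_chat, prompt)


async def main() -> List[str]:
    # Both calls overlap instead of running back to back.
    return list(await asyncio.gather(handle_request("a"), handle_request("b")))
```

In a real handler you would pass a small wrapper around wk.chat(openai_client, params) to asyncio.to_thread in the same way.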

How does cost get calculated?

From the model's public per-token pricing (see the table above) multiplied by the input/output token counts reported by the provider. Cost is recomputed server-side so a client can't forge a fake value. Margin is plan_revenue_usd - cost_usd at read time.

Can I self-host?

The repo at github.com/Ghiles3232/weckr-sdks is open source: a Next.js app + Supabase migrations. Construct Weckr(endpoint="...", check_endpoint="...") pointing at your own deployment.

Stuck?

File an issue on GitHub.