Integration guide
Using Weckr with CrewAI: catch runaway agent loops
CrewAI orchestrates multi-agent workflows. The hardest failure mode is an agent that gets stuck in a tool-call loop and burns through your monthly LLM budget in minutes. Weckr's job is to catch that before it costs you four figures.
The short version: call wk.chat() at the boundary, then hand the wrapper to your agents. The example below shows exactly how, and it works with the SDK that's on PyPI right now.
Why this matters for CrewAI
A traditional API endpoint makes one LLM call per request. If it costs $0.01, the worst-case blast radius is bounded. A CrewAI agent loop is different: an agent can call its tools dozens of times for a single user request, and if the tool keeps returning ambiguous results the agent may retry until it hits a recursion limit (or your bank account does).
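A back-of-the-envelope sketch makes the difference concrete. The $0.01 per call comes from the paragraph above. The 50 calls per request and 1,000 requests are hypothetical placeholders, so substitute your own model pricing and traffic.

```python
# Back-of-the-envelope blast radius. All inputs are hypothetical placeholders.

def worst_case_cost(cost_per_call: float, calls_per_request: int, requests: int) -> float:
    """Upper bound on spend when every request makes `calls_per_request` LLM calls."""
    return cost_per_call * calls_per_request * requests

# Traditional endpoint: one call per request, so spend scales only with traffic.
endpoint = worst_case_cost(cost_per_call=0.01, calls_per_request=1, requests=1_000)

# Agent loop: each request can fan out into dozens of tool-driven calls.
agent = worst_case_cost(cost_per_call=0.01, calls_per_request=50, requests=1_000)

print(f"endpoint: ${endpoint:,.2f}")    # endpoint: $10.00
print(f"agent loop: ${agent:,.2f}")     # agent loop: $500.00
```

The multiplier is the number of calls per request. For an endpoint it is fixed at one. For an agent it depends on how the agent behaves at runtime.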
The mitigations are the same as for any LLM call (per-user caps, model recommendations), plus one CrewAI-specific addition: token-velocity alerts. If a single user's agent burns through tokens unusually fast, you want to know in minutes, not at the end of the month when the invoice arrives.
The pattern
CrewAI ultimately uses an OpenAI-shaped client (whether you point it at OpenAI, Anthropic via a proxy, or a local model with an OpenAI-compatible endpoint). The recipe:
- Instantiate one openai.OpenAI() client at process start.
- Instantiate one Weckr client at process start.
- In the LLM-call site that your CrewAI agents reach (a tool, a custom LLM adapter, or a wrapper function), call wk.chat(openai_client, {...}) with the current user_id and plan.
- Configure caps and alert thresholds at app.useweckr.com/dashboard/settings.
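The recipe mentions a wrapper function as one possible call site. Below is a minimal sketch of that option. `make_tracked_chat` and `get_user` are hypothetical names, not part of the Weckr SDK. `chat_fn` is injected, so you can pass `wk.chat` in production and a fake in tests. The parameter keys mirror the ones used in the working example.

```python
from typing import Any, Callable, Dict, List, Tuple

def make_tracked_chat(
    chat_fn: Callable[[Any, Dict[str, Any]], Any],
    client: Any,
    get_user: Callable[[], Tuple[str, str]],
) -> Callable[[str, List[Dict[str, str]], str], Any]:
    """Bind the shared client and a user lookup once (hypothetical helper)."""

    def tracked_chat(model: str, messages: List[Dict[str, str]], feature: str) -> Any:
        user_id, plan = get_user()
        if not user_id:
            # Fail loudly rather than send a call with no user attribution.
            raise RuntimeError("no user_id set for this request")
        # Every call site gets user_id and plan forwarded automatically.
        return chat_fn(client, {
            "model": model,
            "messages": messages,
            "user_id": user_id,
            "feature": feature,
            "plan": plan,
        })

    return tracked_chat
```

In the working example you might build this once with `make_tracked_chat(wk.chat, openai_client, lambda: (ctx.user_id, ctx.plan))`. Each tool then calls the returned function instead of touching the clients directly, so no tool can forget the attribution fields.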
Install
```shell
pip install weckr-sdk openai crewai
```
Get a wk_ API key from app.useweckr.com/auth/signup.
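The working example passes `api_key="wk_..."` inline. You may prefer to keep the key out of source code and read it from the environment. This is a minimal sketch. The variable name `WECKR_API_KEY` is our own choice, and we are not assuming the SDK reads it automatically.

```python
import os

def load_weckr_key(env_var: str = "WECKR_API_KEY") -> str:
    """Read the Weckr key from the environment and sanity-check its prefix."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith("wk_"):
        # This guide's keys start with wk_; anything else is probably a pasted OpenAI key.
        raise ValueError(f"{env_var} does not look like a wk_ key")
    return key
```

You would then construct the client with `Weckr(api_key=load_weckr_key(), plans={...})`.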
Working example
A CrewAI agent that uses a tool. The tool routes its LLM call through wk.chat(), so every round trip the agent triggers is attributed to the correct user and counted against their cap.
```python
import openai
from crewai import Agent, Task, Crew
from crewai.tools import tool
from weckr import Weckr, is_weckr_cap_error

# 1. Shared clients, created once at process start.
openai_client = openai.OpenAI()
wk = Weckr(
    api_key="wk_...",
    plans={"free": 0, "pro": 29, "business": 99},
)

# 2. A context object so tools know which user is driving the agent.
class RequestContext:
    user_id: str = ""
    plan: str = ""

ctx = RequestContext()

# 3. A tool that uses wk.chat() for the actual LLM round trip.
@tool("Classify intent")
def classify_intent(text: str) -> str:
    """Classify the user's text into one of: question, complaint, feature_request."""
    result = wk.chat(openai_client, {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Reply with exactly one of: question, complaint, feature_request."},
            {"role": "user", "content": text},
        ],
        "user_id": ctx.user_id,  # attribution travels with the request
        "feature": "support-classify",
        "plan": ctx.plan,
    })
    return result.choices[0].message.content

# 4. A normal CrewAI agent that uses the tool. CrewAI handles orchestration;
#    Weckr sees every model call the tool makes.
classifier_agent = Agent(
    role="Support classifier",
    goal="Read a support ticket and classify its intent.",
    backstory="You are a triage assistant.",
    tools=[classify_intent],
)

task = Task(
    description="Classify the following ticket: {ticket}",
    expected_output="One word: question, complaint, or feature_request.",
    agent=classifier_agent,
)

crew = Crew(agents=[classifier_agent], tasks=[task])

# 5. Per-request entry point. Set the context, kick off the crew, flush logs.
def handle_ticket(ticket: str, *, user_id: str, plan: str) -> str:
    ctx.user_id = user_id
    ctx.plan = plan
    try:
        # kickoff() returns a CrewOutput; .raw is the final answer as a string.
        return crew.kickoff(inputs={"ticket": ticket}).raw
    except Exception as err:
        if is_weckr_cap_error(err):
            return f"AI spend cap reached for {err.plan_name}. Please upgrade."
        raise
    finally:
        wk.flush()  # drain in-flight log POSTs
```

Note: RequestContext is a simple module-level holder so the @tool function can see the current user. In a real web app, store the per-request user_id/plan in a contextvars.ContextVar instead, so concurrent requests don't clobber each other.
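Here is a minimal sketch of that ContextVar variant using only the standard library. The names `RequestUser`, `current_user`, `require_user`, and `run_for_user` are illustrative and not part of Weckr or CrewAI. One caveat we have not verified against CrewAI internals applies here. In CPython's default configuration, a thread started with `threading.Thread` or a `concurrent.futures` executor does not inherit ContextVar values from the caller. If your setup runs tools on worker threads, check that the value is visible where the tool runs.

```python
import contextvars
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class RequestUser:
    user_id: str
    plan: str

# One ContextVar per process; each context (request, asyncio task) sees its own value.
current_user: contextvars.ContextVar[Optional[RequestUser]] = contextvars.ContextVar(
    "current_user", default=None
)

def require_user() -> RequestUser:
    """Read the user for the current request; fail loudly if nobody set it."""
    user = current_user.get()
    if user is None:
        raise RuntimeError("current_user is not set for this request")
    return user

def run_for_user(user_id: str, plan: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run fn in a copied context so the user value cannot leak into other requests."""
    def _inner() -> Any:
        current_user.set(RequestUser(user_id, plan))
        return fn(*args, **kwargs)
    return contextvars.copy_context().run(_inner)
```

With this in place, classify_intent would call `require_user()` and pass its `user_id` and `plan` to wk.chat(). handle_ticket would call `run_for_user(user_id, plan, crew.kickoff, inputs={"ticket": ticket})` instead of assigning to `ctx`.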
Loop detection
Weckr monitors token velocity in real time. If your CrewAI agent exceeds 50,000 tokens in 5 minutes for a single user, you get an immediate Slack and email alert so you can kill the run before it burns through your budget. Configure thresholds (or disable the alert) at app.useweckr.com/dashboard/settings.
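To make the threshold concrete, here is a sketch of a per-user sliding-window counter that flags a user once they exceed 50,000 tokens in 5 minutes. It illustrates the idea and is not Weckr's implementation. `VelocityMonitor` is a hypothetical name, and the alerting side (Slack, email) is left out.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

class VelocityMonitor:
    """Flag a user whose token usage within a sliding time window exceeds a limit."""

    def __init__(self, max_tokens: int = 50_000, window_seconds: float = 300.0):
        self.max_tokens = max_tokens
        self.window = window_seconds
        # Per user: (timestamp, tokens) events inside the window, plus their running total.
        self._events: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._totals: Dict[str, int] = defaultdict(int)

    def record(self, user_id: str, tokens: int, now: Optional[float] = None) -> bool:
        """Record one call's tokens; return True if the user is now over the limit."""
        now = time.monotonic() if now is None else now
        events = self._events[user_id]
        events.append((now, tokens))
        self._totals[user_id] += tokens
        # Evict events older than the window so the total covers only the last 5 minutes.
        while events and events[0][0] <= now - self.window:
            _, old_tokens = events.popleft()
            self._totals[user_id] -= old_tokens
        return self._totals[user_id] > self.max_tokens
```

The example reads `result.choices[0].message.content`, which suggests wk.chat() passes the OpenAI response through. If it does, you could feed such a counter `result.usage.total_tokens` after each call inside the tool. We have not confirmed that the usage field survives the wrapper.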
Combine velocity alerts with per-user monthly caps for defense in depth: the cap stops the bleed for a given user, and the velocity alert tells you it's happening while it's happening. If you set the cap action to downgrade, the agent silently switches to a cheaper model from the same provider once the cap is hit, so the workflow still completes (just on a smaller model).
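The two cap actions can be summarized as a small decision function. This is a hypothetical model of the behavior described above, not Weckr's code. The `DOWNGRADE` mapping, the `CapExceeded` exception, and the dollar figures are placeholders for whatever you configure in the dashboard.

```python
from dataclasses import dataclass

# Hypothetical same-provider downgrade map; the real one is configured in the dashboard.
DOWNGRADE = {"gpt-4o": "gpt-4o-mini"}

class CapExceeded(Exception):
    """Raised when a user is over their cap and the configured action is to block."""

@dataclass
class CapPolicy:
    monthly_cap_usd: float
    action: str = "block"  # "block" or "downgrade"

def choose_model(requested: str, spent_usd: float, policy: CapPolicy) -> str:
    """Return the model to call, given the user's month-to-date spend."""
    if spent_usd < policy.monthly_cap_usd:
        return requested  # under the cap: no change
    if policy.action == "downgrade":
        # Keep the workflow running; models without a mapping are left as-is.
        return DOWNGRADE.get(requested, requested)
    raise CapExceeded(f"spent ${spent_usd:.2f} of ${policy.monthly_cap_usd:.2f} cap")
```

Under this model, a downgrade never raises. Because the switch is silent, the `is_weckr_cap_error` branch in handle_ticket would presumably fire only under a blocking action. The velocity alert, not an exception, is then what tells you a user hit the cap.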
See the Spending caps section of the Python docs for the cap mechanics, and /dashboard/settings to configure the thresholds for your project.
Roadmap
For now, the boundary-wrap pattern above is the supported path. It uses only methods that exist on the published SDK: wk.chat(), wk.flush(), and wk.last_event_id.
See the LangChain guide at /docs/integrations/langchain for the same pattern in a chain-based workflow.