Integration guide

Using Weckr with CrewAI: catch runaway agent loops

CrewAI orchestrates multi-agent workflows. The hardest failure mode is an agent that gets stuck in a tool-call loop and burns through your monthly LLM budget in minutes. Weckr's job is to catch that before it costs you four figures.

Honesty up front: native CrewAI callback integration is on the roadmap. Today, the cleanest pattern is to route the underlying OpenAI calls through wk.chat() at the boundary, meaning the place your agents actually reach the model (a tool, a custom LLM adapter, or a wrapper function). The example below shows exactly how, and it works with the SDK that's on PyPI right now.

Why this matters for CrewAI

A traditional API endpoint makes one LLM call per request. If that call costs $0.01, the worst case per request is bounded at $0.01. A CrewAI agent loop is different: an agent can call its tools dozens of times for a single user request, and if a tool keeps returning ambiguous results, the agent may retry until it hits a recursion limit (or your bank account does).
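To make the difference concrete, here is some back-of-the-envelope arithmetic. The $0.01 per call comes from the paragraph above. The loop shape (40 tool calls per attempt, retried 25 times) is a hypothetical illustration, not a CrewAI default.

```python
# Back-of-the-envelope blast radius. All figures are illustrative assumptions.
from decimal import Decimal


def request_cost(cost_per_call: Decimal, calls: int) -> Decimal:
    """Cost of one user request that triggers `calls` LLM round trips."""
    return cost_per_call * calls


COST_PER_CALL = Decimal("0.01")  # per-call figure from the text above

# A plain endpoint: one LLM call per request.
endpoint = request_cost(COST_PER_CALL, 1)

# A stuck agent: hypothetically 40 tool calls per attempt, retried 25 times.
agent_loop = request_cost(COST_PER_CALL, 40 * 25)

print(f"endpoint:   ${endpoint}")    # endpoint:   $0.01
print(f"agent loop: ${agent_loop}")  # agent loop: $10.00
```

At those assumed numbers, a hundred users whose agents get stuck the same way add up to $1,000.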

The mitigations are the same as for any LLM call (per-user caps, model recommendations), plus one CrewAI-specific addition: token-velocity alerts. If a single user's agent burns through tokens unusually fast, you want to know in minutes, not at the end of the month when the invoice arrives.

The pattern

This pattern assumes the LLM calls you want to meter go through an OpenAI-shaped client, whether you point it at OpenAI, at Anthropic via a proxy, or at a local model behind an OpenAI-compatible endpoint. The recipe:

1. Instantiate one openai.OpenAI() client at process start.
2. Instantiate one Weckr client at process start.
3. At the LLM-call site your CrewAI agents reach (a tool, a custom LLM adapter, or a wrapper function), call wk.chat(openai_client, {...}) with the current user_id and plan.
4. Configure caps and alert thresholds at app.useweckr.com/dashboard/settings.
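Several tools or adapters may reach the LLM. If so, it can help to build the wk.chat() payload in one place, so that no call site forgets the attribution fields. This is a minimal sketch that uses the payload keys from the working example (model, messages, user_id, feature, plan). The helper name build_chat_request and its validation rules are hypothetical; they are not part of the Weckr SDK.

```python
# Hypothetical helper (not part of the Weckr SDK): builds the dict passed as
# the second argument to wk.chat(), using the keys from the working example.
from typing import Any

KNOWN_PLANS = {"free", "pro", "business"}  # mirrors the plans= dict in the example


def build_chat_request(
    *,
    model: str,
    messages: list[dict[str, str]],
    user_id: str,
    plan: str,
    feature: str,
) -> dict[str, Any]:
    """Return a wk.chat() payload with the attribution fields always attached."""
    if not user_id:
        # An empty user_id would attribute spend to nobody, so fail loudly.
        raise ValueError("user_id is required for per-user caps")
    if plan not in KNOWN_PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    return {
        "model": model,
        "messages": messages,
        "user_id": user_id,
        "feature": feature,
        "plan": plan,
    }
```

A tool would then call wk.chat(openai_client, build_chat_request(model="gpt-4o-mini", messages=..., user_id=..., plan=..., feature="support-classify")).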

Install

```bash
pip install weckr-sdk openai crewai
```

Get a wk_ API key from app.useweckr.com/auth/signup.
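The working example hardcodes the key as a placeholder. In a deployed service you would typically read it from the environment instead. This is a minimal sketch. The variable name WECKR_API_KEY is an assumption, and this guide does not say whether the SDK reads any environment variable on its own. The only validation is the wk_ prefix mentioned above.

```python
import os
from collections.abc import Mapping


def load_weckr_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the Weckr key from a (hypothetically named) WECKR_API_KEY variable."""
    key = env.get("WECKR_API_KEY", "")
    if not key.startswith("wk_"):
        # Fail at process start rather than on the first metered call.
        raise RuntimeError("WECKR_API_KEY is missing or is not a wk_ key")
    return key
```

You would then construct the client with Weckr(api_key=load_weckr_key(), plans={...}).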

Working example

A CrewAI agent that uses a tool. The tool routes its LLM call through wk.chat(), so every round trip the agent triggers is attributed to the correct user and counted against their cap.

```python
import openai
from crewai import Agent, Task, Crew
from crewai.tools import tool
from weckr import Weckr, is_weckr_cap_error

# 1. Shared clients.
openai_client = openai.OpenAI()
wk = Weckr(
    api_key="wk_...",
    plans={"free": 0, "pro": 29, "business": 99},
)

# 2. A context object so tools know which user is driving the agent.
class RequestContext:
    user_id: str = ""
    plan: str = ""

ctx = RequestContext()

# 3. A tool that uses wk.chat() for the actual LLM round trip.
@tool("Classify intent")
def classify_intent(text: str) -> str:
    """Classify the user's text into one of: question, complaint, feature_request."""
    result = wk.chat(openai_client, {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Reply with exactly one of: question, complaint, feature_request."},
            {"role": "user", "content": text},
        ],
        "user_id": ctx.user_id,         # attribution travels with the request
        "feature": "support-classify",
        "plan": ctx.plan,
    })
    # message.content can be None on an OpenAI-shaped response; return a str.
    return result.choices[0].message.content or ""

# 4. A normal CrewAI agent that uses the tool. CrewAI handles orchestration;
#    Weckr sees every model call the tool makes.
classifier_agent = Agent(
    role="Support classifier",
    goal="Read a support ticket and classify its intent.",
    backstory="You are a triage assistant.",
    tools=[classify_intent],
)

task = Task(
    description="Classify the following ticket: {ticket}",
    expected_output="One word: question, complaint, or feature_request.",
    agent=classifier_agent,
)

crew = Crew(agents=[classifier_agent], tasks=[task])

# 5. Per-request entry point. Set the context, kick off the crew, flush logs.
def handle_ticket(ticket: str, *, user_id: str, plan: str) -> str:
    ctx.user_id = user_id
    ctx.plan = plan
    try:
        # kickoff() returns a CrewOutput, not a str; .raw holds the final text.
        return crew.kickoff(inputs={"ticket": ticket}).raw
    except Exception as err:
        if is_weckr_cap_error(err):
            return f"AI spend cap reached for {err.plan_name}. Please upgrade."
        raise
    finally:
        wk.flush()                       # drain in-flight log POSTs
```

Note: RequestContext is a simple module-level holder so the @tool function can see the current user. In a real web app, store the per-request user_id and plan in a contextvars.ContextVar instead, so concurrent requests don't clobber each other.
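Here is a minimal sketch of that ContextVar approach, using only the standard library. The names current_user, current_plan, request_scope, and attribution are hypothetical. In the working example, the tool would read current_user.get() and current_plan.get() instead of ctx.user_id and ctx.plan.

Each asyncio task runs in its own copy of the context, so two interleaved requests each see their own user. If your stack hands work to other threads, check that the context is carried over. asyncio.to_thread copies it; a bare threading.Thread generally does not.

```python
import asyncio
import contextvars
from collections.abc import Iterator
from contextlib import contextmanager

# Per-request attribution; each context (asyncio task, thread) sees its own value.
current_user: contextvars.ContextVar[str] = contextvars.ContextVar("current_user", default="")
current_plan: contextvars.ContextVar[str] = contextvars.ContextVar("current_plan", default="")


@contextmanager
def request_scope(user_id: str, plan: str) -> Iterator[None]:
    """Set attribution for the duration of one request, then restore it."""
    user_token = current_user.set(user_id)
    plan_token = current_plan.set(plan)
    try:
        yield
    finally:
        current_plan.reset(plan_token)
        current_user.reset(user_token)


def attribution() -> dict[str, str]:
    """What a tool would pass to wk.chat() as user_id and plan."""
    return {"user_id": current_user.get(), "plan": current_plan.get()}


async def fake_request(user_id: str, plan: str) -> dict[str, str]:
    # Stand-in for handle_ticket(): yields to the event loop mid-request so
    # the two requests below genuinely interleave.
    with request_scope(user_id, plan):
        await asyncio.sleep(0)
        return attribution()


async def main() -> list[dict[str, str]]:
    # gather() wraps each coroutine in a Task with its own context copy.
    return await asyncio.gather(
        fake_request("alice", "pro"),
        fake_request("bob", "free"),
    )


results = asyncio.run(main())
print(results)  # [{'user_id': 'alice', 'plan': 'pro'}, {'user_id': 'bob', 'plan': 'free'}]
```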

Loop detection

Weckr monitors token velocity in real time. If your CrewAI agent exceeds 50,000 tokens in 5 minutes for a single user, you get an immediate Slack and email alert so you can kill the run before it burns through your budget. Configure thresholds (or disable the alert) at app.useweckr.com/dashboard/settings.
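Weckr does this monitoring for you. The sketch below only makes the threshold concrete: 50,000 tokens in 5 minutes averages out to 10,000 tokens per minute. It is a minimal client-side sliding-window check, assuming you know each call's token count. The class name VelocityWindow is hypothetical, and this illustrates the idea rather than Weckr's actual algorithm.

```python
from collections import defaultdict, deque


class VelocityWindow:
    """Per-user token counter over a sliding time window (illustrative only)."""

    def __init__(self, max_tokens: int = 50_000, window_s: float = 300.0) -> None:
        self.max_tokens = max_tokens
        self.window_s = window_s
        # user_id -> deque of (timestamp, tokens), oldest first
        self._events: defaultdict[str, deque[tuple[float, int]]] = defaultdict(deque)
        self._totals: defaultdict[str, int] = defaultdict(int)

    def record(self, user_id: str, tokens: int, now: float) -> bool:
        """Record one call; return True if the user is now over the threshold."""
        events = self._events[user_id]
        events.append((now, tokens))
        self._totals[user_id] += tokens
        # Drop events that have aged out of the window.
        while events and events[0][0] <= now - self.window_s:
            _, old_tokens = events.popleft()
            self._totals[user_id] -= old_tokens
        return self._totals[user_id] > self.max_tokens


window = VelocityWindow()
print(window.record("u1", 20_000, now=0.0))    # False: 20,000 in window
print(window.record("u1", 20_000, now=60.0))   # False: 40,000 in window
print(window.record("u1", 20_000, now=120.0))  # True: 60,000 > 50,000
print(window.record("u1", 0, now=300.0))       # False: first call aged out, 40,000
```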

Combine velocity alerts with per-user monthly caps for defense in depth: the cap stops the bleed for a given user, and the velocity alert tells you it's happening while it's happening. If you set the cap action to downgrade, the agent silently switches to a cheaper model from the same provider once the cap is hit, so the workflow still completes (just on a smaller model).
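The difference between stopping and downgrading can be sketched as a small decision function. This is a hypothetical illustration, not Weckr's implementation. The action name "block", the CapReached exception, the rule that downgrade stops the call when no cheaper model is mapped, and the gpt-4o to gpt-4o-mini mapping are all assumptions; this guide only names downgrade as a cap action. The behavior you configure at /dashboard/settings is what actually applies.

```python
# Hypothetical model of two cap actions; names and the mapping are assumptions.
CHEAPER_MODEL = {"gpt-4o": "gpt-4o-mini"}  # same provider, smaller model


class CapReached(Exception):
    """Stand-in for the kind of error is_weckr_cap_error() recognizes."""


def choose_model(requested: str, spent: float, cap: float, action: str) -> str:
    """Return the model to call, or raise if the user's cap stops the call."""
    if spent < cap:
        return requested                       # under cap: no change
    if action == "downgrade" and requested in CHEAPER_MODEL:
        return CHEAPER_MODEL[requested]        # over cap: finish on a smaller model
    # Over cap with "block", or nothing cheaper mapped: stop the call.
    raise CapReached(f"cap of {cap} reached")
```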

See the Python docs (Spending caps) for the cap mechanics, and /dashboard/settings to configure the thresholds for your project.

Roadmap

Native CrewAI callback integration is planned. It will plug in at the crew level and auto-attribute every step's LLM call, including tool calls and agent-to-agent handoffs, without you wiring a context object. Subscribe on the roadmap page to be notified when it ships.

Until then, the boundary-wrap pattern above is the supported path. It uses only APIs that exist on the published SDK: wk.chat(), wk.flush(), wk.last_event_id, and is_weckr_cap_error().

See the LangChain guide at /docs/integrations/langchain for the same pattern in a chain-based workflow.