How to Detect and Stop AI Agent Reasoning Loops

An AI agent that loops doesn’t crash. It doesn’t throw an error. It just keeps thinking, keeps calling the model, and keeps billing you, quietly, at machine speed.

The scary part is the math. A loop that fires 10,000 model calls at a penny each is $100, and it can happen in the time it takes to grab coffee. You find out when the invoice lands.

This guide covers what a reasoning loop actually is, why it’s so financially dangerous, how to detect one by hand with token velocity, and how Weckr catches it for you and pages you before the damage is done.

What is an AI agent reasoning loop?

An agent is a model wrapped in a loop: think, call a tool, read the result, think again, repeat until the task is done. A reasoning loop is what happens when that “until done” condition never trips. The agent gets stuck in a cycle where each step looks like progress but the state never actually advances.

The classic version: a tool call fails, the model reads the error, decides to retry, gets the same error, and loops. Other flavors are two agents passing work back and forth, a self-critique step that always finds one more thing to fix, or malformed tool output the model keeps trying to re-parse. In every case the agent is behaving exactly as written. It simply has no budget and no exit, so it thinks forever.

Because there’s no exception, none of your normal alarms go off. Your error rate is flat. Your latency looks fine per call. The only thing moving is your token count, and it’s moving fast.
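The classic retry flavor described above can also be spotted from inside the agent, by checking whether the last few steps were identical. This is only a minimal sketch: the `ToolStep` shape, the `isStalled` name, and the repeat limit of 3 are hypothetical, and comparing raw strings will miss loops whose output differs slightly on every pass.

```typescript
// Flags an agent that keeps making the same tool call and getting the same result.
// `ToolStep` and the default repeat limit are assumptions; adapt them to your agent.
type ToolStep = { tool: string; args: string; result: string }

function isStalled(history: ToolStep[], repeatLimit = 3): boolean {
  if (repeatLimit < 2 || history.length < repeatLimit) return false
  const recent = history.slice(-repeatLimit)
  const [first] = recent
  if (first === undefined) return false
  // Fingerprint each step by tool name, arguments, and result
  const key = (s: ToolStep) => `${s.tool}|${s.args}|${s.result}`
  const firstKey = key(first)
  return recent.every((s) => key(s) === firstKey)
}
```

A check like this complements token velocity rather than replacing it: it catches the identical-retry case early, while velocity catches loops where every step looks different.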

Why loops are financially dangerous

Regular AI cost problems are slow: a power user drifts over their plan margin across a month. A loop is not slow. It is a spike, and the numbers get ugly in minutes.

Run the math. A modest agent step is around $0.01 in tokens. A loop firing at machine speed racks these up quickly:

// A loop that never hits its stop condition
  1,000 model calls  x  $0.01  =  $10      // a few minutes
 10,000 model calls  x  $0.01  =  $100     // maybe 20 minutes
100,000 model calls  x  $0.01  =  $1,000   // one afternoon

And $0.01 is a lowball figure. A single gpt-4o call with a large prompt and a long completion costs well over that. At $2.50 per million input tokens and $10 per million output tokens, an agent that re-sends its whole history on every turn compounds fast, because each loop iteration ships a bigger prompt than the last.
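To make the compounding concrete, here is a rough cost model. It uses the gpt-4o prices quoted above. The starting prompt size, the tokens added per turn, and the completion length are made-up illustrative values, not measurements from a real agent.

```typescript
// Rough cost model for an agent that re-sends its whole history every turn.
// Prices are the gpt-4o figures quoted above; all token sizes are assumptions.
const INPUT_USD_PER_TOKEN = 2.5 / 1_000_000
const OUTPUT_USD_PER_TOKEN = 10 / 1_000_000

function loopCostUsd(
  turns: number,
  basePromptTokens: number, // prompt size on the first turn (assumed)
  addedPerTurn: number,     // tokens appended to the history each turn (assumed)
  completionTokens: number, // output tokens per turn (assumed)
): number {
  let total = 0
  for (let i = 0; i < turns; i++) {
    // The prompt grows linearly, so the cumulative input cost grows quadratically
    const promptTokens = basePromptTokens + i * addedPerTurn
    total += promptTokens * INPUT_USD_PER_TOKEN + completionTokens * OUTPUT_USD_PER_TOKEN
  }
  return total
}

const first100 = loopCostUsd(100, 2_000, 500, 300)
const first200 = loopCostUsd(200, 2_000, 500, 300)
```

With these assumed sizes, `first100` comes to about $7 and `first200` to about $26. Doubling the turns nearly quadruples the bill, because the input cost grows roughly with the square of the turn count.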

Now the failure mode that actually hurts: the loop starts at 2am on a weekend, for a single user. Nobody is watching the OpenAI dashboard. The agent loops for six hours. You find the four-figure line item on Monday, after the money is already gone. The whole point of loop detection is to cut the time to discovery from “end of the billing cycle” to “a few minutes.”

How to detect a loop manually

The signal you want is token velocity: tokens consumed per unit of time, per user. A real person is rate-limited by how fast they type and read, so they drip a few thousand tokens across a session. A looping agent hammers the model at machine speed and blows past that in a minute or two. Watching velocity catches the loop while it’s happening, not after.

Here is the idea in TypeScript: keep a rolling 5-minute window of token usage per user, add each call, drop anything older than the window, and flag the user when the sum crosses a threshold.

type Hit = { at: number; tokens: number }
const WINDOW_MS = 5 * 60 * 1000   // 5 minutes
const THRESHOLD = 50_000          // tokens in the window
const seen = new Map<string, Hit[]>()

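// Records one call's token usage and reports whether the user's
// rolling-window total now exceeds the threshold.
function checkVelocity(userId: string, tokens: number): boolean {
  const now = Date.now()
  // keep only the hits that are still inside the window
  const hits = (seen.get(userId) ?? []).filter((h) => now - h.at < WINDOW_MS)
  hits.push({ at: now, tokens })
  seen.set(userId, hits)

  const total = hits.reduce((sum, h) => sum + h.tokens, 0)
  return total > THRESHOLD   // true = probable loop
}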

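// after every model call (`response`, `user`, and `alertOncall` come from your own code):
const usage = response.usage?.total_tokens ?? 0   // `usage` is optional on the response type
if (checkVelocity(user.id, usage)) {
  alertOncall(user.id, usage)   // page someone, or kill the agent
}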

This works, and you should absolutely put a hard iteration cap inside each agent loop too. But the in-memory version has the usual problems: it dies on deploy, it doesn’t share state across instances, and you still have to build the alerting, the dedup, the cost calculation, and the dashboard. That is the part nobody wants to maintain.
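The hard iteration cap mentioned above can be sketched as a small wrapper around the loop. Everything here is an assumption for illustration: `StepFn` stands in for one think, act, observe iteration of your agent, and the defaults of 25 iterations and 200,000 tokens are placeholders to tune, not recommendations.

```typescript
// A bounded agent loop: stops on completion, on too many steps, or on too many tokens.
// `StepFn` is a hypothetical stand-in for one iteration of your agent.
type StepResult = { done: boolean; tokensUsed: number }
type StepFn = (iteration: number) => Promise<StepResult>

type Outcome = {
  reason: 'done' | 'max_iterations' | 'token_budget'
  iterations: number
  tokens: number
}

async function runBounded(
  step: StepFn,
  maxIterations = 25,    // assumed default
  tokenBudget = 200_000, // assumed default
): Promise<Outcome> {
  let tokens = 0
  for (let i = 0; i < maxIterations; i++) {
    const result = await step(i)
    tokens += result.tokensUsed
    if (result.done) return { reason: 'done', iterations: i + 1, tokens }
    // Checked after the step, so the budget can be overshot by at most one call
    if (tokens >= tokenBudget) return { reason: 'token_budget', iterations: i + 1, tokens }
  }
  return { reason: 'max_iterations', iterations: maxIterations, tokens }
}
```

Returning a `reason` instead of throwing lets the caller decide what a capped run means: surface a partial answer, retry with a smaller task, or alert.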

How Weckr detects loops automatically

If you already send a userId with your calls, Weckr runs this check for you server-side. You wrap your client once and the loop detector rides along on every logged call. No extra code.

import { Weckr } from '@weckr/sdk'

const wk = new Weckr({
  apiKey: process.env.WECKR_API_KEY,
  plans: { free: 0, pro: 49 },
})

// Same call you already make. Loop detection is automatic
// because you passed a userId.
const response = await wk.chat(openai, {
  model: 'gpt-4o-mini',
  messages: agentMessages,
  userId: user.id,
  feature: 'research-agent',
  plan: user.plan,
})

The default threshold is 50,000 tokens in a rolling 5-minute window, per user. After every call Weckr logs, it recomputes that window server-side (so it survives your deploys and works across every instance you run) and, if a user crosses the line, it fires an alert. To keep one runaway agent from turning into a hundred pings, there is a 30-minute cooldown per user: once you’re alerted about a user, you won’t get pinged about that same user again for half an hour. You get told once, clearly, then given room to act. This works the same across OpenAI, Anthropic, and Gemini, since token counts are normalized before the check.
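If you are building the manual version yourself, the cooldown idea is a few lines. This is a sketch of the concept only, not Weckr's actual implementation: the `shouldAlert` name and the in-memory `Map` are assumptions, and a real deployment would keep this state somewhere shared.

```typescript
// Per-user alert cooldown: at most one alert per user per COOLDOWN_MS.
// A sketch of the idea only; names and in-memory storage are assumptions.
const COOLDOWN_MS = 30 * 60 * 1000 // 30 minutes
const lastAlertAt = new Map<string, number>()

function shouldAlert(userId: string, now: number = Date.now()): boolean {
  const last = lastAlertAt.get(userId)
  // Still inside the cooldown: suppress, and do not restart the timer
  if (last !== undefined && now - last < COOLDOWN_MS) return false
  lastAlertAt.set(userId, now)
  return true
}
```

Suppressed alerts deliberately leave the timestamp alone. A loop that is still running after 30 minutes will therefore page you again instead of staying silent forever.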

What happens when Weckr detects a loop

You get an alert on the channels you configured: a Slack incoming webhook, email (via Resend), or both. The message is built to be actionable at a glance. It tells you who, how much, and where to look:

⚠️  Weckr: possible agent loop detected

user_id:        usr_8f21c9
tokens (5 min): 61,400
cost so far:    $0.74
feature:        research-agent
model:          gpt-4o-mini

→ https://useweckr.com/dashboard/users/usr_8f21c9

That is the whole point. You see the user id, the tokens burned in the window, the cost so far, and a direct dashboard link, all before the number gets scary. The example above is 74 cents. Catching it here means the answer is “kill one agent” instead of “explain a $1,200 line item to your cofounder.” Follow the link and you can see exactly which feature and user are responsible.
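For comparison, here is roughly what sending a similar alert yourself involves. Slack incoming webhooks accept a JSON body with a `text` field. Everything else is an assumption for illustration: the `LoopAlert` shape, the message layout (modeled on the alert above), and the `PostFn` type, which exists only so the sketch does not depend on a particular runtime.

```typescript
// Builds and posts a loop alert to a Slack incoming webhook.
// Field names and message layout are illustrative assumptions, not Weckr's format.
type LoopAlert = { userId: string; tokens: number; costUsd: number; feature: string; model: string }

// Minimal shape of an HTTP POST function, so the sketch is runtime-agnostic
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>

function formatAlert(a: LoopAlert): string {
  return [
    'Possible agent loop detected',
    `user_id: ${a.userId}`,
    `tokens (5 min): ${a.tokens.toLocaleString('en-US')}`,
    `cost so far: $${a.costUsd.toFixed(2)}`,
    `feature: ${a.feature}`,
    `model: ${a.model}`,
  ].join('\n')
}

async function sendLoopAlert(post: PostFn, webhookUrl: string, alert: LoopAlert): Promise<void> {
  const res = await post(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: formatAlert(alert) }), // incoming webhooks read the `text` field
  })
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`)
}
```

In Node 18+ or a browser, the global `fetch` can be passed as `post`. Injecting it also makes the function easy to test with a fake.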

How to configure loop detection thresholds

The defaults (50,000 tokens over 5 minutes, 30-minute cooldown) are a sane starting point, but the right numbers depend on your product. A heavy document-processing agent legitimately burns more than a chat assistant, so you will want to tune them.

You do this in the dashboard settings, per project, with no code change and no redeploy. Set the token threshold, the window length, and the cooldown, then pick which channels (Slack, email) get the alert. Open the demo to see where those settings live, or read the docs for the full list of knobs. Because it’s server-side config, you can tighten a threshold the moment you get burned, without shipping anything.

FAQ

How do I detect an AI agent infinite loop?

Watch token velocity per user, not just total spend. Track how many tokens a single user burns in a short rolling window (say 5 minutes) and alert when it crosses a threshold. A healthy user drifts along at a few thousand tokens; a looping agent spikes to tens of thousands in minutes. Weckr does this automatically for every logged call as long as you pass a userId.

What causes AI agents to loop infinitely?

Usually a planner that never reaches a stop condition: a tool call fails, the model re-reads the same error, decides to retry, and loops. Other common causes are two agents handing work back and forth, a self-critique step that always finds something to fix, or malformed tool output the model keeps trying to parse. The agent is not broken; it just has no budget and no exit, so it keeps thinking and keeps billing you.

How do I stop a runaway AI agent before it costs thousands?

Put a hard cap on the loop (max iterations or a token budget) so it cannot run forever, and add out-of-band monitoring that watches spend in real time. The cap saves you inside one request; the monitor catches the pattern across requests and pages you. With Weckr, loop detection fires server-side after each call and sends a Slack or email alert with the running cost so you can kill the agent in minutes, not at the end of the month.

What is token velocity and how is it used to detect agent loops?

Token velocity is tokens consumed per unit of time for a single user, for example tokens in the last 5 minutes. It is the fastest signal for a loop because a stuck agent keeps calling the model at machine speed while a real user is limited by how fast they type and read. Weckr sums each user's token usage over a rolling 5-minute window and treats crossing 50,000 tokens as a likely loop.

Can I get a Slack alert when my AI agent loops?

Yes. Weckr sends alerts to a Slack incoming webhook and to email (via Resend), configured per project in the dashboard settings. When a user crosses the velocity threshold you get a message with the user id, tokens in the window, cost so far, and a link to the dashboard. A 30-minute cooldown per user keeps a single runaway agent from flooding your channel.

Stop runaway agents before they cost you

A looping agent is the rare bug that gets more expensive the longer you don’t notice it, and by design it stays quiet the whole time. A hard iteration cap keeps any single request bounded; token-velocity monitoring catches the pattern across requests and pages you while it’s still cents, not thousands.

Weckr gives you the second half for free the moment you pass a userId: automatic detection, a clear Slack or email alert with the running cost, and thresholds you tune without touching code. See it running on seeded data, no signup needed, at useweckr.com/demo.
