How to Add Fallback Logic When an AI Agent Returns Low Confidence


You add fallback logic by reading a confidence signal from the model, comparing it against a threshold, and routing anything below that line to a safer path: a retry with a sharper prompt, a stronger model, or a human. The threshold is the easy part. The hard part is picking a confidence signal you can trust, because the number a model gives you when you ask "how confident are you, 0 to 1?" is mostly theatre. A trustworthy signal comes from token log probabilities (logprobs: the log of the probability the model assigned to each token it actually generated) or from a second, cheap model acting as a judge.

Key takeaways

Self-reported confidence ("rate yourself 0 to 1") is unreliable. Use token logprobs or a judge model instead.
Build three fallback paths, not one: retry with a better prompt, escalate to a stronger model, then hand off to a human.
Set the threshold by counting your own labelled examples, not by guessing a round number like 0.8.
Always cap retries. A fallback that loops forever is just a slower way to burn your API budget.
Log every low-confidence decision. The human review queue is your best source of future training and prompt fixes.

What "low confidence" actually means

There are two ways to get a confidence number out of a language model, and only one of them is honest. The first is to ask the model to score itself. The second is to read the probabilities the model assigned to the tokens it actually produced. Asking the model to grade its own work feels intuitive, and it is the approach most tutorials reach for. It is also the one that quietly fails, because a model that hallucinated a wrong answer can just as easily hallucinate a high confidence score to match.

The honest signal is logprobs. When you pass logprobs: true to the OpenAI Chat Completions API, you get back the log probability of each output token. Convert those to linear probabilities, average the ones that make up the answer, and you have a confidence figure grounded in what the model actually computed, not what it claims about itself.
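A minimal sketch of that calculation, assuming you already have the array the API returns at choices[0].logprobs.content, where each entry carries a token and its logprob. The isAnswerToken predicate is a hypothetical hook for dropping tokens that are not part of the answer (JSON braces, for example); by default every token counts.

```javascript
// Sketch: turn a Chat Completions logprobs payload into one confidence number.
// `content` is assumed to be choices[0].logprobs.content: [{ token, logprob }, ...].
// `isAnswerToken` is a hypothetical filter, not part of the OpenAI API.
function confidenceFromLogprobs(content, isAnswerToken = () => true) {
  const tokens = (content ?? []).filter(isAnswerToken);
  if (tokens.length === 0) return 0; // no evidence: treat as low confidence

  // Math.exp undoes the natural log, giving a linear probability in 0..1
  const probs = tokens.map((t) => Math.exp(t.logprob));
  return probs.reduce((sum, p) => sum + p, 0) / probs.length;
}

// Usage with a made-up payload for the label "routine"
const content = [
  { token: 'rout', logprob: Math.log(0.9) },
  { token: 'ine', logprob: Math.log(0.99) },
];
console.log(confidenceFromLogprobs(content).toFixed(3)); // 0.945
```

The arithmetic mean of linear probabilities matches the approach described above. A geometric mean (Math.exp of the mean logprob) is a common alternative that punishes a single very unsure token more heavily.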

Dashboard showing classification confidence scores plotted against accuracy

The three fallback paths

One fallback is not enough. A binary "confident or human" split sends too much to people and wastes the cheap recovery options in between. Build a ladder instead, cheapest rung first.

Retry with a sharper prompt. Re-ask with the input restated verbatim, the output schema repeated, and one worked example. Many low-confidence cases come from an ambiguous prompt, not a hard problem.
Escalate to a stronger model. If a small model is unsure, send the same input to a bigger one. You pay more per call, but only on the fraction that needs it.
Hand off to a human. If it is still below the floor, queue it for a person with the input, the model’s guess, and the confidence attached so the review takes seconds, not minutes.
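The ladder above can be sketched as one async function. The four functions in steps (cheap, cheapSharperPrompt, strong, queueForHuman) are hypothetical placeholders you would wire to your own prompts, models, and review queue; each model step is assumed to resolve to { label, confidence }, and the 0.85 default mirrors the HIGH line used in the n8n node below.

```javascript
// Sketch of the three-rung fallback ladder, cheapest rung first.
// `steps` holds hypothetical functions; model steps resolve to { label, confidence }.
async function runLadder(input, steps, threshold = 0.85) {
  // Normal call: cheap model, standard prompt
  const first = await steps.cheap(input);
  if (first.confidence >= threshold) return { ...first, route: 'auto' };

  // Rung 1: same cheap model, sharper prompt (input restated, schema, one example)
  const retried = await steps.cheapSharperPrompt(input);
  if (retried.confidence >= threshold) return { ...retried, route: 'auto_after_retry' };

  // Rung 2: same input, stronger (more expensive) model
  const escalated = await steps.strong(input);
  if (escalated.confidence >= threshold) return { ...escalated, route: 'auto_after_escalation' };

  // Rung 3: still below the floor, so queue for a person with the best guess attached
  const best = [first, retried, escalated].reduce((a, b) => (b.confidence > a.confidence ? b : a));
  await steps.queueForHuman({ input, guess: best.label, confidence: best.confidence });
  return { ...best, route: 'human_review' };
}
```

Each rung only runs when the one before it fell short, so the stronger model's price is paid only on the items that reach rung 2.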

Wire it up in n8n

Here is the core of it as an n8n Code node. The OpenAI node runs first with logprobs enabled, then this node turns the token probabilities into one confidence number per item and picks a route. The Switch node downstream reads the route field and sends each item to the right branch.

javascript

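// n8n Code node (mode: Run Once for All Items): turn logprobs into a confidence-based route.
// Assumes each incoming item carries `label` plus the Chat Completions `logprobs`
// object; adjust the paths if your OpenAI node nests its output differently.
const HIGH = 0.85; // at or above: proceed automatically
const LOW = 0.6;   // below: straight to a human

return $input.all().map((item) => {
  const { label, logprobs } = item.json;
  const tokens = logprobs?.content ?? [];

  // Average the linear probability (0..1) of every output token.
  // No tokens means no evidence, so treat it as zero confidence.
  const probs = tokens.map((t) => Math.exp(t.logprob));
  const confidence = probs.length
    ? probs.reduce((a, b) => a + b, 0) / probs.length
    : 0;

  let route;
  if (confidence >= HIGH) route = 'auto';
  else if (confidence >= LOW) route = 'retry_stronger_model';
  else route = 'human_review';

  return { json: { label, confidence: Number(confidence.toFixed(3)), route } };
});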

If you are running a local model where logprobs are awkward to get, use a judge instead: send the input and the agent’s answer to a second, cheap model and ask it a single yes/no question, "Is this answer fully supported by the input?" A no routes to the same human queue. It costs one extra call but works with any model.
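A sketch of the judge path, assuming a hypothetical callJudge function that you supply: it takes a prompt string and resolves to the judge model's raw text reply. The prompt wording is illustrative. Anything other than a clear "yes" counts as a no, so an unclear verdict fails safe into the human queue.

```javascript
// Build the single yes/no question for the judge model (wording is illustrative)
function buildJudgePrompt(input, answer) {
  return [
    'Answer with exactly one word: yes or no.',
    'Is the ANSWER fully supported by the INPUT?',
    `INPUT:\n${input}`,
    `ANSWER:\n${answer}`,
  ].join('\n\n');
}

// Only a reply that starts with the word "yes" counts as supported
function parseVerdict(raw) {
  return /^\s*yes\b/i.test(raw ?? '') ? 'supported' : 'unsupported';
}

// `callJudge` is hypothetical: (prompt: string) => Promise<string>
async function judgeRoute(input, answer, callJudge) {
  const raw = await callJudge(buildJudgePrompt(input, answer));
  return parseVerdict(raw) === 'supported' ? 'auto' : 'human_review';
}
```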

A confidence threshold that cost real money

In March 2025, on a Tuesday afternoon, a 7-person insurtech I was helping was running a claims-triage agent on GPT-4o-mini, handling about 1,100 claims a day. The agent auto-approved anything it labelled "routine" with a self-rated confidence above 0.8. In week one, it labelled a 4,000 dollar claim routine at 0.92 and approved it, and the claim was anything but routine. The model sounded sure. It was wrong. We pulled the self-rated score out entirely, switched to averaged logprobs, and added the human rung for anything under 0.6. The next month there were zero wrong auto-approvals, and the human queue settled at about 40 claims a day, roughly twenty minutes of one person’s time. The lesson landed hard: the model’s rating of its own answer told us nothing beyond how confident the answer already sounded.

The signal change is cheap to test. OpenAI exposes logprobs on the Chat Completions API (OpenAI API reference, 2024) at no extra token cost, and GPT-4o-mini runs at 0.15 dollars per million input tokens and 0.60 dollars per million output tokens (OpenAI 2024 pricing page), so the judge-model variant adds a fraction of a cent per decision. The expensive thing is not the extra call. The expensive thing is the wrong auto-approval you did not catch.
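The "fraction of a cent" arithmetic, as a quick sketch using the GPT-4o-mini prices quoted above. The token counts (about 600 in and 5 out per judge call) are assumptions for illustration; substitute your own prompt sizes.

```javascript
// GPT-4o-mini prices quoted above, in USD per million tokens
const PRICE_PER_MILLION = { input: 0.15, output: 0.6 };

function judgeCallCostUSD(inputTokens, outputTokens) {
  return (
    (inputTokens * PRICE_PER_MILLION.input + outputTokens * PRICE_PER_MILLION.output) / 1_000_000
  );
}

// Assumed sizes: ~600 input tokens (prompt plus claim text), ~5 output tokens
const perCall = judgeCallCostUSD(600, 5);
// 1,100 claims a day, as in the insurtech example
const perDay = perCall * 1100;

console.log(perCall.toFixed(6)); // 0.000093 (about 0.0093 cents)
console.log(perDay.toFixed(2));  // 0.10
```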

Pick your threshold by counting, not guessing

My opinion, stated plainly: a threshold you picked because 0.8 sounds reasonable is a guess wearing a lab coat. Take 200 labelled examples, run them through the agent, record the confidence for each, and look at where accuracy falls off a cliff. That elbow is your threshold. For one classifier it was 0.72, for another 0.55. The right number is a property of your data and your model, not a default you read in a blog post. Where this breaks down: if you do not have 200 labelled examples yet, start conservative (route more to humans) and tighten as the review queue gives you the labels you were missing.
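One way to turn "look where accuracy falls off a cliff" into code, assuming you have run your labelled examples through the agent and recorded { confidence, correct } for each: scan candidate cut-offs from low to high and keep the lowest one where everything at or above it meets a target accuracy you choose. The 0.97 default is an illustrative assumption, not a recommendation.

```javascript
// Sketch: pick the lowest confidence cut whose "auto" slice meets a target accuracy.
// `examples` is [{ confidence, correct }] from your own labelled run;
// `targetAccuracy` is an assumption you set from your own risk tolerance.
function pickThreshold(examples, targetAccuracy = 0.97) {
  // Every observed confidence is a candidate cut, tried lowest first
  const cuts = [...new Set(examples.map((e) => e.confidence))].sort((a, b) => a - b);

  for (const cut of cuts) {
    const above = examples.filter((e) => e.confidence >= cut);
    const accuracy = above.filter((e) => e.correct).length / above.length;
    if (accuracy >= targetAccuracy) return cut;
  }
  return null; // nothing meets the target: keep routing to humans
}
```

Accuracy above a cut is not guaranteed to rise smoothly with the cut, so plot accuracy against confidence and check that the returned number sits at the elbow you see by eye.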

A simple confidence ladder you can tune to your own data
Confidence band	Action	Why
Above your high line	Proceed automatically	Accuracy is high enough to trust unattended
Between your low and high lines	Retry, then escalate to a stronger model	Recoverable with a cheap second attempt
Below your low line	Human review queue	Not worth the risk of acting unattended

Fallback logic is not about making the agent smarter. It is about making the agent honest about when it does not know, and giving that moment somewhere safe to go. If you want the deeper version of this, see how a self-correcting agent caps its own retries and how conditional-logic patterns route agentic work.

“The agent that says "I am not sure, here is my best guess and a 41% confidence" is worth ten that say "approved" with a straight face.”

Frequently asked questions

Can I just ask the LLM to rate its own confidence?

You can, but do not trust it for routing. Self-rated confidence tracks how assertive the answer sounds, not how correct it is. Use token logprobs or a separate judge model for any decision that has real consequences.

What confidence threshold should I start with?

Do not start with a number, start with data. Run 200 labelled examples, plot accuracy against confidence, and put the threshold where accuracy drops. If you have no labelled data yet, route more to humans and tighten as the review queue produces labels.

How do I get confidence from a local model with no logprobs?

Use a judge: send the input and the agent’s answer to a second cheap model and ask one yes/no question about whether the answer is supported. A no routes to your fallback path. It adds one call but works on any model.

Should low-confidence cases always go to a human?

Not first. Try a cheap retry with a sharper prompt, then a stronger model, and only hand off to a human if it is still below your floor. That keeps the human queue small enough that people actually work it.

How do I stop the fallback from looping forever?

Cap the retries, usually at two. Store an attempt counter on the item and force the human path once it is exceeded. A fallback with no cap is just a slower, more expensive way to fail.
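A sketch of the cap, assuming the item's JSON carries a hypothetical attempts field that survives each loop back through the workflow. MAX_RETRIES follows the "usually at two" rule of thumb, and the 0.85 default mirrors the HIGH line in the n8n node earlier.

```javascript
// Maximum number of retries before forcing the human path
const MAX_RETRIES = 2;

// `attempts` is a hypothetical counter stored on the item's JSON
function nextRoute(itemJson, confidence, high = 0.85) {
  const attempts = itemJson.attempts ?? 0;
  if (confidence >= high) return { ...itemJson, attempts, route: 'auto' };
  if (attempts >= MAX_RETRIES) return { ...itemJson, attempts, route: 'human_review' };
  return { ...itemJson, attempts: attempts + 1, route: 'retry' };
}

// Usage: three low-confidence passes in a row
let item = { label: 'routine' };
for (const c of [0.5, 0.55, 0.6]) {
  item = nextRoute(item, c);
  console.log(item.attempts, item.route);
}
// 1 retry
// 2 retry
// 2 human_review
```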