Conditional Logic in Workflows: 5 AI Platforms Tested

Q: How do I implement conditional logic in an AI-driven workflow?

Call an LLM with a strict JSON system prompt that returns { category, confidence }, parse the JSON, then feed the structured field into a Switch or IF node. Never branch on raw model prose. Add a confidence threshold (0.7-0.85) and route anything below it to a fallback or human review.

8 min read · Updated Mar 30, 2026

AI conditional logic is the replacement for if/else trees that broke every time a customer phrased a request differently. The pattern is simple: trigger → LLM classifies → switch node routes → fallback catches anything below the confidence threshold. The payoff is high, the failure modes are predictable, and the code surface is small. The trap is reaching for an LLM on every branch when a literal rule would do. This article shows where the model earns its keep, where it doesn’t, the exact n8n shape that works in production, and the misclassification story that changed how I prompt.

Key takeaways

Use AI conditional logic for unstructured inputs (email body, support ticket text, image content) — not for clean enums where a literal rule wins.
Always emit structured JSON ({"category": ..., "confidence": ...}) and switch on it. Never branch on raw prose.
Confidence threshold ∈ [0.7, 0.85] for most workflows. Below it = fallback to human or generic handler.
Cap categories at 4–6 for the first version. More categories collapse accuracy fast.
Always log input + AI output + final routing decision. Without that audit trail you cannot improve the prompt.
McKinsey 2024: 62% of organisations are experimenting with multi-step AI agents — conditional routing is the first primitive of every one.

When AI conditional logic actually beats a plain IF

Pick the rule type based on input shape. Source: my own production pipelines over the last two years.
Input shape	Decision type	Use this
Webhook with numeric amount	amount > 500 → approve	Plain IF node, no LLM
Dropdown form field	category == "billing"	Plain Switch node, no LLM
Free-text support email	route by intent	LLM classifier + Switch
Mixed (text + amount)	urgency depends on tone + amount	LLM classifier + arithmetic IF
Image upload	route by content (invoice vs receipt)	Vision LLM + Switch
Structured webhook (Stripe, Shopify)	route by event type	Plain Switch, never LLM
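
The table boils down to one habit: try the deterministic rule first and only reach for the model when the input is free text. A rough Python sketch of that decision, assuming a hypothetical payload dict with optional event_type, amount, and text keys (these field names are illustrative, not from any platform; image inputs are left out):

```python
def pick_router(payload: dict) -> str:
    """Return which kind of node should make the routing decision."""
    # Structured webhook events (Stripe, Shopify) already carry a type field.
    if "event_type" in payload:
        return "plain_switch"
    has_text = bool(str(payload.get("text", "")).strip())
    has_amount = isinstance(payload.get("amount"), (int, float))
    if has_text and has_amount:
        return "llm_classifier_plus_if"      # tone from the model, amount from arithmetic
    if has_text:
        return "llm_classifier_plus_switch"  # free text needs a classifier
    if has_amount:
        return "plain_if"                    # e.g. amount > 500
    return "plain_switch"                    # dropdowns and other clean enums
```

Nothing here calls a model. The point is that the LLM branch is the last resort, not the default.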

The exact n8n shape that works

Six nodes, no agent magic, no LangGraph. This is the shape I default to and have shipped to clients dozens of times.

Webhook — receives the inbound message (email, form, webhook from another service).
Set — build the prompt as a clean string with the input embedded. Keep the prompt out of the HTTP node.
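HTTP Request — call OpenAI / Anthropic / local Ollama at temperature 0 and ask for JSON output. On OpenAI-compatible endpoints that is response_format: { "type": "json_object" }; other providers may need their own equivalent setting.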
IF — confidence ≥ threshold? If no, route to fallback (human / generic).
Switch — route by {{ $json.category }} into one branch per allowed category.
Postgres / Sheets append — log { input, ai_response, branch_taken, latency_ms, ts } on every run. This is non-negotiable.
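
For readers who think in code rather than nodes, the IF and Switch steps can be sketched in a few lines of Python. This is a minimal illustration, not n8n internals: the route function and the "fallback" branch name are my own, and the category list is borrowed from the classifier prompt in the next section.

```python
import json

ALLOWED = {"billing", "technical", "sales", "feedback", "spam"}
THRESHOLD = 0.7  # pick a value in the 0.7-0.85 range

def route(raw_model_output: str) -> str:
    """Map the model's JSON string to a branch name."""
    try:
        result = json.loads(raw_model_output)
        category = result["category"]
        confidence = float(result["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "fallback"  # malformed output never reaches the Switch
    # IF node: written as "not >=" so a NaN confidence also falls back.
    if not confidence >= THRESHOLD:
        return "fallback"
    # Switch node: anything outside the closed enum takes the default branch.
    if not isinstance(category, str) or category not in ALLOWED:
        return "fallback"
    return category
```

Calling route('{"category": "billing", "confidence": 0.91}') picks the billing branch. The same payload with confidence 0.55, or a reply that is not JSON at all, lands in fallback.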

The classifier prompt that holds up in production

text

SYSTEM: You are a strict classifier. Read the user message and return a JSON object with exactly these keys:
  - category (one of: billing, technical, sales, feedback, spam)
  - confidence (number between 0 and 1)
  - summary (string, max 12 words)
  - urgency (one of: low, medium, high)
Return JSON only. No prose. No markdown. If unsure, set confidence below 0.7 and category to "spam".

USER: <<paste raw email body here>>

Three details that matter: (1) "JSON only, no prose": without it GPT will sometimes wrap the JSON in markdown fencing. (2) The explicit "if unsure, set confidence below 0.7" instruction: LLMs tend toward overconfidence unless you say this. (3) The closed enum of categories spelled out in the system message: anything outside it is automatically fallback territory.
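
The prompt is only half of the contract; the other half is checking the reply against it. Below is a small Python sketch of that check, assuming the four keys and enums from the prompt above. The function name parse_classifier_reply is hypothetical, and stripping a stray markdown fence is my own defensive addition.

```python
import json
from typing import Optional

CATEGORIES = {"billing", "technical", "sales", "feedback", "spam"}
URGENCIES = {"low", "medium", "high"}
FENCE = "`" * 3  # markdown code fence marker

def parse_classifier_reply(reply: str) -> Optional[dict]:
    """Return the validated reply as a dict, or None if it breaks the contract."""
    text = reply.strip()
    # Tolerate a markdown fence around the JSON in case the model adds one anyway.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category, urgency = data.get("category"), data.get("urgency")
    confidence, summary = data.get("confidence"), data.get("summary")
    ok = (
        isinstance(category, str) and category in CATEGORIES
        and isinstance(urgency, str) and urgency in URGENCIES
        and isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        and 0 <= confidence <= 1
        and isinstance(summary, str) and len(summary.split()) <= 12
    )
    return data if ok else None
```

A None result should be treated exactly like a low-confidence answer and sent to the fallback branch.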

The OpenAI HTTP call (drop this into n8n)

json

{
  "model": "gpt-4o-mini",
  "temperature": 0,
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": {{ JSON.stringify($node["BuildPrompt"].json.system) }} },
    { "role": "user", "content": {{ JSON.stringify($node["BuildPrompt"].json.user) }} }
  ]
}

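Use gpt-4o-mini for classification: it is roughly 1/30th the cost of the full model and matches accuracy on classification tasks with closed enums. At about $0.15 per million input tokens, and assuming roughly 1,000 input tokens per classification (system prompt plus message), 10,000 classifications a month cost roughly $1.50 in input tokens. The short JSON reply adds only a little on top. Don’t overpay for the big model on a job it doesn’t need.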
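
The $1.50 figure is easy to re-derive for your own volumes. The sketch below assumes about 1,000 input tokens per classification. That number is inferred from the figures above rather than measured, so swap in your own average. Output tokens are ignored.

```python
def monthly_input_cost(messages: int, tokens_per_message: int, usd_per_million: float) -> float:
    """Input-token cost in USD for one month of classifications."""
    return messages * tokens_per_message * usd_per_million / 1_000_000

# 10,000 messages a month, ~1,000 input tokens each (assumed), $0.15 per million tokens
estimate = monthly_input_cost(10_000, 1_000, 0.15)
print(f"${estimate:.2f} per month")
```

Doubling the prompt length doubles the bill, which is one more reason to keep the system message tight.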

Real examples that ship money

Support email triage — inbound to support@ → classifier → high-urgency goes to senior agent Slack; billing goes to Stripe dashboard link reply; FAQ-able routes to canned response with a "reply yes if this didn’t help" follow-up. Typical drop: 40-60% of tickets resolved without human touch.
Lead routing — web form → classifier (company size + buying signal) → AE assignment in CRM. Hot leads SMS the AE within 30 seconds; cold leads enter a 3-touch nurture.
Content moderation — user submission → vision-capable LLM → safe / review / reject. The "review" branch is where the human sits; everything else is automated.
Invoice OCR + routing — PDF in → vision model extracts { vendor, amount, category } → IF on amount picks the approval chain (under $500 auto, $500-5000 manager, over $5000 VP). The LLM does the parsing; the IF does the routing.
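
The invoice example is the cleanest illustration of the split between parsing and routing, so here it is as a Python sketch. The thresholds come from the bullet above. Sending a missing or non-numeric amount to human_review is my assumption, not something the original flow specifies.

```python
def approval_chain(amount_usd: float) -> str:
    """Pick the approver from the invoice amount alone."""
    if amount_usd < 500:
        return "auto"
    if amount_usd <= 5000:
        return "manager"
    return "vp"


def route_invoice(extracted: dict) -> str:
    """Route on the parsed amount; unusable amounts go to a person (assumption)."""
    amount = extracted.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        return "human_review"
    return approval_chain(amount)
```

The model never sees the thresholds. It only extracts { vendor, amount, category }, and a plain comparison does the rest.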

The misclassification story that changed how I prompt

May 2024, a Tuesday morning. An 8-person SaaS client, a support inbox doing ~140 messages a day, two humans triaging the whole thing badly. We shipped the classifier in an afternoon. Five categories, GPT-4o-mini, confidence threshold 0.7. First week: 82% routing accuracy, and both humans down to 4 hours/day of triage instead of 7. Looked great.

Then complaints started coming in from the billing team: they were getting messages from very upset enterprise customers that were clearly NOT billing. Pulled the logs. The classifier was sending anything mentioning a dollar amount to "billing", including stuff like "your $99 plan is missing the SSO feature your sales rep promised" (that’s sales/feedback, not billing).

The fix took 20 minutes: I rewrote the system message to define categories by INTENT rather than by KEYWORDS, replacing "billing = billing questions" with "billing = the customer wants money moved, a refund, an invoice corrected, or a payment retried". Accuracy jumped to 94%. The lesson: an LLM classifier mirrors your category DEFINITIONS, not your category NAMES. Define them like a lawyer would.
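
One way to keep definitions honest is to store them as data and generate the system message from them. In the sketch below, only the billing definition is the one from the story. The sales wording is a hypothetical placeholder to show the style, and build_system_prompt is an illustrative helper, not the exact prompt that shipped.

```python
CATEGORY_DEFINITIONS = {
    # Wording taken from the fix described above.
    "billing": "the customer wants money moved, a refund, an invoice corrected, or a payment retried",
    # Hypothetical wording, shown only to illustrate an intent-based definition.
    "sales": "the customer is asking about buying, upgrading, or a feature a sales rep promised",
}

def build_system_prompt(definitions: dict) -> str:
    """Render intent-based category definitions into a classifier system message."""
    lines = ["You are a strict classifier. Choose exactly one category by the customer's intent:"]
    for name, definition in definitions.items():
        lines.append(f"  - {name} = {definition}")
    lines.append("Return JSON only. No prose. No markdown.")
    return "\n".join(lines)

print(build_system_prompt(CATEGORY_DEFINITIONS))
```

Editing a definition then becomes a one-line change to a dict or a database row rather than a rewrite of the whole prompt.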

Tools that work for this in 2026

Pricing from each vendor page as of June 2026.
Tool	Best for	Branching	Starts at
n8n	Self-hostable, full control	IF + Switch + AI Agent node	Free self-hosted
Zapier	Non-technical teams	Paths + ChatGPT step	$19.99/mo
Make	Visual designers	Router + OpenAI module	$10.59/mo
Pipedream	Devs who want code in steps	Plain JS/Python	Free 10k/mo
LangGraph	Custom agent flows	State graph with edges	Free OSS

Pitfalls to avoid

No fallback. Models occasionally return malformed JSON or low-confidence answers. Always have a "send to human" branch. The day you don’t is the day a model returns nothing useful.
15 categories from day one. Accuracy drops fast as the enum grows. Start with 4-6 and split categories only when logs show a real concentration of edge cases.
Latency surprise. A cloud LLM round-trip is 600 ms to 3 s. If your workflow needs sub-100 ms, use a small local model via Ollama — see running local LLMs in n8n.
Hardcoded prompts. Store the system prompt in a Set node or a Postgres row. You will iterate on it twenty times in the first month — don’t bury it inside an HTTP node.
No log of AI decisions. Append every (input, ai_output, branch_taken) tuple to a table. Six weeks in, that table is your prompt-improvement dataset.
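
The audit log from the last pitfall does not need to be elaborate. Here is a minimal Python sketch using the standard-library sqlite3 module as a stand-in for Postgres or Sheets. The table name ai_decisions and the helper names are assumptions; the columns follow the fields listed in the n8n shape above.

```python
import json
import sqlite3
import time

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the decision log."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ai_decisions ("
        "ts REAL, input TEXT, ai_response TEXT, branch_taken TEXT, latency_ms INTEGER)"
    )
    return conn

def log_decision(conn, input_text: str, ai_response: dict, branch_taken: str, latency_ms: int) -> None:
    """Append one (input, ai_output, branch_taken) record."""
    conn.execute(
        "INSERT INTO ai_decisions VALUES (?, ?, ?, ?, ?)",
        (time.time(), input_text, json.dumps(ai_response), branch_taken, latency_ms),
    )
    conn.commit()

conn = open_log()
log_decision(conn, "Please refund my last invoice",
             {"category": "billing", "confidence": 0.91}, "billing", 840)
```

Storing the raw model reply as JSON text keeps the table schema stable while the prompt and its output keys evolve.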

“The strongest AI conditional logic looks boring from the outside: one model call, a JSON parse, a switch. Everything clever you can stop doing.”

Frequently asked questions

What is AI conditional logic in automation workflows?

Using an LLM (or classifier) to read an unstructured input and return a structured decision (category, confidence, urgency), then routing the workflow with a Switch node based on that decision. Replaces brittle keyword if/else.

When should I NOT use AI conditional logic?

When the input is already structured (amount, dropdown enum, webhook event type) or when a regex would work. Adding an LLM to a deterministic decision just adds latency, cost, and a new failure mode.

How accurate is AI classification in production?

For a 4–6 category problem with a well-defined prompt, expect 90–95% after two or three prompt iterations. Anything below that is a prompt problem, not a model problem — split ambiguous categories or sharpen the definitions.

What model should I use for conditional logic classification?

gpt-4o-mini, Claude Haiku, or a local Llama 3.1 8B with a strict JSON system prompt. The classification task does not benefit from the larger models; the 30× cost difference is wasted.

How do I handle low-confidence AI classifications?

Set a confidence threshold (typically 0.7-0.85) and route anything below it to a human review queue or generic handler. Then audit that queue weekly — if certain categories pile up there, your prompt needs sharpening.
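
The weekly audit can be as simple as counting which predicted categories end up below the threshold. A short Python sketch, assuming each logged decision is a dict with category and confidence keys (the function name fallback_hotspots is illustrative):

```python
from collections import Counter

def fallback_hotspots(decisions, threshold=0.7, top=3):
    """Count which predicted categories land in the fallback queue most often."""
    low = Counter(d["category"] for d in decisions if d["confidence"] < threshold)
    return low.most_common(top)

week = [
    {"category": "billing", "confidence": 0.55},
    {"category": "billing", "confidence": 0.62},
    {"category": "sales", "confidence": 0.40},
    {"category": "technical", "confidence": 0.93},
]
print(fallback_hotspots(week))
```

A category that dominates this list is usually one whose definition overlaps with a neighbour.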

How much does AI conditional logic cost to run?

With gpt-4o-mini at $0.15/M input tokens, a classifier handling 10,000 messages a month costs roughly $1.50 in input tokens, assuming about 1,000 tokens per message including the system prompt. Self-hosted Ollama models cost only the electricity. For most teams the API bill is a rounding error compared to the labour saved.

What platforms support conditional logic in workflows?

n8n (IF + Switch + AI Agent node), Zapier (Paths + ChatGPT step), Make (Router + OpenAI module), Pipedream (plain JS/Python in steps), and LangGraph (state-graph edges for code-first teams). See the comparison table above for branching style and pricing.

How do I build fallback logic into AI-powered automation?

Three layers: (1) JSON-mode response_format so malformed output throws instead of silently misrouting, (2) a confidence threshold below which the branch defaults to human review, (3) a try/catch around the LLM call that routes API errors to a deterministic "best-guess" branch. The fallback branch should always exist.
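
The three layers can be sketched together in Python. This is an illustration under assumptions: call_llm stands for any function that takes the message and returns the model's raw string, and the branch names best_guess and human_review are placeholders for whatever your workflow calls them.

```python
import json

def classify_with_fallback(call_llm, message: str, threshold: float = 0.7):
    """Return (branch, reason), applying all three fallback layers."""
    try:
        raw = call_llm(message)
    except Exception:
        # Layer 3: API errors go to a deterministic branch.
        return "best_guess", "api_error"
    try:
        # Layer 1: malformed output throws here instead of misrouting.
        result = json.loads(raw)
        category = str(result["category"])
        confidence = float(result["confidence"])
    except (ValueError, KeyError, TypeError):
        return "human_review", "malformed_output"
    # Layer 2: low confidence defaults to human review.
    if not confidence >= threshold:
        return "human_review", "low_confidence"
    return category, "ok"
```

Returning the reason alongside the branch makes the audit log more useful, since you can tell API outages apart from prompt problems.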