Conditional Routing in AI Pipelines: Dynamic Tool Selection with LLMs

Updated Jun 4, 2026

Conditional routing in an AI pipeline means picking the next step or tool based on the input rather than running every branch every time. Dynamic tool selection means the LLM itself picks which tool to call, usually through function calling or MCP. Combine the two and you get pipelines that handle wildly different inputs without growing into spaghetti.

Key takeaways

Static routing with rules usually beats dynamic LLM routing when the categories are stable and few: it is cheaper, faster, and deterministic.
Use function calling / MCP for dynamic tool selection only when the tool surface is genuinely open-ended (e.g. an agent over a docs corpus).
Keep tool descriptions short, action-oriented, and concrete. LLMs pick tools by description, and long, rambling descriptions get mis-selected.
Always validate the LLM’s tool arguments against a schema (Zod, Pydantic) before executing — hallucinated args silently corrupt downstream state.
Log every (input, chosen_tool, args, outcome) row. Wrong tool selection is one of the most common sources of agent failures in production, and you cannot fix what you did not record.
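The validation and logging takeaways can be sketched with the standard library alone. This is a minimal illustration, not a replacement for Pydantic or Zod: the schema format, the `issue_refund` tool, and every name below are hypothetical, and a real pipeline would likely swap `validate_args` for a proper schema library.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for a hypothetical issue_refund tool:
# argument name -> exact Python type required.
REFUND_SCHEMA = {"order_id": str, "amount_cents": int}


@dataclass
class RoutingLogRow:
    input_text: str
    chosen_tool: str
    args: dict
    outcome: str


def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the args passed."""
    problems = []
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing argument: {name}")
        elif type(args[name]) is not expected:
            problems.append(f"{name} should be {expected.__name__}")
    for name in args:
        if name not in schema:
            problems.append(f"unexpected argument: {name}")
    return problems


def run_tool_call(input_text: str, tool: str, raw_args: str, log: list) -> bool:
    """Validate model-produced JSON args, log one row, say whether to execute."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        args = None
    if not isinstance(args, dict):
        log.append(RoutingLogRow(input_text, tool, {}, "rejected: args are not a JSON object"))
        return False
    problems = validate_args(args, REFUND_SCHEMA)
    outcome = "ok" if not problems else "rejected: " + "; ".join(problems)
    log.append(RoutingLogRow(input_text, tool, args, outcome))
    return not problems
```

The point of returning a boolean rather than raising is that a rejected call still leaves a log row behind, so bad arguments show up in the same table as good ones.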

The terms in plain English

Conditional routing: an if statement, possibly powered by an LLM, that decides which branch runs.
Dynamic tool selection: giving the LLM a list of tools with descriptions and letting it call them.
MCP (Model Context Protocol): Anthropic's 2024 open spec for connecting LLMs to tools, now supported by most major frameworks.
Function calling: the older OpenAI-led pattern for the same thing.

The pipeline that grew teeth and ate itself

November 2024, I was helping a friend named Sam with a customer-support pipeline. Started simple: one classifier, three branches, each branch a different prompt and a different output template. By month three it had grown to 19 branches, four of them with their own sub-branches, and a Switch node so wide it did not fit on a screen. We rewrote it with dynamic tool selection. One LLM call, a list of 12 tools with one-sentence descriptions, the model picks one or two and the pipeline calls them. The new version was 180 lines shorter, ran 30% faster on average, and was the first time Sam said the workflow felt understandable rather than tolerated. The lesson: at some scale, hand-wired routing collapses under its own weight, and the LLM picking the tool is genuinely the cleaner abstraction.

The opinion that surprises people

Dynamic tool selection is overkill below about six branches and underused above about twelve. In between, taste decides. The mechanism: at low branch counts a switch is cheaper, faster, and more debuggable; at high branch counts a prompt that lists every branch is cheaper to maintain than the switch it replaces. The cost of being wrong is either an LLM call that does nothing a switch could not, or a switch node so wide it becomes a documentation problem. Hold this loosely; the right cutover depends on how varied your inputs are.

Figure: pipeline diagram showing an input feeding a central decision node that branches to multiple specialised tool nodes.

Three patterns, ranked by complexity

Static switch — your existing if-else. Use it for routing on fields you can read directly. Free, fast, debuggable.
LLM classifier into switch — one short LLM call returns a category from a fixed enum, then a switch routes on it. Use it for routing that needs to understand meaning across maybe six to twelve named branches.
Dynamic tool selection — give the LLM a list of tools with descriptions, let it pick one or more, execute, return. Use it past twelve options or when the tool list changes often.
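The three patterns above can be sketched side by side. The LLM is represented by plain callables (`classify`, `choose`) so the routing logic stays visible; the categories, queue names, and tool layout are all hypothetical.

```python
from typing import Callable

CATEGORIES = ("billing", "shipping", "other")  # hypothetical fixed enum


def static_switch(ticket: dict) -> str:
    """Pattern 1: route on a field you can read directly. No model involved."""
    if ticket.get("channel") == "phone":
        return "callback_queue"
    return "email_queue"


def classify_then_switch(text: str, classify: Callable[[str], str]) -> str:
    """Pattern 2: the model returns one label from a fixed enum, a switch routes on it."""
    label = classify(text).strip().lower()
    if label not in CATEGORIES:
        label = "other"  # never trust a label outside the enum
    handlers = {
        "billing": "billing_branch",
        "shipping": "shipping_branch",
        "other": "human_review",
    }
    return handlers[label]


def dynamic_select(text: str, tools: dict, choose: Callable) -> list:
    """Pattern 3: the model sees tool descriptions and names the tools to call.

    tools maps a tool name to a (description, function) pair; choose stands in
    for the LLM call and returns a list of tool names.
    """
    descriptions = {name: desc for name, (desc, _) in tools.items()}
    chosen = choose(text, descriptions)
    # Silently skip names the model invented; only registered tools run.
    return [tools[name][1](text) for name in chosen if name in tools]
```

Note that both LLM-backed patterns guard against output outside the known set: pattern 2 falls back to a default branch, pattern 3 drops unknown tool names.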

What a tool description should look like

Name in snake_case. One sentence of what it does. One sentence of when to use it. Skip usage examples at this level: they cost more tokens than they earn, and a clear description does the same job for less. The model's choice quality tracks closely with how clearly you wrote the descriptions. I have spent more hours rewriting tool descriptions than I have spent writing tools. The rewrites are usually where the quality gains come from.
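As an illustration of that shape, here is a hypothetical tool definition plus a small lint check for the convention. The name/description/parameters layout follows the common JSON Schema style; exact field names vary by provider, and `lookup_order` and `lint_tool` are invented for this sketch.

```python
import re

# Hypothetical tool definition: one sentence on what it does,
# one sentence on when to use it.
LOOKUP_ORDER = {
    "name": "lookup_order",
    "description": (
        "Fetch the status and shipping details of one order. "
        "Use when the customer asks where an order is or quotes an order number."
    ),
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def lint_tool(tool: dict) -> list[str]:
    """Flag tools that break the snake_case, two-sentence convention."""
    problems = []
    if not re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)*", tool["name"]):
        problems.append("name is not snake_case")
    # Naive sentence split: break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", tool["description"].strip())
    sentences = [p for p in parts if p]
    if len(sentences) != 2:
        problems.append(f"description has {len(sentences)} sentences, expected 2")
    return problems
```

A check like this is cheap to run in CI over the whole tool list, which keeps descriptions from drifting back into paragraphs as the list grows.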

The numbers from one cutover

On the pipeline I rewrote with Sam, the dynamic tool selection version called the wrong tool 4.2% of the time across 1,000 sampled requests in late 2024. The switch version it replaced had a 6.8% misrouting rate on the same sample (the model classifying into the switch enum was the bottleneck there too, but with less context). Net: fewer misroutes and lower latency, because the new version skipped the separate classification step entirely. Caveat: this is one pipeline, one model (Claude 3.5 Sonnet at the time), one corpus. Treat the numbers as suggestive.

Where dynamic selection breaks

When two tools have overlapping descriptions, the model thrashes between them and picks the wrong one based on phrasing. Fix by merging the tools or sharpening the descriptions, not by adding more examples. When the tool list grows past about 25, even good models start dropping options from consideration; split into two or three smaller routers (one per domain) and route to the routers first.
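The router-of-routers split might look like the sketch below. Both `pick_domain` and `pick_tool` stand in for LLM calls, and the domain layout is hypothetical; the only point is that each call sees a short list instead of the full tool surface.

```python
def route_two_level(text, domains, pick_domain, pick_tool):
    """Pick a domain first, then pick a tool from that domain's short list.

    domains maps a domain name to {tool_name: description}.
    Returns the chosen tool name, or None if either pick is invalid.
    """
    domain = pick_domain(text, sorted(domains))
    if domain not in domains:
        return None  # unknown domain: fail closed rather than guess
    tool = pick_tool(text, domains[domain])
    return tool if tool in domains[domain] else None


# Hypothetical layout: two small routers instead of one long tool list.
DOMAINS = {
    "orders": {
        "lookup_order": "Fetch the status of one order.",
        "cancel_order": "Cancel an order that has not shipped.",
    },
    "billing": {
        "issue_refund": "Refund a charge to the original payment method.",
    },
}
```

The trade-off is one extra model call per request in exchange for shorter candidate lists at each step.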

The cap that prevents disasters

Cap tools-per-request at 3, and cap total LLM calls per request at 6. The model occasionally decides it needs to call eight tools to answer a question that needed one. The cap turns that into a graceful failure instead of a hostile invoice.
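One way to enforce those caps is a small per-request budget object that every tool call and LLM call must charge before it runs. This is a sketch with hypothetical names; the limits are the ones from the paragraph above.

```python
MAX_TOOLS_PER_REQUEST = 3
MAX_LLM_CALLS_PER_REQUEST = 6


class BudgetExceeded(Exception):
    """Raised when a request tries to go past one of its caps."""


class RequestBudget:
    """Create one per incoming request; charge it before every call."""

    def __init__(self, max_tools=MAX_TOOLS_PER_REQUEST,
                 max_llm_calls=MAX_LLM_CALLS_PER_REQUEST):
        self.max_tools = max_tools
        self.max_llm_calls = max_llm_calls
        self.tools_used = 0
        self.llm_calls_used = 0

    def charge_tool(self):
        if self.tools_used >= self.max_tools:
            raise BudgetExceeded("tool cap reached")
        self.tools_used += 1

    def charge_llm_call(self):
        if self.llm_calls_used >= self.max_llm_calls:
            raise BudgetExceeded("LLM call cap reached")
        self.llm_calls_used += 1
```

The caller catches `BudgetExceeded` at the top of the request handler and returns whatever partial answer or fallback message it has, which is the graceful failure described above.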

Where the routing literature has gone, and one framework worth a look

The academic side of dynamic LLM routing has a useful anchor in the OptiRoute paper (arXiv 2502.16696, Piskala et al. 2024), which formalises routing as multi-objective optimisation across accuracy, latency, cost, and user-defined preferences. On the practical side, LLMRouter (github.com/ulab-uiuc/LLMRouter) is an open-source library that ships sixteen-plus routing models you can drop into a pipeline today, organised by routing strategy. Brenndoerfer's write-up on tool selection strategies is the cleanest single source on the three flavours I described above. If you are picking a routing approach for new work in 2026, the trend in the literature is toward small learned routers rather than larger LLM-judge routers, because the small ones are faster and cheaper per decision and good enough on narrow domains. Treat the LLM-judge router as a baseline you graduate from once you have enough traffic to train a small classifier.

Frequently asked questions

Function calling or MCP?

MCP is the direction the ecosystem is moving (Anthropic, OpenAI, and major frameworks all support it as of 2025). Function calling still works fine and is simpler if you control both ends. New pipelines: start with MCP. Existing pipelines on function calling: no rush to migrate.

Can a local model do dynamic tool selection?

Yes. Llama 3.1, Qwen 2.5, and DeepSeek V2 all support tool calling. The smaller variants (under 7B) struggle past about 10 tools; the 8B and 14B models handle 20 to 30 tools well.

How do I test routing?

Build a small eval set of inputs paired with the correct tool name. Replay against the pipeline. Measure the percentage that route correctly. Tune descriptions. Repeat. Without this loop you are guessing whether changes helped.
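That loop fits in a few lines. In this sketch `route` is whatever callable wraps your pipeline's routing step and returns a tool name; the function name and return shape are assumptions made for illustration.

```python
def routing_accuracy(eval_set, route):
    """Replay (input, expected_tool) pairs and report accuracy plus the misses.

    Returns (accuracy, misses) where misses is a list of
    (input, expected_tool, actual_tool) tuples to read before tuning.
    """
    misses = []
    for text, expected in eval_set:
        got = route(text)
        if got != expected:
            misses.append((text, expected, got))
    accuracy = 1 - len(misses) / len(eval_set) if eval_set else 0.0
    return accuracy, misses
```

The miss list matters more than the headline number: it tells you which pair of descriptions the model is confusing, which is where the next rewrite should go.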

What about caching the routing decision?

Hash the input, cache the tool selection result for a few minutes, save the LLM call on repeat traffic. For high-volume pipelines this single optimisation cuts routing cost by 60 to 80% in my experience. For low volume, skip it; the cache complexity is not worth the savings.
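A minimal in-process version of that cache, assuming a five-minute TTL and a `route` callable that makes the LLM call. The lowercase-and-collapse-whitespace normalisation is an assumption of this sketch; drop it if casing or spacing changes which tool should run.

```python
import hashlib
import time


class RoutingCache:
    """Cache tool-selection results keyed by a hash of the normalised input."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store = {}    # key -> (stored_at, tool_name)

    @staticmethod
    def _key(text):
        normalised = " ".join(text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get_or_route(self, text, route):
        key = self._key(text)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: skip the LLM call
        tool = route(text)
        self._store[key] = (now, tool)
        return tool
```

This version never evicts expired entries, so a long-running process would want a size bound or an external store with native expiry.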

Pick the smallest pattern that handles the next branch you have to add. Promote up the ladder only when the pattern below it groans.