Conditional Routing in AI Pipelines: Dynamic Tool Selection with LLMs
Conditional routing in an AI pipeline means picking the next step or tool based on the input rather than running every branch every time. Dynamic tool selection means the LLM itself picks which tool to call, usually through function calling or MCP. Combine the two and you get pipelines that handle wildly different inputs without growing into spaghetti.
The terms in plain English
Conditional routing: an if statement, possibly powered by an LLM, that decides which branch runs. Dynamic tool selection: giving the LLM a list of tools with descriptions and letting it call them. MCP (Model Context Protocol): Anthropic's 2024 open spec for connecting LLMs to tools, now supported by most major frameworks. Function calling: the older OpenAI-led pattern for the same thing.
The pipeline that grew teeth and ate itself
In November 2024, I was helping a friend named Sam with a customer-support pipeline. It started simple: one classifier, three branches, each branch a different prompt and a different output template. By month three it had grown to 19 branches, four of them with their own sub-branches, and a Switch node so wide it did not fit on a screen. We rewrote it with dynamic tool selection: one LLM call and a list of 12 tools with one-sentence descriptions, from which the model picks one or two for the pipeline to call. The new version was 180 lines shorter, ran 30% faster on average, and was the first time Sam said the workflow felt understandable rather than tolerated. The lesson: at some scale, hand-wired routing collapses under its own weight, and the LLM picking the tool is genuinely the cleaner abstraction.
The opinion that surprises people
Dynamic tool selection is overkill below about six branches and underused above about twelve. In between, taste decides. The mechanism: at low branch counts a switch is cheaper, faster, and more debuggable; at high branch counts a prompt that lists every branch costs less to maintain than the switch does. The cost of being wrong is either an LLM call that does nothing a switch could not, or a switch node so wide it becomes a documentation problem. Hold this loosely; the right cutover depends on how varied your inputs are.
Three patterns, ranked by complexity
- Static switch — your existing if-else. Use it for routing on fields you can read directly. Free, fast, debuggable.
- LLM classifier into switch — one short LLM call returns a category from a fixed enum, then a switch routes on it. Use it for routing that needs to understand meaning across maybe six to twelve named branches.
- Dynamic tool selection — give the LLM a list of tools with descriptions, let it pick one or more, execute, return. Use it past twelve options or when the tool list changes often.
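To make the first two rungs concrete, here is a minimal sketch. The category names, queue names, and the `fake_classify` stand-in are all hypothetical; a real version would replace `fake_classify` with a short LLM call that is told to answer with one of the enum values.

```python
from enum import Enum
from typing import Callable


class Category(str, Enum):
    # Hypothetical fixed enum the classifier is allowed to return.
    BILLING = "billing"
    SHIPPING = "shipping"
    OTHER = "other"


def static_route(ticket: dict) -> str:
    """Pattern 1: route on a field you can read directly, no LLM involved."""
    if ticket.get("channel") == "phone":
        return "callback_queue"
    return "email_queue"


def classify_then_route(text: str, classify: Callable[[str], str]) -> str:
    """Pattern 2: one short classification call, then a plain switch."""
    raw = classify(text).strip().lower()
    try:
        category = Category(raw)
    except ValueError:
        # Anything outside the enum falls back instead of crashing the pipeline.
        category = Category.OTHER
    branches = {
        Category.BILLING: "billing_branch",
        Category.SHIPPING: "shipping_branch",
        Category.OTHER: "human_review",
    }
    return branches[category]


def fake_classify(text: str) -> str:
    # Stand-in for the LLM call (assumption, not a real model).
    return "billing" if "invoice" in text.lower() else "something unexpected"
```

The fallback branch matters more than it looks: the classifier will eventually return a string that is not in the enum, and the switch should have somewhere safe to send it.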
What a tool description should look like
Name in snake_case. One sentence of what it does. One sentence of when to use it. At this level, worked examples cost more tokens than they earn; a clear description is the better use of those tokens. In my experience, the model's choice quality tracks how clearly the descriptions are written. I have spent more hours rewriting tool descriptions than I have spent writing tools. The rewrites are usually where the quality gains come from.
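One way to keep descriptions honest is to enforce the shape in code. The sketch below is an assumption about how you might do that, not a schema from any framework: `ToolSpec`, its field names, and the crude one-sentence check are all hypothetical, and the dict it emits would still need whatever input schema your tool-calling API expects.

```python
import re
from dataclasses import dataclass

# snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass(frozen=True)
class ToolSpec:
    name: str
    what: str  # one sentence: what the tool does
    when: str  # one sentence: when to use it

    def __post_init__(self) -> None:
        if not SNAKE_CASE.match(self.name):
            raise ValueError(f"tool name must be snake_case: {self.name!r}")
        for label, sentence in (("what", self.what), ("when", self.when)):
            text = sentence.strip()
            # Crude check: ends with a period and has no period mid-text.
            if not text.endswith(".") or ". " in text:
                raise ValueError(f"{label} should be a single sentence: {sentence!r}")

    def as_tool(self) -> dict:
        # Only name and description; parameters are out of scope for this sketch.
        return {"name": self.name, "description": f"{self.what} {self.when}"}


order_status = ToolSpec(
    name="lookup_order_status",
    what="Looks up the current status of an order by order ID.",
    when="Use when the customer asks where their package is.",
)
```

Rejecting a sloppy spec at construction time means the rewrite happens before the description ever reaches the model.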
The numbers from one cutover
On the pipeline I rewrote with Sam, the dynamic tool selection version called the wrong tool 4.2% of the time across 1,000 sampled requests in late 2024. The switch version it replaced had a 6.8% misrouting rate on the same sample (the model classifying into the switch enum was the bottleneck there too, but with less context). Net: fewer misroutes and lower latency, because the new version skipped the classification step entirely. Caveat: this is one pipeline, one model (Claude 3.5 Sonnet at the time), one corpus. Treat the numbers as suggestive.
Where dynamic selection breaks
When two tools have overlapping descriptions, the model thrashes between them and picks the wrong one based on phrasing. Fix by merging the tools or sharpening the descriptions, not by adding more examples. When the tool list grows past about 25, even good models start dropping options from consideration; split into two or three smaller routers (one per domain) and route to the routers first.
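The router-of-routers split can be sketched in a few lines. Everything here is illustrative: the domain and tool names are made up, and `keyword_pick` is a toy stand-in for the LLM call that would actually choose among the options.

```python
from typing import Callable, Dict, List

MAX_TOOLS_PER_ROUTER = 25  # the rough ceiling described above

# (user input, options) -> the chosen option
Picker = Callable[[str, List[str]], str]


def two_level_route(text: str, domains: Dict[str, List[str]], pick: Picker) -> str:
    """Pick a domain first, then pick a tool inside that domain only."""
    for domain, tools in domains.items():
        if len(tools) > MAX_TOOLS_PER_ROUTER:
            raise ValueError(f"domain {domain!r} has too many tools; split it")
    domain = pick(text, list(domains))
    if domain not in domains:
        raise ValueError(f"picker returned unknown domain: {domain!r}")
    tool = pick(text, domains[domain])
    if tool not in domains[domain]:
        raise ValueError(f"picker returned unknown tool: {tool!r}")
    return tool


def keyword_pick(text: str, options: List[str]) -> str:
    # Toy picker (assumption): choose the option sharing the most words with the input.
    words = set(text.lower().split())
    return max(options, key=lambda opt: len(words & set(opt.split("_"))))


DOMAINS = {
    "billing": ["refund_payment", "send_invoice"],
    "shipping": ["track_package", "change_address"],
}
```

Each pick sees a short list, which is the whole point: the model never has to weigh all the tools at once. The price is one extra call per request.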
The cap that prevents disasters
Cap tools-per-request at 3, and cap total LLM calls per request at 6. The model occasionally decides it needs to call eight tools to answer a question that needed one. The cap turns that into a graceful failure instead of a hostile invoice.
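A minimal sketch of those caps, assuming a per-request budget object that every LLM call and tool call has to charge before it runs. The class and function names are hypothetical, and the tool execution itself is left as a comment.

```python
from dataclasses import dataclass
from typing import List


class BudgetExceeded(Exception):
    """Raised when a request tries to go past one of its caps."""


@dataclass
class RequestBudget:
    max_tool_calls: int = 3
    max_llm_calls: int = 6
    tool_calls: int = 0
    llm_calls: int = 0

    def charge_tool(self) -> None:
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool call cap reached")
        self.tool_calls += 1

    def charge_llm(self) -> None:
        if self.llm_calls >= self.max_llm_calls:
            raise BudgetExceeded("LLM call cap reached")
        self.llm_calls += 1


def run_with_budget(requested_tools: List[str], budget: RequestBudget) -> dict:
    """Run the requested tools in order and stop cleanly when the cap trips."""
    executed: List[str] = []
    try:
        for name in requested_tools:
            budget.charge_tool()
            executed.append(name)  # a real pipeline would invoke the tool here
    except BudgetExceeded as exc:
        return {"status": "capped", "executed": executed, "reason": str(exc)}
    return {"status": "ok", "executed": executed}
```

Returning a "capped" result rather than raising to the caller is what makes the failure graceful: the user gets a partial answer and you get a log line instead of a bill.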
Where the routing literature has gone, and one framework worth a look
The academic side of dynamic LLM routing has a useful anchor in the OptiRoute paper (arXiv 2502.16696, Piskala et al. 2024), which formalises routing as multi-objective optimisation across accuracy, latency, cost, and user-defined preferences. On the practical side, LLMRouter (github.com/ulab-uiuc/LLMRouter) is an open-source library that ships sixteen-plus routing models you can drop into a pipeline today, organised by routing strategy. Brenndoerfer's write-up on tool selection strategies is the cleanest single source on the three flavours I described above. If you are picking a routing approach for new work in 2026, the trend in the literature is toward small learned routers rather than larger LLM-judge routers, because the small ones are faster and cheaper per decision and good enough on narrow domains. Treat the LLM-judge router as a baseline you graduate from once you have enough traffic to train a small classifier on.
Frequently asked questions
Function calling or MCP?
MCP is the direction the ecosystem is moving (Anthropic, OpenAI, and major frameworks all support it as of 2025). Function calling still works fine and is simpler if you control both ends. New pipelines: start with MCP. Existing pipelines on function calling: no rush to migrate.
Can a local model do dynamic tool selection?
Yes. Llama 3.1, Qwen 2.5, and DeepSeek V2 all support tool calling reliably. The smaller distills (under 7B) struggle past about 10 tools; the 8B and 14B models handle 20 to 30 tools well.
How do I test routing?
Build a small eval set of inputs paired with the correct tool name. Replay against the pipeline. Measure the percentage that route correctly. Tune descriptions. Repeat. Without this loop you are guessing whether changes helped.
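The loop is small enough to sketch in full. The function name, the eval cases, and `toy_route` are all hypothetical; in practice `route` would wrap your real pipeline and return the name of the tool it chose.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, str]        # (input text, expected tool name)
Miss = Tuple[str, str, str]       # (input text, expected tool, tool actually chosen)


def routing_accuracy(
    cases: List[EvalCase], route: Callable[[str], str]
) -> Tuple[float, List[Miss]]:
    """Replay every case through the router and collect the misroutes."""
    misses: List[Miss] = []
    for text, expected in cases:
        got = route(text)
        if got != expected:
            misses.append((text, expected, got))
    if not cases:
        return 0.0, misses
    return (len(cases) - len(misses)) / len(cases), misses


def toy_route(text: str) -> str:
    # Stand-in router (assumption): a real one would call the LLM.
    return "track_package" if "package" in text else "refund_payment"


CASES: List[EvalCase] = [
    ("where is my package", "track_package"),
    ("refund please", "refund_payment"),
    ("package arrived broken, I want a refund", "refund_payment"),
    ("you charged me twice", "refund_payment"),
]
```

The list of misses is more useful than the accuracy number. Each one shows which description lost to which, and that is where the next rewrite should go.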
What about caching the routing decision?
Hash the input, cache the tool selection result for a few minutes, save the LLM call on repeat traffic. For high-volume pipelines this single optimisation cuts routing cost by 60 to 80% in my experience. For low volume, skip it; the cache complexity is not worth the savings.
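A minimal in-process sketch of that cache. The class name and the five-minute default are hypothetical, and lowercasing and collapsing whitespace before hashing is my assumption about what counts as "the same input"; a shared cache across workers would need an external store instead of a dict.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class RoutingCache:
    """Remember the tool chosen for an input for a short time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: inputs differing only in case or spacing should share a key.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_route(self, text: str, route: Callable[[str], str]) -> str:
        key = self._key(text)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: skip the LLM call
        tool = route(text)
        self._store[key] = (now, tool)
        return tool
```

Usage is a one-line wrap around the existing router: `cache.get_or_route(user_text, route)`. Note that exact-hash caching only helps when the same text really does repeat; paraphrases will miss.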
Pick the smallest pattern that handles the next branch you have to add. Promote up the ladder only when the pattern below it groans.