Build Multi-Agent Workflows in n8n with DeepSeek and Ollama

6 min read · Updated Jun 4, 2026

A multi-agent workflow in n8n with DeepSeek on Ollama is three connected AI Agent nodes: a planner that breaks the task into steps, a worker that executes each step with tools, and a critic that grades the worker's output and decides whether to retry. DeepSeek R1's reasoning trace makes it a strong fit for the planner and critic roles; DeepSeek V2 or Qwen is faster for the worker.

Key takeaways

Three-agent (planner / worker / critic) workflows beat single-agent on long tasks because each role has a focused prompt instead of one mega-prompt.
Use DeepSeek R1 (8B or 14B) for the planner and critic — its reasoning trace makes failures interpretable; use a smaller, faster model for the worker.
Cap loops at MAX 3 critic-retries. Without a cap, agents burn GPU cycles forever on tasks they cannot solve.
Store every (plan, tool_calls, critique) tuple in Postgres for replay debugging — you will need it the first time an agent goes off the rails.
Multi-agent only beats single-agent when the task has clear sub-steps. For simple Q&A, the extra latency isn’t worth it.

Multi-agent, in one sentence

Multi-agent means more than one LLM call, each with a different prompt and a different role, coordinating to finish a task. It is not magic. It is plumbing.

The first time three agents beat one

March 2025, I was building a research summariser. Single agent: read five web pages, write a 400-word brief. The single-agent version produced a brief that was about 80% useful but consistently missed numbers and consistently invented one quote that did not exist. I split it into three agents: a planner that decided which page to read in what order, a worker with a web-fetch tool that pulled and quoted, and a critic that compared the brief against the quotes and flagged any number or quote not directly supported. The output went from 80% useful to roughly 95% useful, the invented quotes stopped, and the brief took 22% longer to produce. The trade looked obvious. The trade was not obvious until I built both.

The opinion

Most multi-agent setups are a worse loop with extra steps. The 5% of cases where multi-agent wins clearly are: tasks that need explicit verification (fact-checking, code review), tasks where role separation reduces context overload (researcher and writer), and tasks where the planner's plan is itself the deliverable. Outside those, a single well-prompted agent with a grader call usually matches the multi-agent version at a third of the cost. The mechanism: every extra LLM call adds latency, tokens, and a coordination point. Hold this loosely on tasks you cannot do well with one prompt; the right multi-agent split there is genuinely a step up.

n8n canvas showing three AI agent nodes labelled planner worker and critic connected in a loop

Why DeepSeek and why Ollama

DeepSeek R1, released January 2025, is the first open-weight model with a visible reasoning trace that holds up against frontier models on multi-step problems. The distilled variants (1.5B, 7B, 8B, 14B, 32B, 70B) run on Ollama and fit on hardware most developers already have. The 7B and 8B distills are the sweet spot for the planner and critic in a small multi-agent setup; the 14B if your machine can hold it. Per the DeepSeek R1 paper from January 2025, the 7B distill outperforms several much larger non-reasoning models on math and code benchmarks.

The three-agent shape in n8n

Planner — AI Agent node with the DeepSeek R1 7B model. System prompt: "You plan steps. Output a JSON array of steps. Each step has an action and an expected output. No prose outside the JSON."
Worker — AI Agent node with a faster model (DeepSeek V2 Lite or Qwen 2.5 7B). Tools attached: HTTP Request, web search, your own custom nodes. System prompt focused on executing one step at a time.
Critic — AI Agent node, back to DeepSeek R1 7B. System prompt: "You compare the worker's output to the step's expected output and return JSON: {pass: boolean, feedback: string}. Be specific in feedback."
Orchestration — a Loop Over Items node iterates the planner's steps. An IF node after the critic decides whether to send feedback back to the worker for one retry or move on.

Hard limits that keep cost sane

Cap planner steps at 8. Cap worker retries per step at 2. Cap total LLM calls per task at 30. These three caps prevent the runaway loops that turn an interesting demo into a four-figure inference bill. I learned this the hard way; you do not have to.

Real numbers from one run

On a single RTX 4070 running Ollama, a five-step research task takes about 47 seconds end to end with the three-agent setup. Same task on a single agent: 11 seconds. The four-times latency is real. The quality lift on factual accuracy is also real. Decide which you need before you build.

The mistake worth naming

Letting the planner and the worker share memory by passing the entire conversation forward. The whole point of role separation is information hiding. The worker should see the step it is working on, the tools, and the prior step's output. Not the planner's reasoning trace. Not the critic's previous feedback on unrelated steps. Keep the context narrow per agent and the quality goes up. Wide context confuses small models.

Sub-workflows as tools, and the orchestrator-research-writer framing

The most common multi-agent shape in the n8n community right now (per the n8n community thread at community.n8n.io/t/120176 and the dev.to write-up by emperorakashi20) is one main AI Agent that exposes other AI Agent sub-workflows as tools. The main agent is the orchestrator; the sub-workflows are role-specialised agents like ResearchAgent, WriterAgent, and ReviewerAgent, each with its own system prompt and its own tool access. The author of that dev.to piece is right to push back on calling a single agent with multiple tools a multi-agent system; it is not. The planner-worker-critic naming I used above maps onto orchestrator-research-writer one-to-one. Use whichever vocabulary your team finds clearer. On the model side, the felipefontoura/deepseek-local repo on GitHub gives you a working n8n plus Ollama plus DeepSeek-R1 reference if you want a copy-paste starting point.

Frequently asked questions

Do I need DeepSeek or will Llama 3.1 work?

Llama 3.1 works for the worker role. For the planner and critic on multi-step tasks, DeepSeek R1's reasoning training gives you measurably better step decomposition and grading. On a small eval I ran in September 2025, R1 7B beat Llama 3.1 8B on planner accuracy by about 12 points. Your mileage will vary.

Can I run this on a CPU?

Technically yes. Practically no. Three models cycling on a CPU pushes total runtime past three minutes per task, which kills the loop. Get to a GPU with at least 12 GB of VRAM or you will give up on multi-agent before you see its upside.

What about MCP for tool use?

MCP works with DeepSeek through Ollama if you front it with an MCP-compatible adapter. The agent ecosystem is moving toward MCP as the default tool protocol, and n8n's MCP support landed in their 1.65 release in mid-2025.

When does multi-agent actually beat single-agent?

When the task has a verifiable spec (the critic has something concrete to grade against), when context length is the bottleneck (split it across roles), and when planning is the bulk of the value (the user wants to see the plan).

Build the single-agent version first. Measure where it fails. Only split into multiple agents where the failure has a name.