AI Agents for Business Automation Workflows
7 min read · Updated Jun 4, 2026
AI agents are the right primitive for a narrow set of business problems and the wrong primitive for almost everything else. The narrow set: tasks where the steps genuinely vary based on what the previous step found out. The wrong set: anything that already fits a fixed N-step sequence. The first lands a 10x productivity bump; the second lands a quietly enormous OpenAI bill for the same outcome a Zap would have produced. This article shows where agents ship to production, the 4 patterns that work, the 47-call disaster that taught me to default to workflows, and the choice you actually have to make on day one.
Key takeaways
- Use a workflow when the steps are fixed. Use an agent only when the step list is decided by the previous step’s output.
- A "research → enrich → email" sequence is a workflow, not an agent. Don’t dress one as the other.
- Agents make 5–20x more LLM calls than the equivalent workflow because they think about whether to act, not just act.
- Add a hard step cap (max iterations: 8 in most cases). Without it, an agent stuck in a loop is a credit-card incident.
- Human-in-the-loop checkpoints belong on every irreversible action (send email, charge card, file ticket).
- McKinsey 2024: 62% of organisations are experimenting with multi-step AI agents. Most successful pilots are scoped to a single task, not a "do my whole job" agent.
Agent vs. workflow — the picking rule
| Trait | Workflow (Zapier / n8n) | Agent (LangGraph / CrewAI / OpenAI Assistants) |
|---|---|---|
| Step list | Known in advance | Decided by the model on the fly |
| Branching | <= 5 deterministic branches | Many, including loops and self-correction |
| LLM calls per execution | 0–2 | 5–50+ |
| Cost per run | $0.001–$0.02 | $0.05–$1+ |
| Failure mode | Predictable, easy to debug | Infinite loop, hallucinated tool call, runaway cost |
| Right fit | 90% of business automations | ~10%: truly variable research / negotiation |
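The picking rule fits in a few lines of code. The sketch below is a minimal, framework-free illustration, not how n8n or LangGraph implement it: in `run_workflow` the step list is fixed before the run starts, while in `run_agent` a `choose_next` function (standing in for the LLM call) picks each step from the state the previous step produced. All function and step names are hypothetical.

```python
from typing import Callable, Optional

State = dict
Step = Callable[[State], State]


def run_workflow(steps: list[Step], state: State) -> State:
    """Workflow: the step list is fixed before the run starts."""
    for step in steps:
        state = step(state)
    return state


def run_agent(
    choose_next: Callable[[State], Optional[str]],
    tools: dict[str, Step],
    state: State,
    max_iterations: int = 8,
) -> State:
    """Agent: each next step is chosen from what the previous step returned."""
    for _ in range(max_iterations):
        name = choose_next(state)  # in a real agent, this is an LLM call
        if name is None:  # the "model" decided it is done
            return state
        state = tools[name](state)
    raise RuntimeError(f"agent hit max_iterations={max_iterations}")


def logged(name: str) -> Step:
    """Hypothetical step that only records that it ran."""
    return lambda s: {**s, "log": s["log"] + [name]}


def search(s: State) -> State:
    """Hypothetical search tool: each call finds one more source."""
    return {**s, "sources": s.get("sources", 0) + 1, "log": s["log"] + ["search"]}


# Workflow: drawable as a flowchart before it runs.
wf = run_workflow([logged("scrape"), logged("enrich"), logged("score")], {"log": []})

# Agent: keeps searching until it has two sources; the step count is not fixed up front.
ag = run_agent(
    lambda s: "search" if s.get("sources", 0) < 2 else None,
    {"search": search},
    {"log": []},
)
print(wf["log"])  # ['scrape', 'enrich', 'score']
print(ag["log"])  # ['search', 'search']
```

If your task can be written as the `run_workflow` call, it is a workflow. The agent version only earns its extra LLM calls when `choose_next` genuinely needs to look at the last result to decide what happens next.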
The four agent patterns that actually ship to production
- Single-task tool-using agent. One agent, 3–6 tools, one goal. For example: "given this support ticket, look up the customer, look up their last 5 tickets, draft a reply, ask the human before sending." Boring. Reliable. The 80/20 of agents that ship.
- Plan-then-execute (PaE). One LLM call builds a full plan up front. A deterministic runner executes each step. The plan is fixed once made — no re-planning per step. Massively cheaper than ReAct loops, easier to debug, suitable for ~70% of "research" use cases.
- Critic / reviewer pair. One agent does the work (drafts a report, writes code, picks a leasing option). A second agent (the critic) reviews it against a checklist and either approves it or returns specific feedback. At least two LLM calls per artifact instead of one (more when the critic sends the work back), with a measurable quality lift.
- Supervisor + specialist swarm. One supervisor agent delegates to specialist sub-agents (researcher, writer, fact-checker). Use sparingly — it is the most expensive pattern. Worth it when the specialists genuinely have different system prompts and tool sets, not when they’re all "GPT-4o with a different name."
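Plan-then-execute is the easiest of the four patterns to see in code. A minimal sketch, assuming a hypothetical `make_plan` that stands in for the single planning LLM call and a dict of named tool functions: the runner checks the whole plan against the tool registry, then runs it in order without consulting the model again.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PlanStep:
    tool: str
    arg: str


def make_plan(goal: str) -> list[PlanStep]:
    """Hypothetical stand-in for the one planning LLM call.

    A real planner would prompt the model for a JSON list of steps and parse it;
    here the plan is hard-coded so the sketch runs offline.
    """
    return [
        PlanStep("web_search", goal),
        PlanStep("crm_lookup", goal),
        PlanStep("summarise", goal),
    ]


def execute_plan(plan: list[PlanStep], tools: dict[str, Callable[[str], str]]) -> list[str]:
    """Deterministic runner: executes the fixed plan in order, never re-plans."""
    unknown = [step.tool for step in plan if step.tool not in tools]
    if unknown:
        # Reject the whole plan before any tool runs.
        raise ValueError(f"plan uses unknown tools: {unknown}")
    return [tools[step.tool](step.arg) for step in plan]


# Hypothetical tools; in practice these would call search, the CRM and an LLM.
tools = {
    "web_search": lambda q: f"search results for {q}",
    "crm_lookup": lambda q: f"CRM record for {q}",
    "summarise": lambda q: f"summary of {q}",
}

results = execute_plan(make_plan("Acme Corp"), tools)
```

Because the plan is fixed once made, a bad plan either fails up front (unknown tool) or produces an answer you can trace back to one planning call, rather than going wrong somewhere in a long reasoning loop.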
The 2026 agent tools, in the order I would reach for them
| Tool | Best for | Skill ceiling | Cost model |
|---|---|---|---|
| n8n AI Agent node | Single-task agent inside an existing workflow | Low (visual + JSON) | Free self-hosted |
| Zapier Agents | Non-technical teams, gluing existing Zapier apps | Very low | From $50/mo |
| LangGraph | Code-first, stateful multi-step agents | High | Free OSS |
| CrewAI | Multi-agent swarms in Python | Medium | Free OSS + paid platform |
| OpenAI Assistants API | Quick prototypes tied to OpenAI | Medium | Per-token + per-run |
| Anthropic Claude with tool use | When you want the strongest tool-using model | Medium | Per-token |
The opinion I will defend
The 47-LLM-call story that taught me to default to workflows
October 2024, a Thursday morning. A 15-person B2B sales-tools company asked me to build a "research agent" to qualify inbound leads. The spec: take an email and company domain, research the company online, score the lead, draft a personalised reply, and log everything to Salesforce. I built it with LangGraph and CrewAI: two agents, six tools, a ReAct loop. It worked on the demo. Beautiful. We shipped it to production, processing about 180 leads/day.
Three days (540 leads) in, I checked the OpenAI bill: about $32 a day, or ~$0.18 per lead. That’s small in absolute terms, but I knew the actual outputs should have cost ~$0.02 per lead in tokens. I pulled the trace logs. The agent was making 47 LLM calls per lead on average. Most of them were the agent thinking about whether to call a tool, calling the tool, reflecting on whether the tool output was good, deciding to call another tool, and so on. The actual work was 4 calls; the other 43 were the agent talking to itself.
I tore the whole thing out and replaced it with a flat 4-step n8n workflow: scrape domain → enrich via Clearbit → single LLM call to score + draft email → Salesforce write. Same output quality on a 100-lead sample. Cost: $0.021/lead, roughly 9x cheaper. The "agent" had been a workflow all along; I had just dressed it in a more expensive costume because the use case had the word "research" in it. Now I draw the flowchart first. If I can draw it, it’s a workflow.
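The replacement workflow has no decision points, which is what makes it cheap. A sketch of its shape in plain Python, assuming hypothetical stubs for each step (`scrape_domain`, `enrich`, `score_and_draft`, `write_to_crm`). These are placeholders, not the Clearbit or Salesforce APIs, and the real build used n8n nodes rather than code.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    email: str
    domain: str
    data: dict = field(default_factory=dict)
    llm_calls: int = 0


# Every step below is a hypothetical stub: swap in your own scraper,
# enrichment provider, LLM client and CRM client.
def scrape_domain(lead: Lead) -> Lead:
    lead.data["site_text"] = f"homepage text for {lead.domain}"
    return lead


def enrich(lead: Lead) -> Lead:
    lead.data["company"] = f"enriched profile for {lead.domain}"
    return lead


def score_and_draft(lead: Lead) -> Lead:
    """The only LLM call: one prompt returns both the score and the draft."""
    lead.llm_calls += 1
    lead.data["score"] = 50  # placeholder for the model's score
    lead.data["draft"] = f"Hi, thanks for reaching out from {lead.domain}."
    return lead


def write_to_crm(lead: Lead, crm: list[dict]) -> Lead:
    crm.append({"email": lead.email, **lead.data})
    return lead


def process(lead: Lead, crm: list[dict]) -> Lead:
    # Fixed order, no branching, no loop: this is the whole flowchart.
    for step in (scrape_domain, enrich, score_and_draft):
        lead = step(lead)
    return write_to_crm(lead, crm)


crm: list[dict] = []
leads = [process(Lead(f"ceo@{d}", d), crm) for d in ("acme.com", "globex.com")]
```

Every lead takes the same four steps and exactly one LLM call, so the cost per lead is known before the first run.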
Where agents do earn their keep
- Customer-facing research. "Find me everything we know about Acme Corp before this 9 a.m. call" — web search, CRM lookup, past-ticket dig, news scan, summarise. The step list legitimately varies per company.
- Multi-step debugging assistants. "Here’s a stack trace and the relevant repo — figure out which file the bug lives in." Iterates: grep, read file, hypothesise, grep again.
- Long-running back-office tasks with branching. Procurement quote comparison, vendor due diligence, multi-document contract analysis. Worth the per-task cost because the alternative is a human spending 40 minutes.
- Open-ended QA with tool access. Internal "ask the data warehouse" assistant where the agent decides which table, which join, which filter. A plain workflow cannot do this.
Guardrails every production agent needs
- Max-iteration cap. 8 by default. Higher only if you can justify it from logs.
- Tool allowlist. No "let the agent decide to shell out" or "give it arbitrary HTTP". Each tool is a named function with a typed input/output.
- Human-in-the-loop before irreversible actions. Send-email, charge-card, file-ticket, run-DDL. The agent drafts; the human one-clicks.
- Cost ceiling per run. Track tokens; trip a kill-switch at $0.50 per single agent run (or whatever the right number is for your task).
- Audit log of every tool call. Inputs, outputs, timestamps. Without it, debugging an agent failure is impossible.
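The five guardrails above can live in one small loop. A minimal sketch, assuming a hypothetical `next_action` callback that stands in for the model and reports the cost of the call that produced each action; tool names, prices and the `approve` callback are illustrative, not any framework's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    tool: str
    arg: str
    cost_usd: float  # cost of the LLM call that proposed this action


class AgentHalted(Exception):
    """Raised when a guardrail trips; carries the audit log for debugging."""

    def __init__(self, reason: str, audit: list[dict]):
        super().__init__(reason)
        self.audit = audit


def run_guarded(
    next_action: Callable[[list[dict]], Optional[Action]],
    tools: dict[str, Callable[[str], str]],  # the allowlist: named functions only
    irreversible: set[str],
    approve: Callable[[str, str], bool],  # the human's one-click yes/no
    max_iterations: int = 8,
    cost_ceiling_usd: float = 0.50,
) -> list[dict]:
    audit: list[dict] = []
    spent = 0.0
    for _ in range(max_iterations):
        action = next_action(audit)  # stand-in for the model deciding what to do
        if action is None:
            return audit
        spent += action.cost_usd  # the LLM call has already been paid for
        if spent > cost_ceiling_usd:
            raise AgentHalted(f"cost ceiling hit: ${spent:.2f}", audit)
        if action.tool not in tools:
            raise AgentHalted(f"tool not on allowlist: {action.tool}", audit)
        if action.tool in irreversible and not approve(action.tool, action.arg):
            output = "REJECTED_BY_HUMAN"
        else:
            output = tools[action.tool](action.arg)
        audit.append({"ts": time.time(), "tool": action.tool,
                      "input": action.arg, "output": output})
    raise AgentHalted(f"max_iterations={max_iterations} reached", audit)


# Hypothetical support-ticket agent: look up, draft, then try to send.
tools = {
    "lookup_customer": lambda t: f"customer for {t}",
    "draft_reply": lambda t: f"draft for {t}",
    "send_email": lambda d: "sent",
}
script = [
    Action("lookup_customer", "ticket-123", 0.01),
    Action("draft_reply", "ticket-123", 0.02),
    Action("send_email", "draft for ticket-123", 0.01),
]


def scripted(audit: list[dict]) -> Optional[Action]:
    return script[len(audit)] if len(audit) < len(script) else None


log = run_guarded(scripted, tools, {"send_email"}, approve=lambda tool, arg: False)
print([e["output"] for e in log])
# ['customer for ticket-123', 'draft for ticket-123', 'REJECTED_BY_HUMAN']
```

The exception carries the audit log, so a tripped cap or ceiling still leaves you the trace you need to work out why the agent got stuck.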
“Most agents in production today should have been workflows. The discipline of asking ‘can I draw this as a flowchart?’ settles the agent-or-workflow decision in two minutes.”
Frequently asked questions
What is the difference between an AI agent and an AI workflow?
A workflow runs a fixed list of steps. An agent decides the next step on the fly based on what the previous step returned. If you can draw your task as a flowchart up front, it is a workflow — dressing it as an agent just makes it 5-20x more expensive.
When should I use AI agents for business automation?
Use agents for tasks where the step list genuinely varies per execution: pre-call account research, contract analysis, multi-doc QA, debugging assistants. For everything else (lead routing, support triage, invoice processing, status updates), a workflow is cheaper, faster, and easier to debug.
How much do AI agents cost to run?
Typical production agents run $0.05–$1+ per execution, a cost dominated by the model thinking about whether to call tools. Always set a per-run cost ceiling and a max-iteration cap. Without them, a single stuck loop can turn into a $200 incident.
What is the best AI agent framework in 2026?
For most teams: n8n AI Agent node for visual single-task agents, LangGraph for code-first stateful multi-step agents, Zapier Agents for non-technical teams. Skip CrewAI/AutoGen unless you have a genuine multi-agent need — most "multi-agent" pitches are one-agent problems in disguise.
How do I prevent an AI agent from running away?
Four controls: a hard max-iteration cap (8 by default), a tool allowlist with typed schemas, a human-in-the-loop checkpoint before any irreversible action, and an absolute per-run cost ceiling that trips a kill-switch.
Do I need a multi-agent system or is one agent enough?
One agent is enough for ~95% of real use cases. Multi-agent is justified when the sub-agents genuinely have different system prompts and tool sets — e.g. a researcher with web search vs a writer with no internet vs a critic with the original brief. If they’re all the same model with different names, collapse them.