Event-Driven AI Automation Workflows

7 min read · Updated Jun 4, 2026

Event-driven AI automation is the difference between a workflow that runs every 5 minutes and a workflow that runs the instant something interesting happens. The shape is simple — a producer emits an event, a bus delivers it, a consumer reacts with AI in the loop. The payoff is response times that drop from minutes to seconds and a smaller compute bill because nothing runs unless something happened. The trap is per-event pricing that quietly multiplies on a retry loop. This article shows the four shapes that work, the two that bankrupt you, and the abandoned-cart story that swung me hard from polling to events.

Key takeaways

Event-driven ≠ webhook. A webhook is the cheapest producer; an event bus is what makes fan-out and replay possible.
Switch from polling to events the moment "every N minutes" feels too slow on the customer side. Below 5-minute polling → already costlier than events.
Layer AI on the consumer, not the producer. Classification, enrichment, summarisation happen after the event hits the queue.
Idempotency keys on every consumer. Buses redeliver. Without an idempotency check you triple-send the customer email.
Per-event pricing flips to flat-rate pricing as a winner around ~1M events/month. Run the maths before you pick.
Stripe processes billions of webhook events per year (Stripe 2024 engineering blog) on the same primitive — it scales as far as you need.

Event-driven vs. scheduled — when to switch

My personal switching rule, calibrated on the last two years of client work.
Trigger frequency you wish you had	Use	Why
Daily or less often	Cron / scheduled	Events are overkill; cost of bus > savings
Hourly	Cron, still	Polling cheap; observability simpler
Every few minutes	Time to switch to events	You are paying compute to find no change 95% of the time
Sub-minute	Events, no debate	Polling latency caps your customer experience
Real-time / instant	Events with WebSocket fan-out	Polling cannot do this no matter how often you run it

The four event-driven shapes that work in production

Webhook → queue → worker with LLM. The default. Producer (Stripe, Shopify, custom app) POSTs to your webhook receiver. Receiver immediately enqueues the raw event and returns 200 in < 50ms. A background worker pulls from the queue, calls the LLM, writes the result.
Bus + multiple consumers (fan-out). One order.placed event → three consumers: AI personalised receipt email, fraud-score check, inventory deduction. Each consumer scales independently. EventBridge / Kafka / NATS handle this natively.
Domain event with AI enrichment. Your own app emits a domain event (support.ticket.created) onto an internal bus. AI consumer enriches it with sentiment + topic + urgency and emits a derived event (support.ticket.classified). Downstream consumers subscribe to the classified event, not the raw one.
Streaming + windowed aggregation. Kafka or Kinesis stream of clickstream / sensor / metric events. AI batch-summarises every minute or every N events, emits an alert if anomaly score crosses threshold. The AI is summarising the windowed batch, not every event.

The webhook → queue → worker shape, in code

typescript

// 1. Webhook receiver — returns 200 in < 50ms
import express from 'express';
import { Queue } from 'bullmq';

const app   = express();
const queue = new Queue('cart-events', { connection: { host: 'localhost', port: 6379 } });

app.post('/webhooks/shopify', express.raw({ type: 'application/json' }), async (req, res) => {
  const eventId = req.header('X-Shopify-Webhook-Id'); // dedupe key
  await queue.add(
    'cart.abandoned',
    { raw: req.body.toString(), receivedAt: Date.now() },
    { jobId: eventId, removeOnComplete: true, attempts: 5, backoff: { type: 'exponential', delay: 2000 } },
  );
  res.status(200).end();
});

// 2. Worker — calls the LLM
import { Worker } from 'bullmq';
import OpenAI from 'openai';

const openai = new OpenAI();
new Worker('cart-events', async (job) => {
  const event = JSON.parse(job.data.raw);
  const personalised = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    temperature: 0.3,
    response_format: { type: 'json_object' },
    messages: [
      { role: 'system', content: 'Write a short JSON {subject, body} cart-recovery email. Friendly, no emoji, mention 1 abandoned item by name.' },
      { role: 'user',   content: JSON.stringify(event.line_items) },
    ],
  });
  await sendEmail(event.customer.email, JSON.parse(personalised.choices[0].message.content!));
});

The jobId: eventId is the whole game. Shopify retries the webhook if you don’t 200 fast enough. BullMQ dedupes on jobId, so the second delivery is a no-op. Without it, your customer gets two recovery emails for the same cart.

Tools, by tier

Pricing from each vendor page, June 2026.
Tier	Tool	Best for	Pricing model
No-code	n8n / Make / Zapier	Visual webhook → AI → action	Flat-rate per month
Code, serverless	Inngest / Trigger.dev	Durable event functions in TS / Python	Per-run + flat tier
Self-hosted broker	Redis Streams / NATS / Kafka	Internal domain events	Compute only
Cloud event bus	AWS EventBridge / GCP Pub/Sub	Cross-service fan-out at scale	Per-event ($1 / million)
Streaming	Confluent Kafka / Kinesis	High-throughput clickstream / IoT	Per-GB throughput

The opinion I will defend

The abandoned-cart story that swung me from polling to events

June 2024, a Wednesday afternoon. A 6-person Shopify-based clothing brand, abandoned-cart recovery was running on a cron every 30 minutes — query Shopify for new abandoned checkouts since the last poll, dedupe in Postgres, send a recovery email via Mailchimp. Recovery rate sat at about 4.2%. We switched the trigger to the Shopify checkouts/abandoned webhook, queued via BullMQ, processed by a small worker that built a personalised AI email mentioning the specific item the customer left behind. Same email template, same sender. Recovery rate over the next month: 7.8%. The owner thought we had changed the email copy — we hadn’t. The customer received the email 3-8 minutes after walking away from the cart, instead of an average 17 minutes (and worst-case 30) with cron. The 25-minute swing is the difference between "I’m still in the buying mood" and "I’ve already moved on and am washing dishes." Speed beats copy. The whole switch took half a day. The catch I almost missed: the Shopify webhook had a delivery retry that fired twice on roughly 0.3% of events; without the jobId: eventId dedupe on the queue, that would have been 6 duplicate emails per 1,000 carts. Caught in staging by accident. Added the dedupe; shipped.

Real examples that earn back the bus

Support ticket triage — helpdesk emits ticket.created → AI classifier consumer enriches with urgency + topic → emits ticket.classified → routing consumer assigns to team → dashboard consumer updates real-time view. One event, three subscribers, no polling.
Real-time fraud check — Stripe payment_intent.succeeded webhook → queue → AI worker scores transaction against last 30-day history of that customer → if score > 0.7, freeze the order and Slack the ops team in under 30 seconds.
Content moderation — user submission → event onto NATS → vision-capable LLM consumer (safe / review / reject) → derived event fans out to publish, queue-for-human, or archive consumers. Adding a fourth consumer (analytics) is a one-line subscribe.
Anomaly alerts on streams — Kafka topic of metrics, 1-minute tumbling window, AI summariser emits an event if anomaly score crosses threshold — same pattern as automated financial alerts with webhooks and AI.

“Polling is what you do when you have not yet earned an event. Once the volume justifies it, the change is small, the latency drop is large, and your future self will thank you.”

Frequently asked questions

What is an event-driven AI automation workflow?

A workflow that fires when a specific event happens (webhook, queue message, domain event) rather than on a schedule, with an AI step on the consumer side doing classification, enrichment, or generation. The AI sits after the event hits the queue, not inside the producer.

When should I switch from polling to event-driven?

The moment you find yourself polling more often than every 5 minutes. Below that polling cost (compute + API calls finding nothing changed) overtakes event-bus cost, and the customer-facing latency starts hurting you on UX metrics.

Do I need a real event bus like Kafka or EventBridge to start?

No. A Redis-backed queue (BullMQ, SQS, Inngest) is enough for the first 6–12 months of almost every project. Reach for a bus only when you have at least two consumers that genuinely need the same event.

How do I prevent duplicate processing of the same event?

Use the upstream event ID (Shopify X-Shopify-Webhook-Id, Stripe id field) as the queue job ID or as a unique-constraint key in your dedupe table. Always check before processing. Buses redeliver; idempotency is yours to enforce.

What are the costs of an event-driven AI pipeline?

For most teams: $0–$50/month of infrastructure (Redis, n8n self-hosted, or Inngest free tier) plus the AI API bill ($5–$200/month). The expensive scenario is per-event pricing on EventBridge or Kafka with retries spiralling — cap retries, send to DLQ, alert on depth.

Can event-driven AI workflows scale?

Stripe processes billions of webhook events a year on this exact primitive (Stripe 2024 engineering blog). The pattern scales to anything you will hit in practice; the bottleneck is almost always your downstream AI rate limit, not the bus.