When to reach for n8n vs writing the orchestrator yourself
n8n is the best low-code orchestrator I've used — and I don't use it for any of my agent workflows. That's not a contradiction; it's a scope decision, and the scope decision is the one most teams skip.
The question isn't "is n8n good?" It is. Self-hosted, open-source, visual workflow builder, 400+ integrations, a credentials vault that actually works, webhook triggers, cron triggers, error handling per node. For the thing it's built for — connecting settled APIs in a settled sequence — it's faster to ship in n8n than in code. I've built payment notification pipelines, form-to-CRM flows, and scheduled report generators in n8n that I'd never rewrite. They work. They've worked for months. That's the point.
Where it earns its keep
Three properties make n8n the right call.
The workflow is a settled sequence. Trigger fires, data transforms, API calls happen in order, result lands. The logic is known, the branches are enumerable, and the failure modes are understood. n8n's visual canvas makes this legible to anyone on the team — not just the person who wrote the Python. For settled logic, legibility beats flexibility.
The integrations already exist. n8n ships nodes for Slack, Gmail, Notion, Airtable, Stripe, hundreds more. If the workflow is "when X happens in system A, do Y in system B," and both systems have n8n nodes, you're wiring, not coding. The time-to-ship difference is a day vs a week.
The operator isn't the developer. If someone who isn't you needs to modify the workflow — change a threshold, swap a channel, add a CC line — the visual canvas is the UI. No PR, no deploy, no code review for a config change. For ops-adjacent workflows, this matters more than engineers think.
Where it doesn't
Three properties push me to code.
The logic is novel and changing daily. My agent system has eight agents, each with its own trigger surface, memory queries, tool calls, and output format. The orchestration logic changes every time I add a new Behavior node or update a prompt. In n8n, that means editing a visual flow, re-testing the happy path, re-testing the error paths, re-deploying. In code, it means editing a Python file, running the tests, committing. Code wins when the logic is still forming.
The pipeline is LLM-native. DSPy modules, LlamaIndex retrieval, Neo4j graph queries, structural eval rubrics — none of these have n8n nodes. I'd be shelling out to Python from n8n's Code node on every step, which means I'm maintaining n8n as a visual wrapper around a Python pipeline. That's overhead without payoff. If every node is a Code node, you don't need n8n — you need a script.
Debugging requires the full trace. When an agent draft scores 0.6 on the structural rubric, I need to see the retrieved context, the compiled prompt, the raw LLM output, and the eval breakdown. n8n's execution log shows input/output per node, but the trace I need lives in the LLM call — the system prompt, the few-shot examples, the completion tokens. Langfuse gives me that trace. n8n's execution view doesn't go deep enough for LLM debugging.
The split
The decision matrix is simpler than it looks.
| Settled logic | Novel / changing logic | |
|---|---|---|
| API-to-API | n8n | n8n (still, with Code nodes) |
| LLM-native pipeline | Code (but consider n8n for the trigger) | Code |
The hybrid is real: n8n as the trigger layer (cron, webhook, form submission) that kicks off a Python pipeline. I've run this pattern — n8n fires a webhook to a local Flask endpoint that runs the DSPy module. n8n handles the schedule and the retry; Python handles the logic. Clean seam.
The cost of the wrong call
Reaching for n8n when you need code: you spend the first week wiring, the second week fighting Code node limitations, the third week rewriting in Python. I've done this.
Reaching for code when you need n8n: you spend a week building a scheduler, a webhook listener, a credential store, and an error-retry loop that n8n ships out of the box. I've done this too.
Both mistakes cost about the same — a week. The difference is that the n8n mistake is harder to recover from, because by week three you have a visual workflow that someone else has started depending on, and migrating it to code means rebuilding from scratch. The code mistake is just deleting a file and installing n8n.
The rule I use
If the workflow is settled and API-to-API, start in n8n. If the workflow is novel, LLM-native, or changing faster than once a week, start in code. If you're not sure, start in code — it's cheaper to move from code to n8n than the reverse.
