Do you need a multi-agent system?

Thursday, May 28, 2026

Honest question if you're building with Claude Code or any similar agentic CLI: do you actually need a multi-agent system?

Because Claude Code is already an agent. One agent. You give it skills, hooks, MCP servers, a CLAUDE.md, and it can do basically anything you'd want — write code, run shell commands, query a database, draft a cover letter, post to X. Why split that into a roster of named specialists?

I've been wrestling with this. The system I'm running on top of my own life right now is one agent per domain: Plutus (capital), Atlas (habits and reviews), Sage (the reflective layer), Iris (presence), Hunt (work and content), Ethos (relationships and warm intros), Janus (the books), Smith (engineering), plus Lex (an investigative lane) and Mentor (teaching) currently paused. Each one has its own markdown spec, its own trigger phrases, its own scope. The dispatch logic is "if Neil says X, route to agent Y."
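A minimal sketch of what that kind of dispatch could look like, assuming simple phrase matching. The trigger phrases, the `route` function, and the `generalist` fallback are all hypothetical; the real specs are not shown in this post.

```python
# Hypothetical trigger phrases, invented for illustration only.
TRIGGERS = {
    "plutus": ("scout", "capital", "yield"),
    "janus": ("the nut", "reconcile", "cashflow"),
    "hunt": ("apply to", "cover letter", "draft"),
}


def route(prompt, default="generalist"):
    """Pick an agent name for a prompt, falling back to a default."""
    text = prompt.lower()
    # Direct address ("Plutus, scout") wins over phrase matching.
    for agent in TRIGGERS:
        if text.startswith(agent):
            return agent
    # Otherwise take the first agent with a matching trigger phrase.
    for agent, phrases in TRIGGERS.items():
        if any(phrase in text for phrase in phrases):
            return agent
    return default


print(route("Plutus, scout"))
print(route("What's the Nut?"))
print(route("Refactor this module"))
```

Even in this toy form the cost is visible: every prompt goes through `route` before any real work starts, and a phrase that matches two agents is resolved by ordering rather than by judgment.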

The doubt: a single agent with full context — all the memory, all the tools, all the behaviors loaded at once — can make cross-domain connections that ten specialists will miss. A multi-agent system, by design, fragments that. And the dispatch itself is overhead. Every prompt becomes a routing problem first.

So when does the split earn its keep?

1. When the posture varies

When I ask Janus "what's the Nut?" I want a different posture than when I ask Hunt "what should I apply to?" Janus is calm, accounting-brained, doesn't speculate. Hunt is sharp, hungry, gives me a number and a next move. If both came from the same voice, they'd average out to something neither of them is.

This is the soft reason but it's the real one. The agents aren't separate intelligences — they're modes. "Plutus, scout" is a shorter way to say "think about this as a yield hunter, not a generalist." The agent name is the modality switch.

You can do this with one agent and skill-based prompts ("answer this as my CFO"). It works. It just doesn't stick the way a named persona does, and you end up retyping the framing on every turn.

2. When the guardrails differ

This is the hard reason — and it's the one I underweighted at first.

Different agents have genuinely different permissions baked into their specs:

Hunt drafts but never sends. Cover letters, LOIs, tweets, capital pitches — drafted, queued, never auto-submitted. Hardcoded.
Hunt also has a non-negotiable gate on LOIs: it won't draft a Letter of Intent for an acquisition unless the capital-readiness flag is GREEN. If you ask it to skip the gate, it surfaces the rule and forces you to confirm.
Sage runs only on local Ollama. No paid API calls. Ever. That's not a preference — it's encoded in the agent.
Janus is forbidden from calling Anthropic or recommending scheduled remote agents. Books-and-cashflow work shouldn't be metering up a hosted bill.
Ethos never bulk-follows or auto-DMs. Hand-drafted only, then I press send.

Encoding all of those as conditionals inside one mega-prompt is possible. It's also fragile — every new prompt is a chance for the model to drop one of them. Per-agent specs let each guardrail live in the place it'll get loaded every time the agent is summoned, no exceptions.
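One way to picture guardrails that live with the agent rather than in the prompt is a small spec table checked before any action runs. This is a sketch under assumptions: `AgentSpec`, `violations`, and the backend labels are hypothetical names, and the backends allowed for Hunt and Ethos are guesses, since the post only states what Sage and Janus must not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    allowed_backends: frozenset  # model backends the agent may call
    may_send: bool = False       # False means draft-only; a human sends


# Hypothetical encoding. Backend labels are made up, and the post only says
# Janus must not call Anthropic, so "local only" for Janus is an assumption.
SPECS = {
    "sage": AgentSpec(frozenset({"ollama-local"})),
    "janus": AgentSpec(frozenset({"ollama-local"})),
    "hunt": AgentSpec(frozenset({"ollama-local", "anthropic"})),
    "ethos": AgentSpec(frozenset({"ollama-local", "anthropic"})),
}


def violations(agent, action, backend):
    """Return the guardrails a proposed action would break (empty = allowed)."""
    spec = SPECS[agent]
    problems = []
    if backend not in spec.allowed_backends:
        problems.append(f"{agent} may not call backend {backend!r}")
    if action == "send" and not spec.may_send:
        problems.append(f"{agent} is draft-only; a human presses send")
    return problems


print(violations("hunt", "draft", "anthropic"))
print(violations("sage", "draft", "anthropic"))
print(violations("hunt", "send", "anthropic"))
```

The point of the shape is that the check runs in ordinary code, outside the model, so a cleverly worded prompt has nothing to argue with.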

3. When the handoffs are formal

This is where it actually became unavoidable for me.

The capital flow in my system is a three-agent relay:

Plutus sources and scores capital targets (an acquisition, an investor match, a grant round). It does the math — DSCR, capital stack, valuation. It does not draft anything.
Ethos checks the relationship graph for warm paths to that target. Does Neil know someone who knows them? It supplies the intro context. It does not pitch.
Hunt drafts the actual document — the LOI, the investor one-pager, the cold seller letter — using Plutus's numbers as inputs and Ethos's intro context as the opener. It does not source, and it does not send.

Each stage requires different skills, different voice, and (per #2) different guardrails. Hunt's LOI gate, for example, checks for a flag set by Plutus. If you collapse all three into one agent, that gate becomes a self-check, and self-checks are the first thing a sufficiently prompted model talks itself out of.

The relay also forces me to be explicit about which step is broken when something goes wrong. A single agent that returns a weak LOI obscures whether the math was off, the warm path was thin, or the drafting was bad. The relay surfaces it.
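The relay can be sketched as three functions that each write only their own field, with the LOI gate reading a flag it did not set. Everything here is illustrative: the `Target` fields, the function names, and the 1.25 DSCR threshold are assumptions, not the real specs. DSCR is taken in its usual sense, net operating income divided by annual debt service.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    net_operating_income: float
    annual_debt_service: float
    readiness: str = "RED"  # written by the Plutus stage only
    warm_path: str = ""     # written by the Ethos stage only


def plutus_score(target, min_dscr=1.25):
    """Stage 1: do the math and set the readiness flag. Drafts nothing."""
    if target.annual_debt_service <= 0:
        raise ValueError("annual_debt_service must be positive")
    dscr = target.net_operating_income / target.annual_debt_service
    target.readiness = "GREEN" if dscr >= min_dscr else "RED"
    return target


def ethos_warm_path(target, contacts):
    """Stage 2: look up intro context. Pitches nothing."""
    target.warm_path = contacts.get(target.name, "")
    return target


def hunt_draft_loi(target, override_confirmed=False):
    """Stage 3: draft only, and only past a gate set by another stage."""
    if target.readiness != "GREEN" and not override_confirmed:
        raise PermissionError(
            "LOI gate: capital-readiness flag is not GREEN; confirm to override"
        )
    opener = f"Introduced via {target.warm_path}. " if target.warm_path else ""
    return f"DRAFT (not sent): {opener}Letter of Intent for {target.name}."


# Made-up example data.
deal = Target("Example Co", net_operating_income=150_000, annual_debt_service=100_000)
deal = ethos_warm_path(plutus_score(deal), {"Example Co": "a former colleague"})
print(hunt_draft_loi(deal))
```

When a draft comes back weak, the intermediate state says which stage to look at: the flag and DSCR belong to Plutus, the opener to Ethos, the wording to Hunt.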

4. When the telemetry matters

I track invocations per agent. This month: Smith ran 164 times, Hunt 54, Iris 10, Plutus 9, Sage 8, Mentor 7, Ethos 4, Janus 3.

That distribution tells me something a single-agent system would erase: where my life is actually going. Engineering is dominating. Capital and relationship work is undercooked relative to my stated priorities. Janus running three times in a month means I'm under-reconciling.

You can recover some of this from log scraping with one agent. But the named-agent shape gives you the signal for free, and it's stable across whatever you change inside each agent's spec.
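To make the distribution concrete, here is a small sketch that turns the counts quoted above into shares. The counts come from this post; the `shares` helper is a hypothetical name, and in practice the counter would be incremented at dispatch time rather than typed in by hand.

```python
from collections import Counter

# Invocation counts for the month, as quoted in the post.
counts = Counter({
    "Smith": 164, "Hunt": 54, "Iris": 10, "Plutus": 9,
    "Sage": 8, "Mentor": 7, "Ethos": 4, "Janus": 3,
})


def shares(counter):
    """Return each agent's fraction of all invocations."""
    total = sum(counter.values())
    return {agent: n / total for agent, n in counter.items()}


total = sum(counts.values())
by_share = shares(counts)
for agent, n in counts.most_common():
    print(f"{agent:<7}{n:>4}{by_share[agent]:>8.1%}")

# Capital plus relationship work (Plutus and Ethos) as one share.
print(f"Plutus + Ethos: {by_share['Plutus'] + by_share['Ethos']:.1%}")
```

On these numbers Smith accounts for roughly 63% of the 259 invocations, while Plutus and Ethos together come to about 5%.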

Where the overlaps land — feature, not bug

The domains overlap. A lot. Hunt and Ethos both touch people. Plutus and Janus both touch money. Smith and Hunt both touch content (Smith finishes a coding session with content seeds; Hunt picks them up and ships them). Iris and Sage both read the same situations through different lenses — Iris from external presence, Sage from the reflective side.

That overlap used to feel like a sign I'd carved the system wrong. It isn't. The overlap is where the handoffs live. Each overlap is a documented protocol — "Smith ends sessions with content seeds, Hunt reads them, picks 0–3, drafts and ships" — and the named edge between two agents is the thing the system reasons about.

If you collapse the overlapping agents into one, you don't eliminate the overlap. You just lose the explicit handoff and bury it in conditionals.

So: do you need this?

For most of what people do with Claude Code day-to-day — generic coding, generic research, generic chat — no. One agent is fine. The multi-agent framing is overhead with no return.

Reach for a roster when at least two of these apply:

The posture you want changes meaningfully across domains.
The guardrails differ — some lanes can auto-act, others can't; some can call paid models, others can't; some have hard gates that must be enforced from outside, not by the model itself.
The handoffs are formal — discrete relays with different abilities at each stage.
The per-mode telemetry matters to you (whether for cost, focus, or self-awareness).

If only one of those applies, a single Claude Code agent with well-organized skills and hooks is almost certainly the right call. Skills cover posture. Hooks cover ambient automation. CLAUDE.md carries the constitution.

Where the wheels come off is around hard guardrails and formal handoffs. That's the seam where one agent stops being enough — and where the work of naming, scoping, and bounding a real roster starts paying for itself.

I'm still figuring out which of my ten are real specialists and which are vanity names. But I'm increasingly sure the answer to "do you need this?" is "only if your guardrails or handoffs say you do."

