Why I picked Neo4j over a vector store for my agent system — and when I'd flip

Tuesday, May 26, 2026

The agent system that runs my daily life — eight named agents, a shared memory layer, ~25K nodes across three containers — stores everything in Neo4j, not a vector store. The questions I actually ask every day are graph questions, and a vector store can't answer them without rebuilding the graph at query time.

The distinction matters. A vector store answers "what's similar to this?" A graph answers "what's connected to this, and how?" Both are useful. They're not the same question. The system I needed answers the second one hundreds of times a day.

What the graph actually holds

Three Neo4j containers. The canonical one — gl-graph-godmode — holds the knowledge layer for the agent system: Behavior nodes (the rules the system follows), MemoryEntry nodes (episodic log, append-only), Topic nodes (canonical state per thread), Session nodes (conversation handoffs), Person nodes, Project nodes, Goal nodes, Protocol nodes, Idea nodes. Plus ~16K VaultChunk nodes from the vault scanner that indexes two Obsidian vaults and the dev tree.

The graph isn't a schema I designed upfront. It grew from the questions the agents kept needing to answer.

The questions that made the choice

Five questions I ask every session. Each one is a graph traversal. Each one would be a multi-step retrieval-and-reassemble operation in a vector store.

"What's the current state of X?" — A Topic node query. MATCH (t:Topic {name:'Active GL Focus'}) RETURN t. One hop. The vector equivalent: embed the query, retrieve the top-k chunks, hope one of them is the current state instead of a stale reference from three months ago. The Topic node is the source of truth. The log entries that updated it are linked by relationship. No ranking ambiguity.

"What corrections has Neil made about Y?" — Behavior nodes. MATCH (b:Behavior) WHERE b.name CONTAINS 'Y' RETURN b.rule, b.why, b.how. Each Behavior node captures the rule, the reason, and how to apply it. A vector store would return similar-sounding chunks from across the entire corpus — some from before the correction, some after. The graph has the correction as a node with a date, a source, and a relationship to what it corrected.

"Who haven't I talked to in 90 days?" — A time-bounded traversal. MATCH (p:Person) WHERE p.last_contact < date() - duration('P90D') RETURN p.name. Try that with cosine similarity.
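A minimal Python sketch of running that query with the cutoff passed as a parameter, so the 90-day window isn't hard-coded into the Cypher. It assumes last_contact is stored as a Cypher date and that `session` is anything with a `run(query, parameters)` method, such as a session from the official neo4j Python driver. The function and constant names are made up for illustration.

```python
from datetime import date, timedelta

# Same shape as the Cypher above, with the cutoff supplied as $cutoff.
STALE_CONTACTS_QUERY = (
    "MATCH (p:Person) "
    "WHERE p.last_contact < $cutoff "
    "RETURN p.name AS name "
    "ORDER BY p.last_contact"
)


def stale_contact_params(today: date, days: int = 90) -> dict:
    """Build the parameter map: anyone last contacted before `cutoff` counts as stale."""
    return {"cutoff": today - timedelta(days=days)}


def stale_contacts(session, today: date, days: int = 90) -> list:
    """Run the query on a session-like object and return the matching names."""
    result = session.run(STALE_CONTACTS_QUERY, stale_contact_params(today, days))
    return [record["name"] for record in result]
```

Keeping the window in a parameter means the same query answers "30 days" or "a year" without a rewrite.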

"What did I work on in this session, and which projects did it touch?" — Session nodes linked to projects, agents, goals, ideas by typed relationships. MATCH (s:Session)-[r]->(n) WHERE s.date = date() RETURN type(r), labels(n), n.name. The graph returns the full context map in one query. A vector store returns chunks that mention the session — if the embeddings happen to cluster right.
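The rows from that query come back flat, one per relationship. A small Python sketch of folding them into a context map keyed by relationship type follows. It assumes each row is a `(rel_type, labels, name)` tuple matching the RETURN clause; the relationship types and node names in the sample are invented.

```python
from collections import defaultdict


def build_context_map(rows):
    """Group (rel_type, labels, name) rows by relationship type.

    Each row mirrors RETURN type(r), labels(n), n.name from the query above.
    """
    context = defaultdict(list)
    for rel_type, labels, name in rows:
        # A node can carry several labels; keep them all, sorted for stable output.
        context[rel_type].append({"labels": sorted(labels), "name": name})
    return dict(context)


# Hypothetical rows: these relationship types and names are placeholders.
rows = [
    ("TOUCHED", ["Project"], "Project A"),
    ("TOUCHED", ["Project"], "Project B"),
    ("ADVANCED", ["Goal"], "Goal A"),
]
context_map = build_context_map(rows)
```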

"Which ideas connect to which skills and which projects?" That's the relationship web. MATCH (i:Idea)-[:RELATES_TO]-(n) RETURN i.name, labels(n), n.name. This is the question vectors can't answer on their own, because what sits on the other end of the edge matters. An idea that RELATES_TO a Skill is different from an idea that RELATES_TO a Person, even if the embedding distance is similar. The graph encodes the relationship type and the label of the node it reaches. The vector store encodes proximity.
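A toy Python illustration of that last point. The embeddings are invented so that two neighbors sit at exactly the same cosine distance from the idea; the edge list is a stand-in for what the graph stores, not a real export.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy embeddings, made up for illustration.
idea = [1.0, 0.0]
neighbors = {
    "Skill node": [0.8, 0.6],
    "Person node": [0.8, -0.6],
}

# Proximity view: both neighbors get the same score, so ranking can't separate them.
scores = {name: round(cosine(idea, vec), 3) for name, vec in neighbors.items()}

# Graph view: each edge carries its type and the label of the node it reaches.
edges = [
    ("idea", "RELATES_TO", "Skill", "Skill node"),
    ("idea", "RELATES_TO", "Person", "Person node"),
]
skills_only = [
    name for _, rel, label, name in edges
    if rel == "RELATES_TO" and label == "Skill"
]
```

Similarity scores the two neighbors identically; the edge list tells them apart with a filter.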

Where vectors still win

I use vectors. The site you're reading has a chat surface powered by retrieval over the markdown archive — LlamaIndex, nomic-embed-text, bge-reranker. The voice-learning drafter pipeline retrieves context from the post archive the same way. Vectors are the right tool for "find the passages most relevant to this query" when the corpus is unstructured prose.

The line: structured knowledge → graph. Unstructured retrieval → vectors. Both exist in the same system. They answer different questions on different data.

When I'd flip

Three cases where I'd move something out of the graph and into a vector store.

The nodes stop having typed relationships. If the data is a flat collection of documents with no meaningful edges between them — just content to search — a vector index is cheaper, simpler, and faster. The graph earns its overhead only when the edges matter.

The primary query is similarity, not traversal. If I'm building a recommendation surface ("show me things like this") rather than a knowledge surface ("show me what this connects to"), vectors win. The graph can do similarity via embeddings stored on nodes, but that's fighting the abstraction.

The schema churn outpaces the query stability. Neo4j is schema-optional, which helps, but a graph with 65 node types and 25K nodes still has gravity. If the shape of the data changes faster than the queries do (prototyping phase, throwaway experiments, early-stage product where the model isn't locked), a vector store's schemaless default means less friction.

None of those three are true for the agent system today. The nodes have typed relationships (that's the whole point). The queries are traversals (the five above). And the schema stabilized months ago — it grew organically but it settled. So the graph stays.

The honest cost

Neo4j is not free to run. Three containers, each with its own Bolt port. Docker Compose keeps them alive. The Cypher query language has a learning curve that SQL developers underestimate: the pattern-matching mental model is different from set-based thinking, and the first week is slower than it should be. The tooling ecosystem is thinner than Postgres's. And the vault scanner that keeps the graph in sync with the Obsidian vaults is custom code I wrote and maintain.

The cost is worth it because the questions are worth it. If the questions were simpler — "find me relevant context for this prompt" — I'd run a vector store and skip the graph entirely. The graph is the answer to a specific kind of question. Not every kind.

The build, if you're considering it

Start with the questions. Write down the ten things you actually ask your agent system every day. If most of them are "find similar" — vector store, don't overthink it. If most of them are "what connects to what, and how" — graph. If it's a mix, run both. They compose fine. The LlamaIndex + Neo4j integration exists; so does LangChain's. The plumbing is not the hard part — knowing which questions you're actually asking is.
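A rough Python sketch of that tally, assuming you label each question by hand as "similarity" or "traversal". The 0.7 cutoff for "most of them" is an arbitrary assumption, and the function name is illustrative.

```python
from collections import Counter


def recommend_store(labeled_questions, majority=0.7):
    """Suggest a store from hand-labeled questions.

    `labeled_questions` maps each question to "similarity" or "traversal".
    `majority` is an arbitrary cutoff for "most of them"; tune it to taste.
    """
    counts = Counter(labeled_questions.values())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("write down the questions first")
    if counts["similarity"] / total >= majority:
        return "vector store"
    if counts["traversal"] / total >= majority:
        return "graph"
    return "both"


questions = {
    "What's the current state of X?": "traversal",
    "What corrections has Neil made about Y?": "traversal",
    "Who haven't I talked to in 90 days?": "traversal",
    "Which ideas connect to which skills and projects?": "traversal",
    "Find passages relevant to this prompt": "similarity",
}
print(recommend_store(questions))  # 4 of the 5 questions are traversals
```

The labeling is the real work; the function only makes the count explicit.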
