Neo4j
The canonical-state graph database — Behavior nodes, Topic nodes, MemoryEntry log, ~536 typed nodes and ~1,256 relationships from the GodfreyLabs scanner today. Free community edition; runs as a Docker container; Cypher query language. The memory layer every GL agent reads before answering anything load-bearing.
Neo4j is the layer that lets a GL agent remember what's true now — not what got logged at a moment in time, but the canonical state any agent can read before acting. Every agent on the stack consults it. This page is the orient-and-anchor surface — official docs at neo4j.com/docs own the Cypher reference.
What it is
A native graph database — data is stored as nodes with labels (types) and properties, connected by directed relationships that also have types and properties. Queries are written in Cypher, a declarative pattern-matching language. Free community edition (GPLv3); enterprise edition is paid (skip).
Distribution: official Docker image, Homebrew formula, or direct download. Runs as a single process: Bolt protocol on :7687 for clients, HTTP/Browser UI on :7474. The GL deployment runs three containers, each on its own port set: gl-graph-godmode (canonical), gl-graph-oracle (Sage readings), and gl-graph-truthers (truthers project).
The pitch: when the questions you ask are relational ("which projects share a person?"), temporal ("what did Neil say about X first?"), or canonical ("what's the current state of Y?"), a graph beats a vector store and a relational table by a wide margin. A vector store's primitive, "find similar," is the wrong one for these questions, and so is SQL's, "join until your eyes bleed." A graph models relationships, ordering, and current state natively.
When to use it
Reach for it when:
- The data is inherently graph-shaped — people, projects, ideas, protocols, goals, with rich cross-relationships.
- You need canonical state with conflict resolution — "Topic node wins over append-only log" is a real pattern.
- Retrieval is relational or temporal first, similarity second. "Show me everything Neil decided about Pendle in the last month" is a graph query, not an embedding query.
- The corpus has personalization layers — Behavior nodes that change agent decisions per user. Hard to do cleanly in pure-text RAG.
- You want a single source of truth that multiple agents and processes can read and write consistently.
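As a rough illustration of the relational-and-temporal case, the "everything Neil decided about Pendle in the last month" question might be phrased like the sketch below. Only the `:Person`, `:MemoryEntry`, and `:Protocol` labels come from this page; the `:DECIDED` and `:ABOUT` relationship types and the `name`, `text`, and `timestamp` properties are assumptions made for the example, not the GL schema.

```python
# Sketch only: relationship types and property names are hypothetical.
RECENT_DECISIONS = """
MATCH (p:Person {name: $person})-[:DECIDED]->(m:MemoryEntry)-[:ABOUT]->(:Protocol {name: $protocol})
WHERE m.timestamp >= datetime() - duration({days: $days})
RETURN m.text AS decision, m.timestamp AS decided_at
ORDER BY decided_at DESC
"""


def recent_decisions(person, protocol, days=30):
    """Return (query, params) for decisions by `person` about `protocol` in the last `days` days."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return RECENT_DECISIONS, {"person": person, "protocol": protocol, "days": days}


def print_recent_decisions(session):
    """Usage with an open neo4j session (not called here; needs a running database)."""
    query, params = recent_decisions("Neil", "Pendle")
    for record in session.run(query, params):
        print(record["decided_at"], record["decision"])
```

All values travel as query parameters rather than being formatted into the Cypher text, which keeps the query plan cacheable and avoids injection through user-supplied names.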
Skip it when:
- The workload is pure document RAG — a flat list of passages and a query. LlamaIndex + a vector store is the right shape.
- The team has no graph-modeling experience and the data shape is genuinely tabular. Forcing a graph on tabular data adds cost with no win.
- You don't need canonical state — append-only logs and "always re-query the source" cover it.
At a glance
Core concepts
- Node: an entity. Labeled with one or more types (`:Person`, `:Project`, `:Behavior`).
- Relationship: a directed, typed edge between nodes (`(:Person)-[:WORKS_ON]->(:Project)`).
- Property: key/value on a node or relationship.
- Cypher: the query language. ASCII-art pattern matching: `MATCH (n:Person {name:"Neil"})-[:USES]->(p:Protocol) RETURN p`.
- Constraint: uniqueness or existence guarantees on a label/property. Community edition supports uniqueness constraints; property existence constraints are an Enterprise feature. `agent_name_unique` is a GL example that prevents the case-duplicate bug from hand-rolled MERGEs.
- Index: on a label/property to make pattern lookups fast.
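To make the constraint and index concepts concrete, here is a minimal sketch of how they could be declared from Python using Neo4j 5 Cypher syntax. The constraint name `agent_name_unique` comes from this page, but the `:Agent(name)` target it is attached to, and the `memory_entry_timestamp` index, are assumptions for illustration.

```python
# Sketch only: the :Agent(name) target and the index definition are assumptions.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT agent_name_unique IF NOT EXISTS "
    "FOR (a:Agent) REQUIRE a.name IS UNIQUE",
    "CREATE INDEX memory_entry_timestamp IF NOT EXISTS "
    "FOR (m:MemoryEntry) ON (m.timestamp)",
]


def apply_schema(session, statements=SCHEMA_STATEMENTS):
    """Run each schema statement and return how many were sent.

    IF NOT EXISTS makes the statements safe to re-run on an existing database.
    `session` is an open neo4j session (or any object with a compatible run()).
    """
    for statement in statements:
        session.run(statement).consume()
    return len(statements)
```

A uniqueness constraint also creates a backing index, so lookups on the constrained property do not need a separate `CREATE INDEX`.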
The GL deployment shape
| Container | Bolt port | HTTP port | Purpose |
|---|---|---|---|
| `gl-graph-godmode` | 7687 | 7474 | Canonical: Behavior, Idea, Goal, Person, Project, Topic, Session, MemoryEntry. The default target. |
| `gl-graph-oracle` | 7690 | 7477 | Sage readings. Separate plumbing keeps reading state isolated from canonical state. |
| `gl-graph-truthers` | 7688 | 7475 | Truthers project. Paused but live. |
All share neo4j / godmode2026 for local development. Production hardening (real password, TLS, network restrictions) is per-environment.
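One way to keep the port assignments in a single place is a small lookup like the sketch below. The container names and Bolt ports are copied from the table; the helper names and the `NEO4J_USER` / `NEO4J_PASSWORD` environment variables are hypothetical.

```python
import os

# Bolt ports copied from the deployment table.
BOLT_PORTS = {
    "gl-graph-godmode": 7687,
    "gl-graph-oracle": 7690,
    "gl-graph-truthers": 7688,
}


def bolt_uri(container="gl-graph-godmode", host="localhost"):
    """Return the Bolt URI for a named GL container."""
    try:
        port = BOLT_PORTS[container]
    except KeyError:
        raise ValueError(f"unknown container: {container!r}") from None
    return f"bolt://{host}:{port}"


def open_driver(container="gl-graph-godmode"):
    """Open a driver (not called here; needs `pip install neo4j` and a running container).

    NEO4J_USER and NEO4J_PASSWORD are hypothetical environment variable names.
    """
    from neo4j import GraphDatabase

    auth = (os.environ.get("NEO4J_USER", "neo4j"), os.environ["NEO4J_PASSWORD"])
    driver = GraphDatabase.driver(bolt_uri(container), auth=auth)
    driver.verify_connectivity()  # fail fast if the container is not reachable
    return driver
```

Reading the password from the environment rather than hard-coding it means the same code works unchanged once a deployment moves off the shared local credentials.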
Node types in active use
- `:Behavior`: rules of engagement. "Don't recommend long veAERO locks." Loaded every session.
- `:Topic`: canonical state for a coherent thread. "Active GL focus" wins over any conflicting log entry.
- `:MemoryEntry`: append-only episodic log with timestamps. Searchable via FTS.
- `:Person`, `:Project`, `:Goal`, `:Protocol`, `:Idea`, `:Session`, `:Agent`: domain entities.
How to integrate
Default integration for a new GL surface:
- Use the existing canonical container. `gl-graph-godmode` is the default target. Don't spin up a new database unless the surface needs isolated state.
- Wire the Python client. `pip install neo4j` → `from neo4j import GraphDatabase` → driver targets `bolt://localhost:7687`. For CLI work, `cypher-shell` ships with the Neo4j Docker image.
- Read before write. Topic nodes win over logs on conflict. Load the Topic at session start; don't infer state from a log query.
- Write via the sanctioned helpers, not raw MERGEs. The `agent_name_unique` constraint exists because hand-rolled MERGEs with inconsistent casing caused a case-duplicate bug. Use the helper scripts where they exist.
- Log corrections to MemoryEntry; update Topic; link them. The pattern: episodic entry + canonical update + a relationship between them. Don't just append; don't just update.
- Cypher style. Pattern-first. Use `MATCH ... RETURN` for reads, `MERGE` for upsert, `CREATE` only when you know the node doesn't exist. `EXPLAIN` and `PROFILE` show the planner's choices when a query is slow.
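The read-before-write and correction steps could be sketched as follows. This is not the sanctioned GL helper, only an illustration of the shape such a helper might take: the property names (`name`, `state`, `text`, `timestamp`, `updated_at`) and the `:CORRECTS` relationship type are assumptions.

```python
# Sketch only: property names and the :CORRECTS relationship type are hypothetical.
READ_TOPIC = "MATCH (t:Topic {name: $topic}) RETURN properties(t) AS topic"

RECORD_CORRECTION = """
MATCH (t:Topic {name: $topic})
SET t.state = $state, t.updated_at = datetime()
CREATE (m:MemoryEntry {text: $note, timestamp: datetime()})
CREATE (m)-[:CORRECTS]->(t)
RETURN properties(t) AS topic
"""


def load_topic(session, topic):
    """Read canonical state first; returns None when the Topic does not exist."""
    record = session.run(READ_TOPIC, {"topic": topic}).single()
    return None if record is None else record["topic"]


def record_correction(session, topic, state, note):
    """Episodic entry + canonical update + link between them, in one statement."""
    params = {"topic": topic, "state": state, "note": note}
    record = session.run(RECORD_CORRECTION, params).single()
    if record is None:
        # MATCH found no Topic, so nothing was updated or created.
        raise LookupError(f"no Topic named {topic!r}")
    return record["topic"]
```

The write uses `MATCH` rather than `MERGE` on the Topic, so a misspelled topic name fails loudly instead of creating a second canonical node. Running the update, the log entry, and the link as one statement means they commit or fail together.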
In the GL stack
builddaily.io
- Behavior nodes loaded every session. Rules of engagement, voice constraints, scope boundaries — the canonical "don't do X" surface. Agent reads them before responding to anything load-bearing.
- Topic node for active focus. "gl-active-focus-three-projects" is a Topic that resolves which projects are in scope; conflicts with logs are won by the Topic.
- MemoryEntry log for episodic recall. "When did Neil first mention X" is a Cypher full-text query.
- Behavior nodes capture qualitative drafter corrections (slice 2 of the agent-stack post). Every "no, not that — write it like this" becomes a Behavior the next compile reads.
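The "when did Neil first mention X" lookup might be sketched like this, using Neo4j's full-text index procedures. The index name `memory_entry_fts` and the `text` / `timestamp` properties are assumptions; the real GL index may be named and shaped differently.

```python
import re

# Sketch only: the index name and the text/timestamp properties are assumptions.
CREATE_FTS_INDEX = (
    "CREATE FULLTEXT INDEX memory_entry_fts IF NOT EXISTS "
    "FOR (m:MemoryEntry) ON EACH [m.text]"
)

FIRST_MENTION = """
CALL db.index.fulltext.queryNodes('memory_entry_fts', $terms) YIELD node, score
RETURN node.text AS text, node.timestamp AS at
ORDER BY at ASC
LIMIT 1
"""

# Characters with special meaning in Lucene query syntax, which the full-text index uses.
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Backslash-escape Lucene operator characters so they are not parsed as syntax."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def first_mention(subject):
    """Return (query, params) for the earliest MemoryEntry matching `subject`."""
    return FIRST_MENTION, {"terms": escape_lucene(subject)}
```

Sorting by timestamp instead of by `score` is what turns a relevance search into a "first mention" search. Escaping matters because the search string is parsed as a Lucene query, so an unescaped `:` or `?` in user text would change its meaning.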
paiddaily.io
- Person + Protocol + Position graph. Wallet addresses link to people; people link to projects; projects link to positions in protocols. "Who has exposure to this Pendle market" is a single Cypher query.
- Catalyst lineage. Pendle catalysts as nodes with `:PRECEDES` relationships when one catalyst sets up another. Graph beats SQL for "show me the chain that led to today's setup."
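The "chain that led to today's setup" query could look roughly like the sketch below. `:PRECEDES` is the relationship type named above; the `:Catalyst` label, the `name` property, and the hop limit are assumptions for illustration.

```python
# Sketch only: the :Catalyst label and `name` property are assumptions.
def catalyst_chain_query(max_hops=5):
    """Build a Cypher query for the longest :PRECEDES chain ending at $catalyst.

    The hop bound in the `*1..N` syntax is not passed as a query parameter here;
    it is validated as a small int and formatted into the query text.
    """
    if (
        not isinstance(max_hops, int)
        or isinstance(max_hops, bool)
        or not 1 <= max_hops <= 20
    ):
        raise ValueError("max_hops must be an int between 1 and 20")
    return (
        f"MATCH path = (:Catalyst)-[:PRECEDES*1..{max_hops}]->(:Catalyst {{name: $catalyst}})\n"
        "RETURN [n IN nodes(path) | n.name] AS chain\n"
        "ORDER BY length(path) DESC\n"
        "LIMIT 1"
    )
```

The catalyst name still travels as a parameter (`$catalyst`); only the validated integer is interpolated. Capping the hop count keeps a variable-length match from fanning out across the whole graph.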
sagedaily.io
- Per-user state. Standing intention, chart, cycle, prior readings, all anchored on a `:Person` node. Each reading's `:OracleReading` node links back to inputs and `:Card`s drawn.
- Reading lineage. Sequential readings link via `:FOLLOWED_BY`; archetypes that recur thread through `:RESONATES_WITH` relationships. "What's coming up for you" is a literal pattern match.
Gotchas
- Cypher is not SQL. Joining feels like "extending the pattern"; thinking SQL-first produces awkward queries that the planner can't optimize. Learn the patterns.
- MERGE without uniqueness constraints can silently create duplicates. Concurrent MERGEs on the same key may each create a node, and keys that differ only in casing are different values to Neo4j. The `agent_name_unique` constraint exists because of a 2026-05-04 case-duplicate bug. Every entity type that's MERGE'd by a string key needs a uniqueness constraint, and because uniqueness is case-sensitive, the key should be normalized to one casing before the MERGE.
- Property explosion is real. Don't dump arbitrary JSON onto a node. Properties should be queryable; deeply-nested data belongs in linked nodes or a separate store.
- Bolt protocol is binary, not HTTP. Don't try to hit `:7687` with `curl`; use a Bolt driver. Browser UI on `:7474` is for exploration.
- Backup is not automatic. Community edition lacks online backup tooling. `neo4j-admin database dump` runs offline. Schedule it.
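For the case-duplicate gotcha, one approach a write helper might take is to normalize the key before it reaches `MERGE`, as in this sketch. The `display_name` property and the choice of normalization are assumptions, and the existing GL helpers may handle this differently.

```python
import unicodedata


def normalize_key(name):
    """Collapse case, surrounding whitespace, and Unicode variants into one lookup key."""
    key = unicodedata.normalize("NFKC", name).strip().casefold()
    if not key:
        raise ValueError("name must contain at least one non-space character")
    return key


# Sketch only: `display_name` is a hypothetical property that keeps the original
# spelling, while `name` holds the normalized key covered by the uniqueness constraint.
MERGE_AGENT = """
MERGE (a:Agent {name: $name})
ON CREATE SET a.display_name = $display_name
RETURN a.name AS name, a.display_name AS display_name
"""


def merge_agent_params(name):
    """Build parameters for MERGE_AGENT from a user-supplied agent name."""
    return {"name": normalize_key(name), "display_name": name.strip()}
```

With this shape, "Neil", "neil", and " NEIL " all resolve to the same node, and the uniqueness constraint remains as the backstop for any writer that bypasses the helper.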
Risks
- Single-vendor open-core. Neo4j Inc. drives the project. Community edition GPLv3; enterprise paid features (clustering, online backup, advanced security) are a real upgrade path the GL stack doesn't need today.
- Operational footprint. Containerized Neo4j wants ~1-2GB RAM and meaningful disk for a non-trivial graph. Budget the box accordingly.
- Cypher learning curve. Steeper than SQL for some patterns. The investment pays back fast once the relational/temporal queries that were painful in SQL become one-liners.
Related
- LlamaIndex — handles document retrieval. Neo4j handles canonical state. Two systems, two jobs; both consulted at agent answer time.
- Langfuse — captures quantitative traces. Neo4j captures qualitative state (Behavior nodes, Topics). No duplication.
