Database · Neo4j, Inc. · applied

Neo4j

The canonical-state graph database — Behavior nodes, Topic nodes, MemoryEntry log, ~536 typed nodes and ~1,256 relationships from the GodfreyLabs scanner today. Free community edition; runs as a Docker container; Cypher query language. The memory layer every GL agent reads before answering anything load-bearing.

Updated May 24, 2026

Neo4j is the layer that lets a GL agent remember what's true now — not what got logged at a moment in time, but the canonical state any agent can read before acting. Every agent on the stack consults it. This page is the orient-and-anchor surface — official docs at neo4j.com/docs own the Cypher reference.

What it is

A native graph database — data is stored as nodes with labels (types) and properties, connected by directed relationships that also have types and properties. Queries are written in Cypher, a declarative pattern-matching language. Free community edition (GPLv3); enterprise edition is paid (skip).

Distribution: official Docker image, Homebrew formula, or direct download. Runs as a single process — bolt protocol on :7687 for clients, HTTP/Browser UI on :7474. The GL deployment runs three containers — gl-graph-godmode (canonical), gl-graph-oracle (Sage readings), gl-graph-truthers (truthers project) — each on its own port set.

The pitch: when the questions you ask are relational ("which projects share a person?"), temporal ("what did Neil say about X first?"), or canonical ("what's the current state of Y?"), a graph fits better than either a vector store or a relational table. A vector store's only primitive is "find similar"; SQL's answer is "join until your eyes bleed." A graph models relationships, ordering, and current state natively.

When to use it

Reach for it when:

The data is inherently graph-shaped — people, projects, ideas, protocols, goals, with rich cross-relationships.
You need canonical state with conflict resolution — "Topic node wins over append-only log" is a real pattern.
Retrieval is relational or temporal first, similarity second. "Show me everything Neil decided about Pendle in the last month" is a graph query, not an embedding query.
The corpus has personalization layers — Behavior nodes that change agent decisions per user. Hard to do cleanly in pure-text RAG.
You want a single source of truth that multiple agents and processes can read and write consistently.

Skip it when:

The workload is pure document RAG — a flat list of passages and a query. LlamaIndex + a vector store is the right shape.
The team has no graph-modeling experience and the data shape is genuinely tabular. Forcing a graph on tabular data adds cost with no win.
You don't need canonical state — append-only logs and "always re-query the source" cover it.

At a glance

Core concepts

Node — an entity. Labeled with one or more types (:Person, :Project, :Behavior).
Relationship — a directed, typed edge between nodes ((:Person)-[:WORKS_ON]->(:Project)).
Property — key/value on a node or relationship.
Cypher — the query language. ASCII-art pattern matching: MATCH (n:Person {name:"Neil"})-[:USES]->(p:Protocol) RETURN p.
Constraint — uniqueness or existence guarantees on a label/property. agent_name_unique is a GL example — prevents the case-duplicate bug from hand-rolled MERGEs.
Index — on a label/property to make pattern lookups fast.
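A minimal sketch of how a constraint and an index might be declared, assuming Neo4j 5 syntax. The constraint name agent_name_unique comes from this page; its target (:Agent, name), the index name person_name_index, and the apply_schema helper are assumptions for illustration, not the confirmed GL schema.

```python
# Sketch only: the Agent.name and Person.name targets are assumptions,
# not confirmed details of the GL schema.
SCHEMA_STATEMENTS = [
    # Uniqueness constraint: a second :Agent with the same name is rejected.
    "CREATE CONSTRAINT agent_name_unique IF NOT EXISTS "
    "FOR (a:Agent) REQUIRE a.name IS UNIQUE",
    # Index: speeds up lookups such as MATCH (p:Person {name: $name}).
    "CREATE INDEX person_name_index IF NOT EXISTS "
    "FOR (p:Person) ON (p.name)",
]


def apply_schema(uri: str, user: str, password: str) -> None:
    """Run each schema statement; IF NOT EXISTS makes reruns harmless."""
    # Imported lazily so the statements can be inspected without the driver.
    from neo4j import GraphDatabase  # requires `pip install neo4j`

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in SCHEMA_STATEMENTS:
                session.run(statement).consume()
```

Calling apply_schema("bolt://localhost:7687", user, password) against a running container would apply both statements; nothing touches the database until that call is made.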

The GL deployment shape

Container	Bolt port	HTTP port	Purpose
`gl-graph-godmode`	7687	7474	Canonical — Behavior, Idea, Goal, Person, Project, Topic, Session, MemoryEntry. The default target.
`gl-graph-oracle`	7690	7477	Sage readings — separate plumbing keeps reading state isolated from canonical state.
`gl-graph-truthers`	7688	7475	Truthers project — paused but live.

All share neo4j / godmode2026 for local development. Production hardening (real password, TLS, network restrictions) is per-environment.
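One way to keep the port table out of individual scripts is a small lookup, sketched below. The ports are copied from the table above; the GraphTarget class, the helper names, and the localhost default are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphTarget:
    bolt_port: int
    http_port: int


# Ports copied from the deployment table; names are the container names.
TARGETS = {
    "gl-graph-godmode": GraphTarget(bolt_port=7687, http_port=7474),
    "gl-graph-oracle": GraphTarget(bolt_port=7690, http_port=7477),
    "gl-graph-truthers": GraphTarget(bolt_port=7688, http_port=7475),
}


def bolt_uri(container: str = "gl-graph-godmode", host: str = "localhost") -> str:
    """Bolt URI for a driver; defaults to the canonical container."""
    return f"bolt://{host}:{TARGETS[container].bolt_port}"


def browser_url(container: str = "gl-graph-godmode", host: str = "localhost") -> str:
    """HTTP URL of the Browser UI for the same container."""
    return f"http://{host}:{TARGETS[container].http_port}"
```

Defaulting to gl-graph-godmode mirrors the "default target" rule: a caller has to name another container explicitly to reach isolated state.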

Node types in active use

:Behavior — rules of engagement. "Don't recommend long veAERO locks." Loaded every session.
:Topic — canonical state for a coherent thread. "Active GL focus" wins over any conflicting log entry.
:MemoryEntry — append-only episodic log with timestamps. Searchable via FTS.
:Person, :Project, :Goal, :Protocol, :Idea, :Session, :Agent — domain entities.

How to integrate

Default integration for a new GL surface:

Use the existing canonical container. gl-graph-godmode is the default target. Don't spin up a new database unless the surface needs isolated state.
Wire the Python client. pip install neo4j → from neo4j import GraphDatabase → driver targets bolt://localhost:7687. For CLI work, cypher-shell ships with the Neo4j Docker image.
Read before write. Topic nodes win over logs on conflict — load the Topic at session start; don't infer state from a log query.
Write via the sanctioned helpers, not raw MERGEs. The agent_name_unique constraint exists because hand-rolled MERGEs with inconsistent casing caused a case-duplicate bug. Use the helper scripts where they exist.
Log corrections to MemoryEntry; update Topic; link them. The pattern: episodic entry + canonical update + a relationship between them. Don't just append; don't just update.
Cypher style. Pattern-first. Use MATCH ... RETURN for reads, MERGE for upsert, CREATE only when you know the node doesn't exist. EXPLAIN and PROFILE show the planner's choices when a query is slow.
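The read-before-write and correction steps above can be sketched with the official Python driver. This is an illustration under assumptions, not the sanctioned GL helper: the property names (slug, state, text, created_at), the :CORRECTS relationship type, and the function names are all hypothetical.

```python
from datetime import datetime, timezone

# Read the canonical Topic at session start instead of inferring it from logs.
READ_TOPIC = "MATCH (t:Topic {slug: $slug}) RETURN t"

# One statement: episodic entry + canonical update + link between them.
LOG_CORRECTION = """
MATCH (t:Topic {slug: $slug})
CREATE (m:MemoryEntry {text: $text, created_at: $created_at})-[:CORRECTS]->(t)
SET t.state = $new_state, t.updated_at = $created_at
RETURN t.slug AS slug
"""


def correction_params(slug, text, new_state, now=None):
    """Build the parameter map for LOG_CORRECTION with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "slug": slug,
        "text": text,
        "new_state": new_state,
        "created_at": now.isoformat(),
    }


def run_query(uri, auth, query, params):
    """Run one auto-commit query and return the rows as plain dicts."""
    from neo4j import GraphDatabase  # requires `pip install neo4j`

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]
```

Because the correction is a single statement, the log entry, the Topic update, and the relationship land in one transaction or not at all. Values always travel as parameters ($slug, $text), never as string-formatted Cypher.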

In the GL stack

builddaily.io

Behavior nodes loaded every session. Rules of engagement, voice constraints, scope boundaries — the canonical "don't do X" surface. Agent reads them before responding to anything load-bearing.
Topic node for active focus. "gl-active-focus-three-projects" is a Topic that resolves which projects are in scope; when it conflicts with the logs, the Topic wins.
MemoryEntry log for episodic recall. "When did Neil first mention X" is a Cypher full-text query.
Behavior nodes capture qualitative drafter corrections (slice 2 of the agent-stack post). Every "no, not that — write it like this" becomes a Behavior the next compile reads.
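A "first mention" lookup over MemoryEntry could look like the sketch below. It assumes a full-text index named memory_entry_text over a text property and a sortable created_at property; those names are hypothetical. CREATE FULLTEXT INDEX and db.index.fulltext.queryNodes are standard Neo4j features.

```python
# Hypothetical index and property names; adjust to the real schema.
CREATE_FTS_INDEX = (
    "CREATE FULLTEXT INDEX memory_entry_text IF NOT EXISTS "
    "FOR (m:MemoryEntry) ON EACH [m.text]"
)

# Earliest matching entry: search the index, then sort by timestamp.
FIRST_MENTION = """
CALL db.index.fulltext.queryNodes('memory_entry_text', $term)
YIELD node, score
RETURN node.text AS text, node.created_at AS created_at, score
ORDER BY created_at ASC
LIMIT 1
"""


def phrase_query(term: str) -> str:
    """Wrap a search term as a quoted Lucene phrase, escaping \\ and \"."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def first_mention_params(term: str) -> dict:
    """Parameter map for FIRST_MENTION."""
    return {"term": phrase_query(term)}
```

Quoting the term as a phrase keeps Lucene operators inside user input from changing the query; drop phrase_query if operator syntax is wanted.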

paiddaily.io

Person + Protocol + Position graph. Wallet addresses link to people; people link to projects; projects link to positions in protocols. "Who has exposure to this Pendle market" is a single Cypher query.
Catalyst lineage. Pendle catalysts as nodes with :PRECEDES relationships when one catalyst sets up another. Graph beats SQL for "show me the chain that led to today's setup."
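The catalyst chain is a variable-length path over :PRECEDES. A hedged sketch follows: the :Catalyst label and the id and name properties are assumptions, and only the :PRECEDES relationship type comes from this page. Cypher does not accept a parameter for the hop bound, so the helper validates an integer and interpolates it.

```python
def catalyst_chain_query(max_hops: int = 5) -> str:
    """Cypher for the longest :PRECEDES chain ending at one catalyst."""
    # The hop bound cannot be a query parameter, so validate before formatting.
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 10:
        raise ValueError("max_hops must be an integer between 1 and 10")
    return (
        f"MATCH path = (root:Catalyst)-[:PRECEDES*1..{max_hops}]->"
        "(today:Catalyst {id: $catalyst_id}) "
        "RETURN [n IN nodes(path) | n.name] AS chain, length(path) AS hops "
        "ORDER BY hops DESC LIMIT 1"
    )
```

The catalyst id still travels as the $catalyst_id parameter; only the validated hop count is formatted into the query text.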

sagedaily.io

Per-user state. Standing intention, chart, cycle, prior readings — all anchored on a :Person node. Each reading's :OracleReading node links back to inputs and :Cards drawn.
Reading lineage. Sequential readings link via :FOLLOWED_BY; archetypes that recur thread through :RESONATES_WITH relationships. "What's coming up for you" is a literal pattern match.

Gotchas

Cypher is not SQL. Joining feels like "extending the pattern"; thinking SQL-first produces awkward queries that the planner can't optimize. Learn the patterns.
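MERGE without uniqueness constraints can silently create duplicates. MERGE matches on the exact pattern and property values, so a differently cased key or a concurrent write produces a second node. The agent_name_unique constraint exists because of a 2026-05-04 case-duplicate bug. Every entity type that's MERGE'd by a string key needs a uniqueness constraint.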
Property explosion is real. Don't dump arbitrary JSON onto a node. Properties should be queryable; deeply-nested data belongs in linked nodes or a separate store.
Bolt protocol is binary, not HTTP. Don't try to hit :7687 with curl; use a Bolt driver. Browser UI on :7474 is for exploration.
Backup is not automatic. Community edition lacks online backup tooling, and neo4j-admin database dump only runs against a stopped database. Schedule the downtime and the dump.
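For the MERGE gotcha, a uniqueness constraint compares exact string values, so the key has to be normalized before it reaches the database. The sketch below shows one possible shape for such a helper; the function name, the display_name property, and the normalization rule are assumptions, not the actual GL helper scripts.

```python
# MERGE on a normalized key; keep the original spelling for display only.
MERGE_AGENT = """
MERGE (a:Agent {name: $key})
ON CREATE SET a.display_name = $display_name
RETURN a.name AS name
"""


def agent_merge_params(raw_name: str) -> dict:
    """Collapse whitespace and case so 'Sage', ' sage ' and 'SAGE' share a key."""
    display = " ".join(raw_name.split())
    if not display:
        raise ValueError("agent name must not be empty")
    return {"key": display.casefold(), "display_name": display}
```

With the key normalized in one place, the uniqueness constraint acts as a backstop rather than the only defense against case duplicates.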

Risks

Single-vendor open-core. Neo4j, Inc. drives the project. Community edition is GPLv3; the paid enterprise features (clustering, online backup, advanced security) are a real upgrade path that the GL stack doesn't need today.
Operational footprint. Containerized Neo4j wants roughly 1 to 2 GB of RAM and meaningful disk for a non-trivial graph. Budget the box accordingly.
Cypher learning curve. Steeper than SQL for some patterns. The investment pays back fast once the relational/temporal queries that were painful in SQL become one-liners.

Alternatives · 5 substitutes

Pick Neo4j unless one of these wins on your specific brief.

01
Postgres with recursive CTEs
Relational store + WITH RECURSIVE for graph traversal.
Wins when: the team is already on Postgres and the graph depth is shallow (≤ 3 hops). Recursive CTEs degrade with depth; the planner doesn't think in patterns. Fine for relationships as a feature; painful when graphs are the model.
02
Memgraph
In-memory, Cypher-compatible, drop-in for many Neo4j workloads.
Wins when: latency is the constraint and the graph fits in RAM. Largely compatible Cypher dialect (openCypher); different vendor; smaller ecosystem. Worth an A/B test if Neo4j read latency becomes a bottleneck.
03
SQLite + JSON columns
File-backed relational with semi-structured payloads.
Wins when: the project is single-process, with a tiny graph and no concurrent writers. Zero ops surface. Graduates to Neo4j when relationships become first-class.
04
ArangoDB
Multi-model — document + graph + key/value in one engine.
Wins when: the workload genuinely needs all three models and you don't want three engines. AQL query language; different mental model than Cypher. Smaller community than Neo4j.
05
Vector store alone
pgvector / Chroma / Qdrant as the only retrieval surface.
Wins when: the questions are all "find similar" (purely semantic). Falls down on "who is connected to whom" or "what's true now." Best as a complement to a graph, not a replacement.

LlamaIndex — handles document retrieval. Neo4j handles canonical state. Two systems, two jobs; both consulted at agent answer time.
Langfuse — captures quantitative traces. Neo4j captures qualitative state (Behavior nodes, Topics). No duplication.