RAG-less retrieval experiment

Status: abandoned · Started May 8, 2026 · Finished May 8, 2026

Outcome

Run the neil.godfreylabs.com chat against a 30-question date-anchored eval and determine whether the no-vector-store, retrieval-via-system-prompt approach works at the current corpus size.

Evidence of done

The eval script returns ≥80% accuracy on 30 date-anchored questions with mean latency under 5 seconds. No mid-conversation context truncation observed across the run. Per-question results captured in a dated eval log with the actual numbers.
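The pass criteria above can be written down as a small check. This is a minimal sketch, not the eval harness itself (which never ran): the `QuestionResult` record and the `evidence_of_done_report` function are hypothetical names, and only the thresholds (80% accuracy, 5-second mean latency, 30 questions, no truncation) come from the text.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    # Hypothetical per-question record; field names are assumptions.
    question_id: str
    correct: bool
    latency_s: float
    truncated: bool = False


def evidence_of_done_report(results, min_accuracy=0.80,
                            max_mean_latency_s=5.0, expected_questions=30):
    """Summarise a run against the 'Evidence of done' thresholds."""
    if not results:
        raise ValueError("no results to evaluate")
    accuracy = sum(r.correct for r in results) / len(results)
    mean_latency = mean(r.latency_s for r in results)
    report = {
        "accuracy": accuracy,
        "mean_latency_s": mean_latency,
        "complete": len(results) == expected_questions,
        "accuracy_ok": accuracy >= min_accuracy,
        "latency_ok": mean_latency < max_mean_latency_s,
        "no_truncation": not any(r.truncated for r in results),
    }
    # The run passes only if every individual criterion holds.
    report["passed"] = all(
        report[k] for k in ("complete", "accuracy_ok", "latency_ok", "no_truncation")
    )
    return report
```

Writing the report dict to a dated file would cover the "dated eval log with the actual numbers" requirement.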

Method

Mount the corpus read-only into the deployment container (already wired). Concatenate all markdown into the system prompt at startup (already implemented). Run the eval harness. Capture per-question accuracy and latency. Single pass; no retraining mid-eval.
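For readers unfamiliar with the concatenation step, here is a minimal sketch of what "all markdown into the system prompt" could look like. It is not the deployed implementation: `build_system_prompt`, the preamble text, and the per-file separator format are all assumptions made for illustration.

```python
from pathlib import Path


def build_system_prompt(corpus_dir, preamble="Answer using only the notes below."):
    """Concatenate every .md file under corpus_dir into one prompt string."""
    root = Path(corpus_dir)
    parts = [preamble]
    # Sort paths so the prompt is identical across restarts.
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Label each file so answers can be traced to a source document.
        label = path.relative_to(root).as_posix()
        parts.append(f"\n\n--- {label} ---\n{text}")
    return "".join(parts)
```

Because the whole corpus rides along on every request, prompt size grows linearly with the corpus, which is why the outcome is scoped to the current corpus size.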

Stopping conditions

Abandoned if accuracy falls below 60% on any sub-cohort (recent, week-old, month-old): that would mean the approach has a real failure mode. Abandoned if context truncation happens consistently before the eval completes. Revised if midway I realize the eval set itself is wrong (for example, the questions are ambiguous): close as revised and write a new outcome with a fixed eval.
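The sub-cohort accuracy rule is the only stopping condition with a hard number, so it is the one sketched here. This is an illustration under assumptions: results are taken to be `(cohort, correct)` pairs, and `cohort_accuracy` and `stopping_decision` are hypothetical helper names. The truncation and eval-set conditions are judgment calls and are left out.

```python
from collections import defaultdict


def cohort_accuracy(results):
    """results: iterable of (cohort, correct) pairs -> {cohort: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cohort, correct in results:
        totals[cohort] += 1
        hits[cohort] += int(bool(correct))
    return {cohort: hits[cohort] / totals[cohort] for cohort in totals}


def stopping_decision(results, floor=0.60):
    """Return ('abandon', failing_cohorts) if any cohort is below the floor."""
    accuracy = cohort_accuracy(results)
    failing = sorted(c for c, a in accuracy.items() if a < floor)
    if failing:
        return "abandon", failing
    return "continue", []
```

Checking per cohort rather than overall matters: a run can clear 80% in aggregate while one age bucket fails badly, and that is exactly the failure mode the rule is meant to surface.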

Result

Abandoned at the scoping stage, before the eval ran. The framing of the parent post wasn't earned in voice: the central claim sat on a clever frame I couldn't fully back. Without the post, the eval was orphaned scope. The simpler engineering still works at the corpus's current scale; it just doesn't need a contrarian post built around it.

Learning

The writing system caught a voice violation before anything shipped. Future outcomes need a parent-post voice-ownership check at scope time — the framing has to be earned, not borrowed. Cheap abandonment cost — no public artifacts shipped, no link rot. Worth more than the eval would have been.