Make the Personal Agent shippable — kicking off

Friday, May 8, 2026 · kickoff · project: digital-twin

Get the chat at neil.godfreylabs.com to a quality bar I'd be glad to share publicly.

The chat is real — a working agent with public access through a Cloudflare Tunnel, reading my markdown corpus through a deployment container that mounts the notes and journal directories read-only. The declarative LLM responder module is live. Ollama runs the inference locally. What's missing isn't engineering — it's operational quality. The chat would happily answer a visitor today, but I haven't audited what it might answer with. Embarrassing answers are worse than no demo.

The plan is small and ordered. First: read every markdown file currently in the corpus and decide what's appropriate for public viewing. Strip or relocate anything that isn't: business specifics, NDA-bound work, anything I'd be uncomfortable having a stranger read. Second: write 10–20 representative questions ("what has Neil been working on?" / "tell me about the DeFi project" / "what's Neil's stack?"), run them through the live chat, and score the answers for accuracy and voice. If the baseline is poor, light prompt tuning and responder-module tightening go next. Third: when the bar is met, announce it with a journal entry that links the chat as a thing to try, share the URL with a few outside readers, and let it live.
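The scoring part of the second step could be kept as a small hand-filled sheet. What follows is a minimal sketch, not the harness I've built: the `ScoredAnswer` record, the 1-to-5 scale, and the 4.0 threshold are all assumptions for illustration, and the scores in the example are placeholders rather than results from the live chat.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: a 1-5 hand-scored scale with a "ready to share" bar at 4.0.
READY_THRESHOLD = 4.0


@dataclass
class ScoredAnswer:
    question: str
    accuracy: int  # 1 (wrong) to 5 (fully correct), scored by hand
    voice: int     # 1 (generic) to 5 (sounds like me), scored by hand


def summarize(scores: list[ScoredAnswer]) -> dict:
    """Average each axis and list the questions that fall below the bar."""
    if not scores:
        raise ValueError("no scored answers")
    # A question is weak if either axis is below the threshold.
    weak = [s.question for s in scores
            if min(s.accuracy, s.voice) < READY_THRESHOLD]
    avg_accuracy = mean(s.accuracy for s in scores)
    avg_voice = mean(s.voice for s in scores)
    return {
        "accuracy": avg_accuracy,
        "voice": avg_voice,
        "weak": weak,
        # Ready only if no single answer is weak and both averages clear the bar.
        "ready": not weak and min(avg_accuracy, avg_voice) >= READY_THRESHOLD,
    }


# Placeholder scores, invented purely to exercise the function.
sheet = [
    ScoredAnswer("what has Neil been working on?", accuracy=5, voice=4),
    ScoredAnswer("tell me about the DeFi project", accuracy=2, voice=4),
    ScoredAnswer("what's Neil's stack?", accuracy=4, voice=3),
]
report = summarize(sheet)
```

Requiring every answer to clear the bar, not just the average, matches the concern that one weird response is what a first-time visitor remembers.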

The cost is bounded. One to two weeks of focused time, most of it in the privacy review and the quality loop. The infra is paid for; the announce write-up itself is a thirty-minute task once the chat is ready. The downside if I rush and ship bad answers is asymmetric: a first-time visitor who hits a weird response remembers the weirdness, not the architecture. The privacy review and the quality bar are the mitigation, and both come before going public.

I'd walk away cleanly if quality can't be brought above the ready-to-share threshold within two weeks. If a specific question type (for example, questions about older logs) is the only failure mode, the outcome narrows to a tighter scope: recent-content-only, or chunked retrieval added for older months. Either way, the result is something concrete enough to stand on its own feet.
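The exit rule above can be written down as a tiny decision function. This is a sketch under stated assumptions: the question-type labels (`recent`, `older-logs`, `stack`), the `decide` helper, and the `keep-tuning` state for the time before the two-week deadline are hypothetical names I'm using to make the rule concrete.

```python
from collections import defaultdict


def decide(results: list[tuple[str, bool]], deadline_passed: bool) -> str:
    """Map (question_type, passed) pairs to an outcome.

    Hypothetical rule: ship if everything passes; narrow the scope if
    exactly one question type is failing; otherwise keep tuning until
    the two-week deadline, then walk away.
    """
    by_type: dict[str, list[bool]] = defaultdict(list)
    for question_type, passed in results:
        by_type[question_type].append(passed)

    # A type counts as failing if any of its questions failed.
    failing = sorted(t for t, oks in by_type.items() if not all(oks))

    if not failing:
        return "ship"
    if len(failing) == 1 and len(by_type) > 1:
        return "narrow-scope:" + failing[0]
    return "walk-away" if deadline_passed else "keep-tuning"


# Placeholder outcomes, not real results from the chat.
outcome = decide(
    [("recent", True), ("older-logs", False), ("stack", True)],
    deadline_passed=False,
)
```

With these placeholder inputs only the `older-logs` type fails, so the function returns the narrowed-scope outcome instead of a walk-away.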

What I'm watching for: whether the quality gap is prompt-shaped or retrieval-shaped. If light prompt tuning closes it, the simple architecture holds and the announce can lead with that. If it doesn't, the chat gets chunked retrieval and the announce leads with the harder engineering — a learning, not a defeat. Either way, the chat joins the journal, the projects, and the posts as one more node in the build-in-public network — pieces of actual work, linked together, that point at what I'm doing rather than asking anyone to take my word for it.