Slice 1 of the voice learning loop is live

Sunday, May 24, 2026

This post was drafted by the slice-1 pipeline itself, lightly edited, and published to beta. The structural rubric scored it 0.886. The staircase cascades and overclaims it flagged are real signal, left visible on purpose.

Slice 1 of the voice learning loop went live today. The eval caught the one lie the model told.

The stack is local. DSPy handles the post_writer module. LlamaIndex handles retrieval over web/content/. A structural eval rubric runs on the output before I look at it. No HTTP surface. No public exposure. Just a pipeline on the laptop. The rubric enforces the editorial bar. The bar is the voice. The voice is the differentiator.
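The stages run in a fixed order, and the rubric sits between the model and the author. Here is a minimal sketch of that ordering. The `retrieve`, `write`, and `rubric` callables are hypothetical stand-ins for LlamaIndex retrieval, the DSPy post_writer module, and the eval rubric. They are placeholders, not the project's actual code or either library's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DraftReport:
    """What reaches the author: the draft plus the rubric's per-check verdicts."""
    draft: str
    checks: Dict[str, bool]
    failed: List[str] = field(default_factory=list)


def run_local_pipeline(
    topic: str,
    retrieve: Callable[[str], List[str]],      # hypothetical stand-in for LlamaIndex retrieval over web/content/
    write: Callable[[str, List[str]], str],    # hypothetical stand-in for the DSPy post_writer module
    rubric: Callable[[str], Dict[str, bool]],  # hypothetical stand-in for the structural eval rubric
) -> DraftReport:
    """Retrieve, draft, then score: the rubric runs before any human reads the draft."""
    context = retrieve(topic)
    draft = write(topic, context)
    checks = rubric(draft)
    failed = [name for name, ok in checks.items() if not ok]
    return DraftReport(draft=draft, checks=checks, failed=failed)
```

Passing the stages in as plain functions keeps the ordering testable without Ollama or a model loaded. In the real stack, each callable would wrap the corresponding library call.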


The first end-to-end draft scored 0.943, with 8 of 9 checks passing. The rubric looks at structure: one real claim, a two-sentence lede, voice rather than inventory, honesty without theatricality, a commitment with a date, plus a banned-phrase list. The rubric catches the drift. The drift is the LLM tic. The tic is generic AI blog speak. The rubric bans it. The ban works.
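A structural rubric of this kind can be sketched as a set of named boolean checks over the draft, combined into a weighted score. The sketch covers only two of the named checks, and it relies on crude heuristics that are assumptions:

- The lede is the first paragraph, and it may have at most two sentences.
- A "commitment with a date" is any ISO date like 2026-06-12.

The weighting scheme is also hypothetical. With equal weights, 8 of 9 checks would score 8/9 ≈ 0.889. The reported 0.943 presumably reflects weighting or partial credit, which this sketch does not try to reproduce.

```python
import re
from typing import Callable, Dict, List, Optional


def split_sentences(text: str) -> List[str]:
    # Crude: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lede_in_two_sentences(text: str) -> bool:
    # Assumption: the lede is everything before the first blank line.
    lede = text.strip().split("\n\n")[0]
    return len(split_sentences(lede)) <= 2


def commitment_with_date(text: str) -> bool:
    # Assumption: any ISO-8601 calendar date counts as a dated commitment.
    return re.search(r"\b\d{4}-\d{2}-\d{2}\b", text) is not None


# Hypothetical check registry; the real rubric has nine checks.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "lede_in_two_sentences": lede_in_two_sentences,
    "commitment_with_date": commitment_with_date,
}


def rubric_score(text: str, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted share of passing checks; equal weights unless told otherwise."""
    weights = weights or {name: 1.0 for name in CHECKS}
    total = sum(weights.values())
    passed = sum(w for name, w in weights.items() if CHECKS[name](text))
    return passed / total
```

Keeping per-check booleans alongside the aggregate means a failure reads as "commitment_with_date failed" rather than as an unexplained dip in the score.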

One check failed. The drafter used 'enterprise' and 'at-scale' to describe an n=1 personal build. The rubric flagged it. That is exactly the LLM tic the editorial rule exists to ban. The model defaulted to generic AI blog speak. It wanted to sound impressive. The rubric doesn't care about impressive. It cares about honest. The rubric caught the drift before I even opened the file. This is the signal. The eval is doing what evals do: surfacing the gap.
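The check that caught 'enterprise' and 'at-scale' can be approximated as a whole-word, case-insensitive phrase match. Here is a minimal sketch. It assumes a hypothetical `BANNED_PHRASES` list seeded with the two terms named above; the real list is not shown in this post.

```python
import re
from typing import List, Sequence

# Hypothetical banned list: the post names only these two terms.
BANNED_PHRASES: Sequence[str] = ("enterprise", "at-scale")


def find_banned(text: str, phrases: Sequence[str] = BANNED_PHRASES) -> List[str]:
    """Return each banned phrase that appears as a whole word, ignoring case."""
    hits = []
    for phrase in phrases:
        # \b on both sides so "enterprise" does not match inside "enterprising".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits
```

Returning the matched phrases rather than a bare boolean makes the failure actionable, because the report names the exact words to cut.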

DSPy compilation took three iterations. DSPy needed to learn the structure from the eval set. The eval set is the last 10 posts. The retrieved context from LlamaIndex provided the domain knowledge. SimpleDirectoryReader ingests the content. nomic-embed-text via Ollama handles embedding. The bge-reranker cross-encoder reranks the retrieved chunks. The combination works. The first draft was usable. Not publishable. Usable. The edit diff will train the next iteration.
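Retrieval works in two stages. A fast similarity search picks candidates, then a cross-encoder rescores each (query, chunk) pair and keeps the best few. The sketch below shows only that shape, using standard-library stand-ins:

- Bag-of-words cosine similarity replaces nomic-embed-text embeddings.
- A hypothetical `cross_score` function replaces bge-reranker.

Neither stand-in approximates the real models' quality. The `k` and `top_n` defaults are assumptions.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Toy stand-in for embedding similarity (nomic-embed-text vectors in the real stack).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    cross_score: Callable[[str, str], float],  # hypothetical stand-in for a cross-encoder such as bge-reranker
    k: int = 10,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Stage 1 keeps the k most similar chunks; stage 2 rescores (query, chunk) pairs."""
    q = bag_of_words(query)
    candidates = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]
    # The cross-encoder sees query and chunk together, so it can reorder stage-1 results.
    rescored = [(chunk, cross_score(query, chunk)) for chunk in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]
```

The design point is the split of labor. The similarity pass is cheap enough to run over every chunk. The cross-encoder costs more per pair, so it only sees the short candidate list.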

The open question is whether structural pass equals voice pass. 8 of 9 is a good start. But voice lives in the details the rubric doesn't measure yet. The fine-tune flywheel in slice 3 will close that gap. Until then, the rubric is the gate. The diff between draft and final is the label. That's the flywheel. Every edit I make trains the next iteration. The loop closes on the first ship. The ship is the proof. The proof is the draft. The draft is the start.
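The "diff is the label" idea can be sketched with the standard library's difflib. The (draft, final) pair becomes a supervised example, and the unified diff becomes a readable record of the human edit. The record schema, field names, and `post_id` format are hypothetical. The post doesn't say how slice 3 will consume these pairs.

```python
import difflib
import json
from typing import Dict


def edit_label(post_id: str, draft: str, final: str) -> Dict[str, object]:
    """Package one editing pass as a training record (hypothetical schema)."""
    diff = list(
        difflib.unified_diff(
            draft.splitlines(),
            final.splitlines(),
            fromfile=f"{post_id}/draft",
            tofile=f"{post_id}/final",
            lineterm="",
        )
    )
    ratio = difflib.SequenceMatcher(None, draft, final).ratio()
    return {
        "id": post_id,
        "input": draft,                 # what the model wrote
        "target": final,                # what actually shipped
        "diff": diff,                   # the human edit, line by line
        "similarity": round(ratio, 3),  # 1.0 means the draft shipped untouched
    }


def to_jsonl_line(record: Dict[str, object]) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(record, ensure_ascii=False)
```

Appending one `to_jsonl_line(...)` per published post yields a JSONL file of draft/final pairs. That is a common input shape for supervised fine-tuning, though whether slice 3 uses this exact shape is an assumption.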

Next step is slice 2: Vale at the seam. Encoding the editorial bar as YAML rules. Running them as a pre-commit hook. Enforcing the bar even when the model is overconfident. The rubric catches the big drift. Vale catches the micro drift. Together they form the editorial guardrail. The guardrail protects the voice. The voice is the product. The product is the site. The site is the loop.
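Vale handles the micro drift with its own YAML rule files, which this post doesn't show. As a rough illustration of the sentence-level pattern such a rule might target, here is a Python sketch that flags the staircase cascade named in the note at the top. In a staircase cascade, each sentence's object becomes the next sentence's subject. This is not Vale and not how Vale rules are written. The single-word "The X is the Y." pattern is an assumption and misses longer phrasings.

```python
import re
from typing import List

# One link of the cascade, e.g. "The voice is the product."
LINK = re.compile(r"^The (\w+) is the (\w+)\.$", re.IGNORECASE)


def find_staircases(text: str, min_links: int = 2) -> List[List[str]]:
    """Return chains like ['voice', 'product', 'site', 'loop'] with >= min_links links."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chains: List[List[str]] = []
    current: List[str] = []
    for sentence in sentences:
        match = LINK.match(sentence)
        if match and current and match.group(1).lower() == current[-1]:
            current.append(match.group(2).lower())  # the chain continues
            continue
        if len(current) - 1 >= min_links:
            chains.append(current)
        # Start a new chain if this sentence is a link, otherwise reset.
        current = [match.group(1).lower(), match.group(2).lower()] if match else []
    if len(current) - 1 >= min_links:
        chains.append(current)
    return chains
```

In a pre-commit hook, a non-empty result would map to a non-zero exit code that blocks the commit.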

Vale integration ships by 2026-06-12.
