Build Daily

Tinley Park · May 29, 2026

Slice 1 of the voice learning loop is live

This post was drafted by the slice-1 pipeline itself, lightly edited, and published to beta as-is. The structural rubric scored it 0.886 — the staircase-cascade and overclaim flags below are real signal, left visible on purpose.

Slice 1 of the voice learning loop went live today. The eval caught the one lie the model told.

The stack is local. DSPy handles the post_writer module. LlamaIndex handles retrieval over web/content/. A structural eval rubric runs on the output before I look at it. No HTTP surface. No public exposure. Just a pipeline on the laptop. The rubric enforces the editorial bar. The bar is the voice. The voice is the differentiator.
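
For readers who have not used DSPy, a post_writer module might look roughly like the sketch below. The field names (`context`, `topic`, `post`), the model string, and the Ollama endpoint are all assumptions for illustration; the post does not say which model drafts or how the signature is written.

```python
def build_post_writer(model: str = "ollama_chat/llama3.1",
                      api_base: str = "http://localhost:11434"):
    """Return a DSPy program that drafts a post from retrieved context.

    The model name and endpoint are placeholders, not the post's real config.
    """
    import dspy  # imported lazily so this file loads without DSPy installed

    # Point DSPy at a local model so nothing leaves the laptop.
    dspy.configure(lm=dspy.LM(model, api_base=api_base))

    class PostWriter(dspy.Module):
        def __init__(self):
            super().__init__()
            # Hypothetical signature: retrieved context and a topic in, a draft out.
            self.draft = dspy.ChainOfThought("context, topic -> post")

        def forward(self, context: str, topic: str):
            return self.draft(context=context, topic=topic)

    return PostWriter()
```

Calling `build_post_writer()(context=..., topic=...)` would return a prediction whose `post` field holds the draft, which is the text a rubric would then score.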

[Figure: Voice loop pipeline, Slice 1 data flow. Five stations in a horizontal flow: 01 Corpus (web/content/), 02 Retrieve (LlamaIndex, nomic-embed, bge-reranker), 03 Draft (DSPy post_writer), 04 Rubric (10-check structural), 05 Edit (human, ship, diff). A feedback loop carries the edit diff back to the training set so the next draft starts closer to publishable.]

FIG. 1: Local pipeline. DSPy drafts, LlamaIndex retrieves, the rubric gates, the diff teaches.

The first end-to-end draft scored 0.943. That's 8 of 9 checks passing. The rubric looks at structure. One real claim. Lede in two sentences. Voice not inventory. Honesty without theatricality. Commitment with a date. Plus banned phrases. The rubric catches the drift. The drift is the LLM tic. The tic is the generic AI blog speak. The rubric bans it. The ban works.
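
A structural rubric of this kind can be sketched as a list of small pass/fail checks rolled up into one score. The two checks below ("lede in two sentences" and "commitment with a date") are guesses at how such rules might be encoded, and the equal weights are an assumption; the post does not say how its score is computed.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    weight: float  # hypothetical weighting; the real rubric's weights are not given
    passes: Callable[[str], bool]


def lede_is_two_sentences(post: str) -> bool:
    """The first non-empty paragraph must contain exactly two sentences."""
    paragraphs = [p.strip() for p in post.split("\n\n") if p.strip()]
    if not paragraphs:
        return False
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraphs[0]) if s]
    return len(sentences) == 2


def has_dated_commitment(post: str) -> bool:
    """The post must carry an ISO-style date such as 2026-06-12."""
    return re.search(r"\b\d{4}-\d{2}-\d{2}\b", post) is not None


def score(post: str, checks: list[Check]) -> tuple[float, list[str]]:
    """Return the weighted pass rate and the names of the failed checks."""
    total = sum(c.weight for c in checks)
    failed = [c.name for c in checks if not c.passes(post)]
    passed_weight = sum(c.weight for c in checks if c.name not in failed)
    return passed_weight / total, failed


CHECKS = [
    Check("lede_two_sentences", 1.0, lede_is_two_sentences),
    Check("commitment_with_date", 1.0, has_dated_commitment),
]

draft = "Slice 1 went live today. The eval caught one lie.\n\nVale ships soon."
value, failed = score(draft, CHECKS)
print(value, failed)  # 0.5 ['commitment_with_date']
```

Returning the failed check names alongside the number is what makes the score actionable: the flag says which rule to fix, not just that something is off.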

One check failed. The drafter used 'enterprise' and 'at-scale' to describe an n=1 personal build. The rubric flagged it. Exactly the LLM tic the editorial rule exists to ban. The model defaulted to generic AI blog speak. It wanted to sound impressive. The rubric doesn't care about impressive. It cares about honest. The rubric corrected the drift before I even opened the file. This is the signal. The eval is doing what evals do. Surfacing the gap.
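
The banned-phrase check is the easiest one to picture. A minimal sketch, assuming the list holds only the two terms named above (the real list is not shown in this post):

```python
import re

# Illustrative list: just the two terms the post names.
BANNED = ("enterprise", "at-scale")


def banned_phrase_hits(text: str, banned=BANNED) -> list[tuple[str, int]]:
    """Return (phrase, character offset) for every banned phrase found."""
    hits = []
    for phrase in banned:
        # Match whole phrases only, case-insensitively.
        pattern = re.compile(rf"(?<!\w){re.escape(phrase)}(?!\w)", re.IGNORECASE)
        hits.extend((phrase, m.start()) for m in pattern.finditer(text))
    return sorted(hits, key=lambda hit: hit[1])


draft = "An enterprise pipeline, running at-scale on one laptop."
hits = banned_phrase_hits(draft)
print([phrase for phrase, _ in hits])  # ['enterprise', 'at-scale']
```

An empty list means the check passes; anything else is a flag with a location attached.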

The compilation took three iterations. DSPy needed to learn the structure from the eval set. The eval set is the last 10 posts. The retrieved context from LlamaIndex provided the domain knowledge. SimpleDirectoryReader ingests the content. nomic-embed-text via Ollama handles embedding. bge-reranker cross-encoder handles reranking. The combination works. The first draft was usable. Not publishable. Usable. The edit diff will train the next iteration.
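
The retrieval half of that paragraph maps onto a two-stage LlamaIndex setup: vector search first, cross-encoder rerank second. The sketch below is one plausible wiring, not the actual config. The `top_k` and `top_n` values and the exact reranker checkpoint (`BAAI/bge-reranker-base`) are assumptions, and it needs the llama-index Ollama embedding and sentence-transformers extras plus a running Ollama server.

```python
def retrieve_context(query: str, content_dir: str = "web/content",
                     top_k: int = 10, top_n: int = 3) -> list[str]:
    """Embed the corpus, fetch top_k candidates, rerank down to top_n passages."""
    # Imported lazily so the file loads even where LlamaIndex is not installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.core.postprocessor import SentenceTransformerRerank
    from llama_index.embeddings.ollama import OllamaEmbedding

    # Ingest every file under the content directory.
    documents = SimpleDirectoryReader(content_dir, recursive=True).load_data()

    # Embed locally through Ollama.
    embed_model = OllamaEmbedding(model_name="nomic-embed-text")
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

    # Stage 1: cheap vector search over the whole corpus.
    candidates = index.as_retriever(similarity_top_k=top_k).retrieve(query)

    # Stage 2: the cross-encoder rescores each (query, passage) pair.
    reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=top_n)
    reranked = reranker.postprocess_nodes(candidates, query_str=query)

    return [item.node.get_content() for item in reranked]
```

This version rebuilds the index on every call, which is fine for a sketch over a small corpus; a real pipeline would likely build it once and persist it.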

The open question is whether structural pass equals voice pass. 8 of 9 is a good start. But voice lives in the details the rubric doesn't measure yet. The fine-tune flywheel in slice 3 will close that gap. Until then, the rubric is the gate. The diff between draft and final is the label. That's the flywheel. Every edit I make trains the next iteration. The loop closes on the first ship. The ship is the proof. The proof is the draft. The draft is the start.
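
"The diff between draft and final is the label" can be made concrete with the standard library. This is a minimal sketch of what one training record might hold; the record shape and the similarity measure are assumptions, since the post does not describe how the diff is stored.

```python
import difflib


def edit_label(draft: str, final: str) -> dict:
    """Pair a machine draft with its human-edited final and summarize the gap."""
    draft_lines = draft.splitlines()
    final_lines = final.splitlines()
    diff = list(difflib.unified_diff(
        draft_lines, final_lines, fromfile="draft", tofile="final", lineterm=""
    ))
    # Share of lines that survived the edit unchanged, from 0.0 to 1.0.
    similarity = difflib.SequenceMatcher(None, draft_lines, final_lines).ratio()
    return {"draft": draft, "final": final, "diff": diff, "similarity": similarity}


example = edit_label(
    "An enterprise pipeline.\nIt runs locally.",
    "A personal pipeline.\nIt runs locally.",
)
print(example["similarity"])  # 0.5
```

A similarity that climbs across successive posts would be one rough sign that drafts are starting closer to publishable.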

Next step is slice 2. Vale at the seam. Encoding the editorial bar as YAML rules. Running pre-commit. Enforcing the bar even when the model is overconfident. The rubric catches the big drift. Vale catches the micro drift. Together they form the editorial guardrail. The guardrail protects the voice. The voice is the product. The product is the site. The site is the loop.

Vale integration ships by 2026-06-12.

  • #voice-learning-loop
  • #dspy
  • #llamaindex
  • #evals
  • #agents
  • #building-in-public
