Teaching an agent to draft in my voice

Sunday, May 24, 2026

There's a chat at the top of this site that answers from my markdown when it works. It's the simplest possible retrieval surface — a couple hundred lines of Python that walks every .md file, concatenates the lot into the system prompt, and asks an Ollama-hosted model to read. It'll stay that simple. It's not where the next-step build is.
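A minimal sketch of that retrieval surface, assuming the content lives under a single directory. The function name, the per-file header format, and the preamble argument are hypothetical; the Ollama call itself is left out, and the returned string would be sent as the system message.

```python
from pathlib import Path


def build_system_prompt(content_dir: Path, preamble: str) -> str:
    """Concatenate every .md file under content_dir into one system prompt."""
    sections = []
    # Sort so the prompt is stable across runs; rglob order is not guaranteed.
    for path in sorted(content_dir.rglob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        if body:
            # Label each file (hypothetical format) so answers can be traced
            # back to the post they came from.
            name = path.relative_to(content_dir).as_posix()
            sections.append(f"## {name}\n\n{body}")
    return preamble + "\n\n" + "\n\n".join(sections)
```

The whole corpus goes into every request, so this only works while the markdown fits in the model's context window.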

What I'm actually building is an agent that drafts the next post for me, in my voice, grounded in everything I've already written, getting closer to my edit-target every cycle. I'm the only writer on this site and I'll be the only one for a while, so this is a voice-fidelity story, not a scale story. Can the drafter get close enough that the marginal post takes me twenty minutes of editing instead of two hours of writing? That's the bar.

The frame

Three layered tools, each free, each local, each earning its keep at a different layer.

What's already running

The brain layer is where I have receipts. About two dozen DSPy modules in personal use across Sage and the rest of the GodfreyLabs stack — readers, reflectors, theme extractors, recapers, deflectors, hashtag writers, a cover-letter writer that one of the agents already runs for capital pitches and proposals. Pattern is the same every time. A Signature names the contract, a small handful of examples seed the optimizer, the optimizer compiles a prompt that ships. One config line away from Claude or local Ollama — same module, either backend.

The drafter is the natural sibling — a post_writer module next to the proposal_writer already running. Signature: (content_seed, retrieved_context, post_shape) → (draft, suggested_title, commitment_with_date). The Signature is the spec; everything below is execution.
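The contract can be written down without any framework. This is a sketch using plain dataclasses, not the DSPy class itself: the class names and the `str` field types are assumptions, and only the six field names come from the Signature above. In DSPy the same fields would be declared on a `dspy.Signature` subclass with `dspy.InputField` and `dspy.OutputField`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PostWriterInput:
    content_seed: str        # what the post is about
    retrieved_context: str   # passages pulled from earlier posts
    post_shape: str          # the kind of post being drafted


@dataclass(frozen=True)
class PostWriterOutput:
    draft: str
    suggested_title: str
    commitment_with_date: str


def contract() -> str:
    """Render the contract as 'inputs -> outputs', one name per field."""
    ins = ", ".join(f.name for f in fields(PostWriterInput))
    outs = ", ".join(f.name for f in fields(PostWriterOutput))
    return f"{ins} -> {outs}"
```

The arrow form that `contract()` returns is the same shape DSPy accepts as an inline string signature, so the spec and the module stay in one place.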

What's missing

Two things. I haven't shipped LlamaIndex in my stack yet — saying it once and moving on. And I haven't fine-tuned anything in personal use yet either. The post that committed to thinking about fine-tuning shipped last week. This post commits to actually doing it, on a corpus I already own — this site.

Three slices, in order

1. The drafter, end to end. A local Python pipeline: SimpleDirectoryReader over web/content/ → nomic-embed-text via Ollama → file-backed vector index → a bge-reranker cross-encoder → a DSPy post_writer module compiled against my last 10–15 posts as an eval set. No HTTP surface, no public exposure, no traffic on the laptop. Input: a content seed. Output: a draft I review and edit before merging. The diff between draft and final is the next iteration's training pair. First post on this site drafted through this pipeline ships by 2026-06-15.
2. Vale at the seam. Quietest of the three. Encodes the editorial bar this site already runs by (one real claim, lede in two sentences, voice not inventory, honesty without theatricality, commitment with a date) as YAML rules. Plus the banned phrase fingerprints, disclaimer caps, retired openers. Runs pre-commit on the writer's machine; runs in CI on every PR to beta. The bar gets enforced even when I'm tired and the drafter is overconfident. Ships with slice 1.
3. The fine-tune flywheel. Every (content_seed, retrieved_context) → (final_post) pair from slice 1's editing loop is a label. Once ~30 pairs accumulate, I fine-tune Llama 3.1 8B on them via unsloth on Google Colab's free tier: QLoRA on a T4, no API bill, no paid infra. Swap the fine-tuned adapter into the same DSPy LM config; the drafter inherits the new weights without changing a line of orchestration code. Activates once N≥30 edits.
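The editing loop that feeds the flywheel can be sketched as an append-only JSONL log plus a threshold check. Everything here is an assumption except the pair shape and the 30-pair threshold: the file format, the function names, and the `edit_similarity` field (a rough character-level measure of how much of the draft survived editing) are hypothetical.

```python
import difflib
import json
from pathlib import Path

MIN_PAIRS = 30  # the N>=30 threshold named in the fine-tune slice


def record_pair(log_path: Path, content_seed: str, retrieved_context: str,
                draft: str, final_post: str) -> dict:
    """Append one (content_seed, retrieved_context) -> final_post pair."""
    # 1.0 means the draft shipped untouched; lower means heavier editing.
    similarity = difflib.SequenceMatcher(None, draft, final_post).ratio()
    record = {
        "content_seed": content_seed,
        "retrieved_context": retrieved_context,
        "draft": draft,
        "final_post": final_post,
        "edit_similarity": round(similarity, 4),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def ready_to_fine_tune(log_path: Path, min_pairs: int = MIN_PAIRS) -> bool:
    """True once the log holds at least min_pairs non-empty records."""
    if not log_path.exists():
        return False
    with log_path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip()) >= min_pairs
```

Keeping the draft next to the final post costs a little disk and makes the same log usable for tracking how much editing each post needed.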

Each slice earns its keep on its own. The drafter is useful before the fine-tune kicks in. Vale catches drift even when the model is fine. And each layer can be ripped out independently if a better tool shows up — DSPy modules don't care which LM is on the other end; LlamaIndex doesn't care which embedder it calls; the fine-tuned model is a swap, not a rewrite.
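One way to keep the layers independently replaceable is to have the orchestration depend only on callables. The sketch below is a stdlib stand-in, not LlamaIndex or DSPy code: `DraftPipeline`, the type aliases, and the toy embedder, reranker, and writer are all hypothetical, and a real setup would plug the Ollama embedder, the cross-encoder, and the compiled module into the same three slots.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Embedder = Callable[[str], Sequence[float]]
Reranker = Callable[[str, Sequence[str]], List[str]]
Writer = Callable[[str, str], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


@dataclass
class DraftPipeline:
    embed: Embedder
    rerank: Reranker
    write: Writer
    top_k: int = 2

    def draft(self, content_seed: str, corpus: Sequence[str]) -> str:
        seed_vec = self.embed(content_seed)
        # First pass: rank by embedding similarity and keep the top_k.
        ranked = sorted(corpus, key=lambda doc: cosine(seed_vec, self.embed(doc)),
                        reverse=True)
        # Second pass: let the reranker reorder the shortlist.
        context = "\n\n".join(self.rerank(content_seed, ranked[: self.top_k]))
        return self.write(content_seed, context)


# Toy stand-ins so the sketch runs without any model.
def toy_embed(text: str) -> List[float]:
    return [float(text.lower().count(w)) for w in ("vale", "dspy", "tune")]


def toy_rerank(query: str, docs: Sequence[str]) -> List[str]:
    return list(docs)


def toy_write(seed: str, context: str) -> str:
    return f"{seed} :: {context}"
```

Swapping a layer is then a one-field change, for example `dataclasses.replace(pipeline, write=other_writer)`, with retrieval left untouched.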

How I'll know it's working

The fine-tune post put it plainly: optimizing without an eval is profiling without numbers. The same rule applies to every slice below, not just the last one.

Each slice gets its own measurable bar. The eval lives in the repo next to the code it scores — versioned, regression-tested, queryable.

Eval plan · 3 slices. Each slice has a signal that says "did this earn its keep?"

01 · The drafter
Draft hits the editorial bar; voice matches the corpus.
Signal: two metrics, both computed per draft. Structural: a rubric scoring each editorial rule (lede in two sentences? commitment with a date? earned specific present?). Voice: LLM-as-judge pairwise vs a held-out post in my voice. Eval set is my last 10–15 posts, each labeled with the seed that would have produced it. DSPy Evaluate at compile; same metric at every draft run. Plus the outcome-level signal: did the next post on this site actually start as the agent's draft.

02 · Vale at the seam
Rules fire on known-bad lines, stay silent on known-good ones.
Signal: each Vale rule ships with a positive and negative test pair. Rule-as-tests in the repo. A rule that drifts (fires too often, or stops catching its target) shows up in CI before it shows up in a draft.

03 · Fine-tune flywheel
Fine-tuned model beats the un-fine-tuned model on the slice-1 eval.
Signal: A/B on the slice-1 eval set. Same DSPy module, two backends: vanilla Llama 3.1 8B vs the fine-tune. The fine-tune ships only if it wins on structural and voice metrics by a margin worth the swap. Track the delta over time: each new fine-tune is one observation on the flywheel-is-working trace.
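The structural half of the slice-1 metric, and the A/B gate that reuses it, can be sketched with regular expressions. The heuristics are assumptions: "lede in two sentences" is read as "the first paragraph has one or two sentences", "commitment with a date" as "the draft contains an ISO date", and the 0.05 margin is a placeholder for "a margin worth the swap". The voice judge and the "earned specific" check are not covered.

```python
import re
from typing import Dict, Sequence

ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# A sentence end: terminal punctuation followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def structural_score(draft: str) -> Dict[str, bool]:
    """Score one draft against two of the editorial rules."""
    first_paragraph = draft.strip().split("\n\n", 1)[0]
    sentences = len(SENTENCE_END.findall(first_paragraph))
    return {
        "lede_in_two_sentences": 1 <= sentences <= 2,
        "commitment_with_date": bool(ISO_DATE.search(draft)),
    }


def pass_rate(drafts: Sequence[str]) -> float:
    """Fraction of rule checks passed across all drafts."""
    checks = [ok for d in drafts for ok in structural_score(d).values()]
    return sum(checks) / len(checks) if checks else 0.0


def fine_tune_wins(base_drafts: Sequence[str], tuned_drafts: Sequence[str],
                   margin: float = 0.05) -> bool:
    """Structural-only A/B gate: the tuned backend must win by at least margin."""
    return pass_rate(tuned_drafts) - pass_rate(base_drafts) >= margin
```

Both backends would be run over the same eval seeds, and the voice metric would need the same comparison before a swap.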

Where the agent sits

The drafter doesn't need a new agent — it slots into the assistant system that already runs the site. Eight named agents, each with defined triggers, brain modules, and a project board; one of them owns content. The drafter is one more brain module under that umbrella, next to the proposal writer already running. A separate engineering agent drops content seeds from real work; the content agent picks them up, drafts, and ships. The site's draft-to-beta-to-main workflow is the output pipe.

A new brain module under an agent that already owns the lane. Not a new agent.

What ships next

The commitment: the next post on this site starts as the agent's draft, by 2026-06-15. Whether it ships with a lot of my editing on top or a little is the open question — but the seed and the first pass come from the pipeline above, not from me opening a blank file. From then on, every post starts that way.

When the drafter is good enough that the editing is closer to twenty minutes than two hours, the post making that claim will say so up top.
