Teaching an agent to draft in my voice
There's a chat at the top of this site that answers from my markdown when it works. It's the simplest possible retrieval surface — a couple hundred lines of Python that walks every .md file, concatenates the lot into the system prompt, and asks an Ollama-hosted model to read. It'll stay that simple. It's not where the next-step build is.
What I'm actually building is an agent that drafts the next post for me, in my voice, grounded in everything I've already written, getting closer to my edit-target every cycle. I'm the only writer on this site and I'll be the only one for a while — so this is a voice fidelity story, not a scale story. Can the drafter get close enough that the marginal post takes me twenty minutes of editing instead of two hours of writing. That's the bar.
The frame
Three layered tools, each free, each local, each earning its keep at a different layer.
What's already running
The brain layer is where I have receipts. About two dozen DSPy modules in personal use across Sage and the rest of the GodfreyLabs stack — readers, reflectors, theme extractors, recapers, deflectors, hashtag writers, a cover-letter writer that one of the agents already runs for capital pitches and proposals. Pattern is the same every time. A Signature names the contract, a small handful of examples seed the optimizer, the optimizer compiles a prompt that ships. One config line away from Claude or local Ollama — same module, either backend.
The drafter is the natural sibling — a post_writer module next to the proposal_writer already running. Signature: (content_seed, retrieved_context, post_shape) → (draft, suggested_title, commitment_with_date). The Signature is the spec; everything below is execution.
What's missing
Two things. I haven't shipped LlamaIndex in my stack yet — saying it once and moving on. And I haven't fine-tuned anything in personal use yet either. The post that committed to thinking about fine-tuning shipped last week. This post commits to actually doing it, on a corpus I already own — this site.
Three slices, in order
The drafter, end to end. A local Python pipeline —
SimpleDirectoryReaderoverweb/content/→nomic-embed-textvia Ollama → file-backed vector index → abge-rerankercross-encoder → a DSPypost_writermodule compiled against my last 10–15 posts as an eval set. No HTTP surface, no public exposure, no traffic on the laptop. Input: a content seed. Output: a draft I review and edit before merging. The diff between draft and final is the next iteration's training pair. First post on this site drafted through this pipeline ships by 2026-06-15.Vale at the seam. Quietest of the three. Encodes the editorial bar this site already runs by — one real claim, lede in two sentences, voice not inventory, honesty without theatricality, commitment with a date — as YAML rules. Plus the banned phrase fingerprints, disclaimer caps, retired openers. Runs pre-commit on the writer's machine; runs in CI on every PR to
beta. The bar gets enforced even when I'm tired and the drafter is overconfident. Ships with slice 1.The fine-tune flywheel. Every
(content_seed, retrieved_context) → (final_post)pair from slice 1's editing loop is a label. Once ~30 pairs accumulate, I fine-tune Llama 3.1 8B on them via unsloth on Google Colab's free tier — QLoRA on a T4, no API bill, no paid infra. Swap the fine-tuned adapter into the same DSPy LM config; the drafter inherits the new weights without changing a line of orchestration code. Activates once N≥30 edits.
Each slice earns its keep on its own. The drafter is useful before the fine-tune kicks in. Vale catches drift even when the model is fine. And each layer can be ripped out independently if a better tool shows up — DSPy modules don't care which LM is on the other end; LlamaIndex doesn't care which embedder it calls; the fine-tuned model is a swap, not a rewrite.
How I'll know it's working
The fine-tune post put it plainly: optimizing without an eval is profiling without numbers. The same rule applies to every slice below, not just the last one.
Each slice gets its own measurable bar. The eval lives in the repo next to the code it scores — versioned, regression-tested, queryable.
Where the agent sits
The drafter doesn't need a new agent — it slots into the assistant system that already runs the site. Eight named agents, each with defined triggers, brain modules, and a project board; one of them owns content. The drafter is one more brain module under that umbrella, next to the proposal writer already running. A separate engineering agent drops content seeds from real work; the content agent picks them up, drafts, and ships. The site's draft-to-beta-to-main workflow is the output pipe.
A new brain module under an agent that already owns the lane. Not a new agent.
What ships next
The commitment: the next post on this site starts as the agent's draft, by 2026-06-15. Whether it ships with a lot of my editing on top or a little is the open question — but the seed and the first pass come from the pipeline above, not from me opening a blank file. From then on, every post starts that way.
When the drafter is good enough that the editing is closer to twenty minutes than two hours, the post making that claim will say so up top.
Read next — the depth posts behind each layer
- An agent stack on local hardware — the four hard parts — the stack this drafter sits on, depth piece.
- When to fine-tune an LLM — and when to skip it — the decision matrix slice 3 is the yes for.
Each one earns its keep alone. Together they're the build log for what ships above.
