The voice gate that grades me, too
A while back I built a rubric to grade the agent that drafts posts for me, and wrote it up in slice 1 of the voice learning loop. The surprise, a month of posts later, is that the rubric improved my own writing more than the agent's.
It did that by quietly changing jobs. It was built to score a machine's draft. It's now a gate on every post that enters the repo — and most of those, lately, I wrote by hand.
From grading the bot to gating every commit
The original rubric had one subject: the DSPy drafter's output. It scored a draft so I could see whether the agent was drifting into generic AI blog-speak before I read a word of it.
Then I wired it into the commit hook. Now every post — agent-drafted or typed by me at midnight — gets the same score before it can ship, plus a pass from Vale, a prose linter, on top. The bar stopped being "is the machine writing like me" and became "is this post writing like me," no matter who held the keyboard. That reframing is the whole story. A standard that only judges the machine drifts the moment a human takes over; a standard that gates every commit holds the line on both.
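A minimal sketch of what that wiring could look like, assuming posts are Markdown files and the hook is a Python script. The names `staged_posts`, `rubric_passes`, `vale_passes`, and `gate` are hypothetical, and the one-rule rubric is a stand-in for the real one. It also assumes Vale's default behavior of exiting non-zero when it reports error-level alerts.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

Check = Callable[[str], bool]


def staged_posts() -> list[str]:
    """List staged Markdown files that were added, copied, or modified."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith(".md")]


def rubric_passes(path: str) -> bool:
    """Stand-in for the real rubric: a single toy rule (assumption)."""
    text = Path(path).read_text(encoding="utf-8")
    return "seamless" not in text.lower()


def vale_passes(path: str) -> bool:
    """Run the Vale CLI on one file; a zero exit code counts as a pass."""
    return subprocess.run(["vale", path]).returncode == 0


def gate(paths: Iterable[str], checks: list[Check]) -> list[str]:
    """Return the paths that fail at least one check. Empty means ship."""
    return [p for p in paths if not all(check(p) for check in checks)]


def main() -> int:
    """Exit status for the hook: 0 lets the commit through, 1 blocks it."""
    failed = gate(staged_posts(), [rubric_passes, vale_passes])
    for path in failed:
        print(f"voice gate failed: {path}")
    return 1 if failed else 0
```

A hook file built this way would end with `raise SystemExit(main())`. The point of `gate` taking a plain list of paths is that it never asks who wrote the file, which is the reframing described above.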
What it actually checks
The rubric is a set of named structural rules, each one an AI tell turned into a check that has to pass:
- Lede in two sentences. Get to the point; no four-sentence throat-clearing.
- No banned openers, no banned verbs. The corporate throat-clears and power-verbs sit on a blocklist. (I can't print the worst offender as my example here — the gate would flag the post you're reading. Which is the point.)
- No "X, not Y" inversions. That breathless "it's not about A, it's about B" cadence is the single loudest model tic. Banned.
- No overclaim words. "Revolutionary," "seamless," "powerful" — cut or earn them.
- Em-dash density under a cap. Models love an em-dash. Past a set number per two hundred words, the prose reads like a machine wrote it, so the count is enforced.
- No staircase cascade. Runs of short, identically-shaped sentences — the "It works. The bar is the voice. The voice is the edge." pattern — get flagged.
- No internal paths or repo names in reader-facing copy.
Vale handles the prose layer: weasel words (the soft hedges and intensifiers), passive voice, sentences that run too long. Between them they cover the structure and the texture.
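Three of the structural rules above can be sketched in a few lines of Python. This is an illustration under assumptions: the cap of two em-dashes per two hundred words, the five-word sentence limit, and the run length of three are invented values, and every function name is hypothetical. The staircase check is also simplified to "a run of short sentences" and does not compare sentence shape.

```python
import math
import re

EM_DASH = "\u2014"
OVERCLAIM = {"revolutionary", "seamless", "powerful"}


def em_dash_density_ok(text: str, cap: int = 2, per_words: int = 200) -> bool:
    """Allow at most `cap` em-dashes per started block of `per_words` words."""
    words = len(text.split())
    blocks = max(1, math.ceil(words / per_words))
    return text.count(EM_DASH) <= cap * blocks


def overclaim_hits(text: str) -> list[str]:
    """Return the overclaim words found in the text, sorted."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted(set(tokens) & OVERCLAIM)


def has_staircase(text: str, max_words: int = 5, run: int = 3) -> bool:
    """Flag `run` or more consecutive sentences of `max_words` words or fewer."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    streak = 0
    for sentence in sentences:
        streak = streak + 1 if len(sentence.split()) <= max_words else 0
        if streak >= run:
            return True
    return False


def check(text: str) -> dict[str, bool]:
    """Map each rule name to True when the text passes that rule."""
    return {
        "em_dash_density": em_dash_density_ok(text),
        "no_overclaim": not overclaim_hits(text),
        "no_staircase": not has_staircase(text),
    }
```

Run against the staircase example from the list, `check("It works. The bar is the voice. The voice is the edge.")` reports `no_staircase` as `False` and the other two rules as `True`.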
Why grading myself is the real win
Here's what actually happens now. I write a post in my own voice, I commit it, and the gate flags something almost every time: too many em-dashes, a hedge word I didn't notice, a staircase run that crept into a list. I fix it, and only then does it ship. The loop is write → gate → tighten → ship clean, and I run it on myself exactly the way the agent runs it on its drafts.
That's the part that improved the writing you've been reading. Not a smarter model and not a clever prompt — a checklist I can't talk my way past, applied to every post before it goes out. My ear gets tired and lets a tic through; the rubric never does.
It's rules, not a model
The gate is deterministic Python plus Vale. No language model grading the writing — partly because a model judging its own output is circular, and partly because rules give the same score every run, for free. The number doesn't drift, doesn't cost a token, and doesn't have an opinion. It just checks. This is the same idea as any other eval — you can't optimize what you can't measure — pointed at prose instead of model output.
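To make the "same score every run" property concrete, here is a small sketch in which each rule is a pure function of the text. The post doesn't say how the number is computed, so scoring as the fraction of rules passed is an assumption, and the two rules and the names `RULES` and `score` are hypothetical.

```python
from typing import Callable

Rule = Callable[[str], bool]

# Hypothetical rules. Each depends only on the text passed in, so there is
# no randomness, no network call, and no state carried between runs.
RULES: dict[str, Rule] = {
    "no_overclaim": lambda t: not any(
        w in t.lower() for w in ("revolutionary", "seamless", "powerful")
    ),
    "few_em_dashes": lambda t: t.count("\u2014") <= 2,
}


def score(text: str, rules: dict[str, Rule] = RULES) -> tuple[float, list[str]]:
    """Return (fraction of rules passed, names of failed rules)."""
    if not rules:
        return 1.0, []
    failed = [name for name, rule in rules.items() if not rule(text)]
    return (len(rules) - len(failed)) / len(rules), failed
```

Calling `score` twice on the same draft returns the same tuple both times, which is what lets the number act as a gate: a failing post keeps failing until the text itself changes.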
The honest limit
A structural rubric checks the shape of the writing, not the worth of the idea. It will happily pass a well-formed post about nothing. It stops the prose from reading like every other AI blog; it can't make a dull point interesting. So it's a floor, not a ceiling — it guarantees the voice, and leaves the quality of the thinking to me. That's the right division of labor. The machine-tells are mechanical, so I automated them; the thinking isn't, so I didn't.
