Replace Claude Code with Pi — kicking off

Friday, May 8, 2026kickoffproject: pi-coding-backup

Replace Claude Code with a local Pi coding agent.

Two threads pull this outcome together. The first is interest — local AI is the most interesting place to build right now, and not having my own coding harness running on it feels like missing the moment. The second is autonomy. Depending solely on any one company, model, or provider — even one as good as Anthropic — is a position of weakness. Some will argue the world is in both a memory and compute shortage with the cost of both going up; running my own coding loop on local hardware is one way to take a side.

The intelligence layer is solved. Ollama runs qwen3:27b on the heavy tier; voice-tier routing already sends harder tasks to Kimi K2.6 cloud when the context calls for it. What's missing is the harness — the program that uses that intelligence the way Claude Code uses Anthropic's API. That harness is the Pi coding agent.

Claude Code is powerful, and I'll keep using it for some tasks indefinitely. But two things make me want my own. When I have an existing structure laid out — declarative LLM modules, locked behaviors, the multi-agent system that Smith and the others coordinate inside — Claude Code sometimes ignores it and does whatever it wants, and I push back mid-session to keep the work in the lanes I've set. The Pi coding agent will be deliberately minimalistic, with the developer in command of behavior — read my locked behaviors, follow my Smith specs, route through my voice-tier, exit cleanly when something is outside the locked scope. The other thing is cost: ~$200/mo of subscription becomes ~$0 — a real fraction of monthly burn worth closing.

The build sequence is eval-first. Capture 30 representative coding tasks from the last two weeks — refactors, debugs, scaffolds, code reviews — with saved Claude Code outputs as the baseline. Then build coding-agent modules on the local heavy tier using declarative LLM programming — architect, reviewer, writer signatures parallel to the patterns already in place. Then wire a CLI that mimics Claude Code's surface (file edits, bash, grep, plan mode). Trial it against daily work for two weeks, no Claude Code. Cancel decision by 2026-06-15.

If the eval shows less than 60% parity at the end of the build phase, the outcome is abandoned — Claude Code stays, and the build pivots into a writeup of what didn't work and why. If a specific subset (long-context refactor, for example) is solvable but the rest isn't, the outcome is revised — Pi handles its share, Claude Code keeps its share, and the cost cut becomes partial. Either path produces a result worth shipping.

What I'm watching for: whether the autonomy thesis holds up under real daily work. If I find myself reaching for Claude Code anyway in the trial window, the question shifts from "is the harness good enough" to "do I actually value the sovereignty I claimed." That's a worth-knowing answer either way.