Pi + qwen3.6 as backup when Claude Code limits hit
Outcome
A working local coding setup (Pi + Ollama qwen3.6) tested and ready as a backup whenever weekly Claude Code Max limits hit. Not a replacement; a second seat. The math doesn't favor cancellation right now — $200/mo for Max is cheap relative to the time to build a parity stack.
Evidence of done
When weekly Claude Code limits hit, Pi + qwen3.6 carries the work to next reset. Output quality is acceptable on review. Pattern that works (one Pi terminal, small chunks, spec-driven) is documented and reproducible.
Method
When Claude Code limits hit: open one Pi terminal (two locks up), use Claude Code (next reset) for spec/requirements, route execution to Pi + qwen3.6. Keep work in small chunks. Frontier model upstream, local model downstream.
Stopping conditions
Abandoned if local quality regresses below 'acceptable on review'. Revised again if local stack becomes good enough that cancellation math flips (likely H2 2026 territory based on current trajectory).
Result
Tested in production-shaped conditions on 2026-05-10. Eight hours, weekly Claude Code limit hit Saturday, reset Sunday 7am. Pi + qwen3.6 carried real dev work — slow but acceptable quality. Two parallel Pi terminals locked up; one terminal worked. Small chunks beat big asks. The replacement goal (cancel Max) is no longer the goal; the backup posture is.
Learning
Local models today are good enough as a backup with discipline (one terminal, small chunks, frontier-model spec upstream). They're not yet good enough to replace a frontier-model primary on a time/quality basis — the speed tax is real. Maybe end of year. The cost-cut math ($200/mo) was never the bottleneck — time was. Building a parity stack would have cost more time than the savings would buy back. Wrote up the lived experience as a short post: /posts/when-the-limits-hit.
