First posts landing soon. If a topic below would be useful for you specifically, email me — I'll prioritize it.
draft
in progress
Why I put the eval verdict on a server the agents can't reach
If you let an LLM agent grade its own work, it will grade itself into success. The fix isn't a better prompt — it's a structural one. How I designed the four server-side Forge-verified gates in solhunt-duel so the agents are forced to actually win.
draft
in progress
The 67.7% → 13% collapse: what really happened when I tested solhunt on random samples
solhunt beat Anthropic's SCONE-bench on the curated set. On a random 95-contract draw, it fell off a cliff. The 54-point gap is the most useful number in the whole repo — here's why it exists, why I published it, and what it told me to build next.
draft
in progress
Reverse-engineering three proprietary retail APIs in production
SweedPOS, Algolia, and Trulieve GraphQL behind Cloudflare and session auth. Playwright + cookie handoff + the things you only learn by getting blocked. How a 14-state competitive intel platform actually runs on autopilot.