AI Engineer

I build AI systems that can't grade themselves into success.

LLM systems, agent evals, production data infrastructure. solhunt hit 67.7% on a curated set against Anthropic's 51.1% on SCONE-bench, and I published the honest 13% random-sample collapse next to it.

solhunt: 67.7% v anthropic 51.1% · live: hl mainnet · prod: 14 states daily

Projects

$ forge test --match test/Beanstalk
[PASS] testExploit() (gas: 8,420,180)
  drained:  184M USDC
  cost:     $0.65  time: 1m 44s
  gates:    4/4 ✓  (server-side)

solhunt-duel

Adversarial red/blue agent system for smart-contract auditing. Red writes exploits, Blue writes patches, four server-side Forge-verified gates decide the verdict — the agents cannot see or modify them. Multi-provider LLM router across Anthropic, OpenAI, OpenRouter, and Ollama with cost controls and structured fallbacks.

TypeScript Foundry Multi-LLM Agent eval
solhunt benchmark
  curated  32  67.7%  v anthropic SCONE-bench 51.1%
  random   95  13.0%  honest distribution-shift
  delta        54.7 pts  → origin of solhunt-duel
  beanstalk    EXPLOITED  $0.73 · 1 contract

solhunt

Autonomous AI agent: reads Solidity, writes a Foundry exploit test, executes on a forked mainnet, iterates against real compiler and execution feedback. Reproduced Beanstalk's $182M flash-loan hack in 1m 44s for $0.65. Scored 67.7% on a curated set against Anthropic's 51.1% on SCONE-bench, then collapsed to 13% on a random sample. I published both numbers and treated the gap as a design problem.

TypeScript Foundry Forked mainnet Agent loop

Hyperliquid Copy-Trading Harness

Live mainnet, $1,554 personal capital, 11 server-side risk gates. Regime-stratified backtest with Bonferroni-adjusted promote gate. I ran 50 random wallets through it as a null distribution: 0 promoted at α=0.05. The harness, not the trades, is the artifact.

TypeScript Supabase WebSocket Hyperliquid SDK
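
A minimal sketch of what a Bonferroni-adjusted promote gate can look like, not the harness's actual code. It assumes one p-value per wallet and takes the number of simultaneous tests `m` to be the number of wallets; the real harness may count regimes or strategies instead. The `Candidate` shape and the function names are hypothetical.

```typescript
// Hypothetical shape: one backtest p-value per candidate wallet.
interface Candidate {
  wallet: string;
  pValue: number;
}

// Bonferroni correction: with m simultaneous tests, each test must clear
// alpha / m so the family-wise false-positive rate stays at or below alpha.
function bonferroniThreshold(alpha: number, m: number): number {
  if (m <= 0) throw new Error("m must be positive");
  return alpha / m;
}

// Returns only the candidates whose p-value clears the adjusted threshold.
// Assumption: m defaults to the number of candidates tested together.
function promote(
  candidates: Candidate[],
  alpha: number = 0.05,
  m: number = candidates.length
): Candidate[] {
  if (candidates.length === 0) return [];
  const threshold = bonferroniThreshold(alpha, m);
  return candidates.filter((c) => c.pValue < threshold);
}
```

With 50 wallets and α=0.05, each wallet has to clear a p-value of 0.001. That is why a batch of random wallets should promote nothing if the gate is doing its job.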

Competitive Intelligence Platform

Production data engineering at scale. 14 states, 31 dispensary chains, 65+ stores, 50K+ products on automated 6-hour cycles. Reverse-engineered 3 proprietary retail APIs (SweedPOS, Algolia, Trulieve GraphQL) under Cloudflare/auth. BullMQ + Redis worker fleet, Postgres normalization, OCR pipeline, Telegram alerts. Used daily by my 4-person pricing team.

TypeScript Next.js 14 BullMQ Playwright Supabase
[INFO] helius ws connected · raydium + orca
[DETECT] front 0x4a.. victim 0x9c.. back 0x2b..
  slots:    287_412_891 → 891 → 893  (Δ=2)
  profit:   0.47 SOL  jito tip: 0.02 SOL
  confidence: 90

sandwich-rs

Real-time Solana MEV sandwich detector. Rust + Helius enhanced WebSocket + bounded-backpressure parser pool + per-pool ring buffer with ≤3-slot detection window. IDL-correct pool extraction for Raydium AMM v4 + Orca Whirlpool. Jito tip detection across 8 known tip accounts. Idempotent Postgres persistence, SSE feed.

Rust Helius WS Solana MEV
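
The windowed front/victim/back match can be sketched as below. This is a TypeScript illustration of the idea only, not the Rust implementation. It assumes the front-run and back-run share a signer, that the victim trades in the same direction as the front-run, and that "≤3-slot" means the back-run lands at most 3 slots after the front-run. The `Swap` fields are hypothetical.

```typescript
// Hypothetical swap record; field names are illustrative, not the project's schema.
interface Swap {
  slot: number;
  signer: string;
  pool: string;
  direction: "buy" | "sell";
}

// Assumed reading of the "≤3-slot detection window".
const MAX_SLOT_SPAN = 3;

// Scans one pool's recent swaps for: attacker trade, victim trade in the same
// direction, then the attacker's opposite trade, all within the slot window.
function findSandwich(swaps: Swap[]): [Swap, Swap, Swap] | null {
  // Array.prototype.sort is stable, so in-slot arrival order is preserved.
  const ordered = [...swaps].sort((a, b) => a.slot - b.slot);
  for (let i = 0; i < ordered.length; i++) {
    const front = ordered[i];
    for (let k = i + 2; k < ordered.length; k++) {
      const back = ordered[k];
      if (back.slot - front.slot > MAX_SLOT_SPAN) break; // outside the window
      const closesPosition =
        back.signer === front.signer &&
        back.pool === front.pool &&
        back.direction !== front.direction;
      if (!closesPosition) continue;
      // Any other signer trading the same way in between is the victim.
      for (let j = i + 1; j < k; j++) {
        const victim = ordered[j];
        if (
          victim.signer !== front.signer &&
          victim.pool === front.pool &&
          victim.direction === front.direction
        ) {
          return [front, victim, back];
        }
      }
    }
  }
  return null;
}
```

Bounding the search by slot span keeps the scan cheap per pool, which is what makes a fixed-size ring buffer a natural fit for holding the candidates.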

Cannabis Inventory System

Production React app running in warehouse operations since 2024, currently v7.4.1. Barcode scanning, thermal label generation for Zebra ZT610 at 203 DPI, multi-source CSV/Excel imports, Supabase-backed master list with offline localStorage fallback. The first thing I ever shipped — still in daily use.

React 18 Vite Supabase Zebra SDK

About

23. Started in warehouse operations, taught myself to code to fix the manual work my team was drowning in. Three years later I'm shipping adversarial agent evals, production data pipelines, and a live mainnet trading system on my own capital.

I write things down honestly. When solhunt's exploit rate dropped from 67.7% on curated contracts to 13% on a random sample, I published both numbers and treated the gap as a design problem — that's the origin of solhunt-duel's server-side verifier gates.

By day I build competitive intelligence infrastructure for a 4-person pricing team at a Fortune 500 company. By night I build LLM systems that can't lie about their own results. See the harness tearsheet →

Writing

Long-form notes on building LLM systems that hold up under adversarial conditions. → /blog

Contact

Email me about LLM systems, evals, agents, or production AI work. Open to AI engineering roles, remote-friendly.