tearsheet · regime-stratified backtest harness

I run a Hyperliquid copy-trading bot on $1,554 of my own money.

An eleven-module pipeline that ingests trader fills, reconstructs round trips with FIFO + VWAP, scores each wallet across four regimes (uptrend, chop, downtrend, plus an orthogonal high-vol flag), applies slippage-adjusted statistics, and gates promotion via a tiered Boolean. The point is not the bot. The point is the harness that decides which wallets the bot is allowed to copy.

live: Hyperliquid mainnet · AUM: $1,554 · tests: 436 passing

harness pipeline

                  ┌─────────────────┐
                  │   N candidates  │
                  │   (stockpile)   │
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │   fill-cache    │  ← 7d Supabase cache
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │   position-     │  ← FIFO + VWAP entry/exit
                  │   reconstructor │     skip sz=0 funding fills
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │     regime-     │  ← BTC r7±3% / r30±5%
                  │     classifier  │     uptrend / chop / down
                  │                 │     + orthogonal high_vol
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │  slippage-model │  ← 10bps majors / 25bps alts
                   └────────┬────────┘     round-trip, fees included
                           │
                  ┌────────▼────────┐
                  │    ev-engine    │  ← per-regime sharpe,
                  │                 │     t-stat (NET R), HHI,
                  │                 │     max-dd
                  └────────┬────────┘
                           │
                  ┌────────▼────────┐
                  │  promote-gate   │  ← tiered A/B/C
                  │  (tier A/B/C)   │     size mult 1.0/0.5/0.25
                  └────────┬────────┘
                           │
              ┌────────────┼─────────────┐
              │            │             │
     ┌────────▼──────┐ ┌──▼──────────┐ ┌▼─────────────┐
     │  null-dist    │ │  decay-     │ │ live tracker │
     │  (calibration)│ │  watch      │ │ (auto-flip   │
     │  → 0/50 ✓     │ │  (6h cron)  │ │  is_tracked) │
     └───────────────┘ └─────────────┘ └──────────────┘

Eleven modules. Each independently testable. Promote decisions auto-flip the live bot's tracked-wallet flag; demotions auto-untrack.
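
The position-reconstructor stage can be sketched roughly as follows. This is a minimal illustration under assumptions, not the harness's code. The `Fill` shape is a simplified, hypothetical one and not Hyperliquid's actual fill schema. Closes are matched against the oldest open lot first (FIFO). Entry and exit prices are size-weighted (VWAP). Zero-size fills, such as the sz=0 funding fills noted in the diagram, are skipped. A fill that crosses zero closes the current trip and opens a new one in the other direction.

```typescript
// Hypothetical, simplified fill shape (not Hyperliquid's actual fill schema).
interface Fill { coin: string; side: "buy" | "sell"; px: number; sz: number; time: number; }
type Direction = "long" | "short";

interface Trip {
  coin: string;
  direction: Direction;
  entryVwap: number; // size-weighted average entry price
  exitVwap: number;  // size-weighted average exit price
  size: number;      // total size opened (equals total size closed)
  pnl: number;       // gross realized PnL from FIFO lot matching
  openTime: number;
  closeTime: number;
}

interface OpenTrip {
  direction: Direction;
  lots: { px: number; sz: number }[]; // FIFO queue of open lots, oldest first
  entryNotional: number; entrySize: number;
  exitNotional: number; exitSize: number;
  pnl: number; openTime: number;
}

function reconstructTrips(fills: Fill[]): Trip[] {
  const trips: Trip[] = [];
  const open = new Map<string, OpenTrip>();
  for (const f of [...fills].sort((a, b) => a.time - b.time)) {
    if (f.sz === 0) continue; // skip zero-size fills (e.g. funding)
    const dir: Direction = f.side === "buy" ? "long" : "short";
    let remaining = f.sz;
    while (remaining > 0) {
      const t = open.get(f.coin);
      if (!t || t.direction === dir) {
        // Opening or adding: append a new lot at the back of the queue.
        const cur: OpenTrip = t ?? { direction: dir, lots: [], entryNotional: 0, entrySize: 0,
                                     exitNotional: 0, exitSize: 0, pnl: 0, openTime: f.time };
        cur.lots.push({ px: f.px, sz: remaining });
        cur.entryNotional += f.px * remaining;
        cur.entrySize += remaining;
        open.set(f.coin, cur);
        remaining = 0;
      } else {
        // Reducing: close against the oldest open lot first (FIFO).
        const lot = t.lots[0];
        const q = Math.min(lot.sz, remaining);
        t.pnl += (t.direction === "long" ? 1 : -1) * (f.px - lot.px) * q;
        t.exitNotional += f.px * q;
        t.exitSize += q;
        lot.sz -= q;
        remaining -= q;
        if (lot.sz === 0) t.lots.shift();
        if (t.lots.length === 0) {
          // Flat again: emit the round trip. Any leftover size opens a new
          // trip in the opposite direction on the next loop iteration.
          trips.push({ coin: f.coin, direction: t.direction,
            entryVwap: t.entryNotional / t.entrySize, exitVwap: t.exitNotional / t.exitSize,
            size: t.entrySize, pnl: t.pnl, openTime: t.openTime, closeTime: f.time });
          open.delete(f.coin);
        }
      }
    }
  }
  return trips; // positions still open at the end are not emitted
}
```

Sizes here are plain floats. A production version would likely want integer lot units or a dust tolerance, so that rounding residue does not spawn tiny reverse trips. Fees are left out because, in this pipeline, the slippage model absorbs them downstream.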

promote-gate.ts (the artifact)

PROMOTE_TIER(wallet)

  // catastrophe caps — apply to ALL tiers
  overall.concentration_hhi      <=  0.40
  overall.max_dd_r               >=  -8.0
  high_vol.n_trades              <   10  ||  high_vol.ev_mean_r_net  >=  0

  // regime passes IFF n >=30  ∧  t-stat >1.96  ∧  net ev >0
  regimes_passed = count of {uptrend, chop, downtrend} where:
      n_trades       >=  30
      ev_t_stat      >   1.96      (95% CI lower bound > 0, on net R)
      ev_mean_r_net  >   0          (after 10bps maj / 25bps alt slippage)

  // tier band — relax sharpe with size penalty
  regimes_passed == 3  &&  overall.sharpe > 1.0   →  TIER A,  size 1.00
  regimes_passed == 2  &&  overall.sharpe > 0.7   →  TIER B,  size 0.50
  regimes_passed == 1  &&  overall.sharpe > 0.5   →  TIER C,  size 0.25
  otherwise                                       →  REJECT,  size 0

Three catastrophe caps plus a tiered band. The caps short-circuit before tier classification. A wallet can have spectacular Sharpe and still get rejected if HHI is over 0.40 (heavy single-coin concentration), if max drawdown breaches -8R, or if it has 10 or more high-vol trades with negative net EV.
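
The HHI and drawdown caps can be computed from a wallet's trip list roughly like this. This is a sketch under two stated assumptions. First, HHI is taken over each coin's share of total traded notional; the source doesn't say whether the weights are by notional, trade count, or risk. Second, max drawdown is the deepest peak-to-trough fall of the cumulative net-R curve, starting from zero. `TripStat` is a hypothetical record type.

```typescript
// Hypothetical per-trip record: coin, traded notional in USD, and net R.
interface TripStat { coin: string; notional: number; netR: number; }

// Herfindahl-Hirschman index: sum of squared per-coin notional shares.
// 1.0 means everything in one coin; N equally weighted coins give 1/N.
function concentrationHhi(trips: TripStat[]): number {
  const byCoin = new Map<string, number>();
  let total = 0;
  for (const t of trips) {
    byCoin.set(t.coin, (byCoin.get(t.coin) ?? 0) + t.notional);
    total += t.notional;
  }
  if (total === 0) return 0;
  let hhi = 0;
  byCoin.forEach(v => { hhi += (v / total) ** 2; });
  return hhi;
}

// Max drawdown of the cumulative net-R curve, returned as a non-positive
// number so it can be compared directly against a floor such as -8.0.
function maxDrawdownR(trips: TripStat[]): number {
  let equity = 0, peak = 0, maxDd = 0;
  for (const t of trips) {
    equity += t.netR;
    peak = Math.max(peak, equity);
    maxDd = Math.min(maxDd, equity - peak);
  }
  return maxDd;
}
```

Under notional weighting the 0.40 cap is fairly strict. Two coins split 50/50 give an HHI of 0.50 and fail. Three equal coins give about 0.33 and pass. Because HHI never exceeds the largest single share, any wallet over 0.40 has more than 40% of its notional in one coin.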

Sharpe and t-stat are computed on net R, slippage-adjusted before the gate sees them. Earlier versions computed gross stats, which would have spuriously promoted wallets whose edge disappears under a realistic execution model. (Caught in /review the day the gate shipped.)
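
The PROMOTE_TIER spec above translates fairly directly into TypeScript. What follows is a sketch, not the actual `promote-gate.ts`. The `RegimeStats` and `WalletStats` shapes, and the `promoteTier` name, are assumptions that mirror the pseudocode's field names. All stats are assumed to arrive already computed on net R.

```typescript
interface RegimeStats { n_trades: number; ev_t_stat: number; ev_mean_r_net: number; }
interface OverallStats { concentration_hhi: number; max_dd_r: number; sharpe: number; }
interface WalletStats {
  overall: OverallStats;
  uptrend: RegimeStats; chop: RegimeStats; downtrend: RegimeStats; high_vol: RegimeStats;
}
type Tier = "A" | "B" | "C" | "REJECT";
interface Verdict { tier: Tier; sizeMult: number; regimesPassed: number; }

// A directional regime passes on sample size, significance, and positive net EV.
const regimePasses = (r: RegimeStats): boolean =>
  r.n_trades >= 30 && r.ev_t_stat > 1.96 && r.ev_mean_r_net > 0;

function promoteTier(w: WalletStats): Verdict {
  const { overall, high_vol } = w;
  const passed = [w.uptrend, w.chop, w.downtrend].filter(regimePasses).length;

  // Catastrophe caps apply to every tier and reject before any banding.
  const capsOk =
    overall.concentration_hhi <= 0.40 &&
    overall.max_dd_r >= -8.0 &&
    (high_vol.n_trades < 10 || high_vol.ev_mean_r_net >= 0);
  if (!capsOk) return { tier: "REJECT", sizeMult: 0, regimesPassed: passed };

  // Tier bands: fewer regimes passed means a lower Sharpe bar but a smaller size.
  if (passed === 3 && overall.sharpe > 1.0) return { tier: "A", sizeMult: 1.0, regimesPassed: passed };
  if (passed === 2 && overall.sharpe > 0.7) return { tier: "B", sizeMult: 0.5, regimesPassed: passed };
  if (passed === 1 && overall.sharpe > 0.5) return { tier: "C", sizeMult: 0.25, regimesPassed: passed };
  return { tier: "REJECT", sizeMult: 0, regimesPassed: passed };
}
```

The bands use `==` exactly as the spec does. As written, a wallet that passes all three regimes with Sharpe 0.8 is rejected rather than dropped to tier B. Switching to `>=` on `regimes_passed` would make the bands nested instead.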

Run on a 50-wallet random null cohort drawn from active HL traders, 180-day lookback:

>>> 0 / 50 random wallets passed.

Calibration check: if more than 2% of random wallets had passed, the gate would be too loose. The 1000-wallet stockpile run is listed under receipts.
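
Binomial arithmetic shows how tightly a 0/50 result pins down the gate's false-promotion rate. The sketch below computes three things. The first is the exact one-sided 95% upper bound for zero successes in n trials; this is the Clopper-Pearson bound, which reduces to 1 - alpha^(1/n) when nothing passes. The second is the chance of seeing 0/50 if the true rate sat right at the 2% threshold. The third is the null-cohort size at which zero passes would bound the rate below 2%. The function names are illustrative.

```typescript
// Exact one-sided (1 - alpha) upper bound on a binomial rate when 0 of n trials succeed.
function zeroSuccessUpperBound(n: number, alpha = 0.05): number {
  return 1 - Math.pow(alpha, 1 / n);
}

// Probability of observing 0 successes in n trials when the true rate is p.
function probZero(n: number, p: number): number {
  return Math.pow(1 - p, n);
}

// Smallest n for which 0 successes puts the (1 - alpha) upper bound below p.
function minCohortForBound(p: number, alpha = 0.05): number {
  return Math.ceil(Math.log(alpha) / Math.log(1 - p));
}

console.log(zeroSuccessUpperBound(50).toFixed(3)); // 0.058
console.log(probZero(50, 0.02).toFixed(3));        // 0.364
console.log(minCohortForBound(0.02));              // 149
```

So 0/50 is consistent with the ≤ 2% target, but on its own it only bounds the false-promotion rate below about 5.8% at 95% confidence. The quick "rule of three" approximation, 3/n, gives 6%. Roughly 150 null wallets with zero passes would be needed to bound the rate under 2%.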

receipts

mainnet AUM             $1,554        live since 2026-04
leverage cap            2x            hard cap 3x, locked in source
risk gates              9             kill_switch + per_trade_cap +
                                      preflight + max_concurrent +
                                      consecutive_loss + daily_loss +
                                      weekly_loss + max_drawdown +
                                      book_depth
unit tests              436           passing
random-null promotes    0 / 50        threshold ≤ 2%, observed 0%
stockpile cohort        1000          re-scoring with net-stat gate
live trades closed      5             will update at N≥30

engineering notes

The harness is one design choice in the trading system, not the trading system itself. The bot does not generate price signals. It selects traders to copy, using the harness as the selection filter. The thesis: "AI is good at selecting traders, not predicting markets." Every architectural decision flows from that.

Slippage is not a fudge factor. It's modeled separately for majors (10 bps round-trip) and alts (25 bps round-trip), absorbs Hyperliquid's round-trip fee, and is plumbed into both the mean R and the variance, which is what makes the t-statistic actually defensible.
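
Here is one way slippage can reach the variance as well as the mean. This is a minimal sketch under assumptions not stated in the source:

- R is a trade's PnL divided by its dollar risk (hypothetical `riskUsd` field).
- The bps figures are round-trip costs charged once per trip on notional.
- The contents of the majors set are a guess.
- The t-statistic is the one-sample mean / (sd / √n) on net R.
- Sharpe is per-trade mean / sd with no annualization.

```typescript
// Hypothetical trip record: gross PnL and the dollar risk used to express it in R.
interface TripPnl { coin: string; notional: number; pnlUsd: number; riskUsd: number; }

// Assumed membership of the "majors" bucket; the source does not list it.
const MAJORS = new Set(["BTC", "ETH"]);

// Round-trip cost in bps, fees included: 10 for majors, 25 for alts.
const slippageBps = (coin: string): number => (MAJORS.has(coin) ? 10 : 25);

// Net R: subtract the round-trip cost in dollars, then normalize by risk.
function netR(t: TripPnl): number {
  const costUsd = (t.notional * slippageBps(t.coin)) / 10000;
  return (t.pnlUsd - costUsd) / t.riskUsd;
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator).
function stdev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
}

// One-sample t-statistic of the mean against zero.
function tStat(xs: number[]): number {
  return mean(xs) / (stdev(xs) / Math.sqrt(xs.length));
}

// Per-trade Sharpe (not annualized).
function sharpe(xs: number[]): number {
  return mean(xs) / stdev(xs);
}
```

Under these assumptions, a cost that was the same in R on every trade would only shift the mean. It's the trade-to-trade variation in notional relative to risk that carries the adjustment into the standard deviation, and so into the t-statistic.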

Decay-watch runs every 6 hours and demotes any promoted wallet whose 30-day EV drops below half of its 90-day EV. Demotion auto-untracks the wallet from the live bot in the same transaction. There is no manual intervention path between "wallet passes gate" and "wallet's trades hit my account" — by design.
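
The decay rule comes down to one comparison per promoted wallet. The sketch below assumes EV means the mean net R of trades closed in each trailing window; `ClosedTrade` and `shouldDemote` are hypothetical names. The source doesn't say how an empty 30-day window or a non-positive 90-day EV is handled. The choices below (keep and demote, respectively) are assumptions.

```typescript
// Hypothetical closed-trade record; closeTime in milliseconds since epoch.
interface ClosedTrade { closeTime: number; netR: number; }

const DAY_MS = 24 * 60 * 60 * 1000;

// Mean net R of trades closed within the trailing window, or null if there are none.
function windowEv(trades: ClosedTrade[], now: number, days: number): number | null {
  const xs = trades
    .filter(t => t.closeTime > now - days * DAY_MS && t.closeTime <= now)
    .map(t => t.netR);
  return xs.length === 0 ? null : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Rule from the text: demote when 30-day EV < 0.5 * 90-day EV.
// Edge cases are assumptions: no trades in the window -> keep; 90-day EV <= 0 -> demote.
function shouldDemote(trades: ClosedTrade[], now: number): boolean {
  const ev90 = windowEv(trades, now, 90);
  const ev30 = windowEv(trades, now, 30);
  if (ev90 === null || ev30 === null) return false;
  if (ev90 <= 0) return true;
  return ev30 < 0.5 * ev90;
}
```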

now

Building the live-paper validator + walk-forward train/test split.

Reading López de Prado on purged k-fold cross-validation.

Open to research / contract work at crypto-native firms — DM @claygdev.