Quantitative systems
2025
Polybot, an event-driven quantitative trading bot for prediction markets
Context
Personal project
Role
Solo build
Stack
Python 3.11 · asyncio · WebSocket · Ed25519 · Pydantic · Streamlit · Plotly
Problem
Prediction markets like Polymarket US frequently show small mispricings and same-market arbitrage spreads. Capturing them at retail scale requires sub-second reaction time, disciplined risk management, and zero blind spots between paper and live execution.
Constraints
- Must run on a single laptop; no infrastructure budget
- No machine learning; every probability estimate must be auditable
- Backtest results must be guaranteed identical to what the live system would have done
- Live trading risk hard-capped at 5% per position, with a 20% drawdown kill switch
Architecture
Built the entire system around a single async event bus with zero polling loops
Why: Every strategy evaluation is triggered by a WebSocket update, not a timer. Latency from price tick to signal is bounded by the event bus, not by a sleep interval. The bus averages 1.5ms dispatch, p99 4.2ms.
Tradeoff: Harder to reason about than a synchronous main loop. Mitigated with structured logging at every bus hop and 50 tests that exercise the full pipeline end-to-end.
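The pub/sub shape described above can be sketched in a few lines. This is a minimal illustration, not Polybot's actual implementation: the `EventBus` class name matches the write-up, but the method names and payload shapes are assumptions.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]

class EventBus:
    """Minimal pub/sub bus: handlers fire on publish, no polling loops."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    async def publish(self, event_type: str, payload: Any) -> None:
        # Dispatch to all subscribers concurrently; each strategy evaluation
        # is triggered by the incoming event, never by a timer.
        await asyncio.gather(*(h(payload) for h in self._subscribers[event_type]))

async def demo() -> None:
    bus = EventBus()
    seen: list[float] = []

    async def on_tick(tick: dict) -> None:
        seen.append(tick["price"])

    bus.subscribe("price_tick", on_tick)
    await bus.publish("price_tick", {"market": "EXAMPLE-YES", "price": 0.47})
    print(seen)  # [0.47]

asyncio.run(demo())
```

The point of the shape: latency from tick to signal is one `publish` call deep, so instrumenting the bus instruments the whole pipeline.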
Made the backtester replay synthetic events through the same event bus, strategies, and risk manager as live trading
Why: A gap between backtest and live behavior is the most common silent killer of trading systems. By forcing the backtest path through the same code, a strategy that performs in backtest cannot behave differently in paper or live mode without the code change being visible in the diff.
Tradeoff: The backtester is slower than it would be if it bypassed the event bus. Worth it. I sleep at night.
Deterministic heuristics for probability estimation, no ML
Why: Every signal needs to be explainable to myself at 2am during a drawdown. Orderbook midpoint, VWAP, cross-market signals: all transparent math. An ML model that drifts silently is the worst possible failure mode for a system that auto-submits orders.
Tradeoff: The strategy ceiling is lower than what a tuned ML approach could reach. Acceptable for a personal system; I'd revisit at scale.
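"Transparent math" here means estimators you can recompute by hand. A sketch of the two simplest ones named above, the orderbook midpoint and a VWAP, for a binary market priced in [0, 1]; function names and signatures are illustrative, not Polybot's.

```python
def midpoint_probability(best_bid: float, best_ask: float) -> float:
    """Treat the YES-side book midpoint as a probability estimate.

    No fitted parameters, nothing to drift: if the estimate looks wrong
    at 2am, you can recompute it from the quote in your head.
    """
    if not (0.0 <= best_bid <= best_ask <= 1.0):
        raise ValueError("book is crossed or out of range")
    return (best_bid + best_ask) / 2.0

def vwap_probability(trades: list[tuple[float, float]]) -> float:
    """Volume-weighted average price of recent (price, size) trades."""
    total_size = sum(size for _, size in trades)
    if total_size == 0:
        raise ValueError("no volume to average")
    return sum(price * size for price, size in trades) / total_size

print(midpoint_probability(0.46, 0.48))              # ≈ 0.47
print(vwap_probability([(0.45, 100), (0.50, 100)]))  # ≈ 0.475
```

Each estimator is a single auditable expression, which is exactly the property an ML model gives up.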
Every EV calculation is net of the Polymarket fee model AND estimated slippage from order book depth
Why: Polymarket fees follow `C × feeRate × p × (1-p)`: highest at p=0.50, lowest near the boundaries. Top-of-book pricing massively overestimates fills for anything larger than the touch, so the bot walks the book to estimate realistic execution prices before approving a trade.
Tradeoff: More compute per signal evaluation. Negligible next to the alternative: taking systematic losses on overconfident sizing.
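The fee formula and the book-walk combine into one net-EV check. A sketch under stated assumptions: the fee curve is the `C × feeRate × p × (1-p)` form quoted above, but the `fee_rate` value, the function names, and the EV accounting are illustrative, not Polymarket's published parameters or Polybot's code.

```python
def taker_fee(contracts: float, fee_rate: float, p: float) -> float:
    """Fee curve C * feeRate * p * (1-p): peaks at p=0.50, falls at the edges."""
    return contracts * fee_rate * p * (1.0 - p)

def walk_book(asks: list[tuple[float, float]], contracts: float) -> float:
    """Average fill price for `contracts` taken against (price, size) ask levels.

    Top-of-book alone overstates fills for any order bigger than the touch.
    """
    remaining, cost = contracts, 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / contracts
    raise ValueError("not enough depth to fill the order")

def net_ev(p_est: float, asks: list[tuple[float, float]],
           contracts: float, fee_rate: float) -> float:
    """Expected value net of realistic entry price and the fee curve."""
    fill = walk_book(asks, contracts)
    gross = (p_est - fill) * contracts   # edge vs. the walked price, not the touch
    return gross - taker_fee(contracts, fee_rate, fill)

# Illustrative numbers: estimate 0.55, thin book, assumed 2% fee rate.
print(net_ev(0.55, [(0.52, 50), (0.54, 100)], 100, 0.02))  # ≈ 1.50, not (0.55-0.52)*100
```

The usage line shows the mechanism: the naive top-of-book edge of 3 cents shrinks once half the order fills a level deeper and the fee is netted out.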
Outcomes
Event bus latency (p99): 4.2ms
WebSocket message to strategy evaluation to risk check to execution decision
Test coverage
Unit and integration, covering config, cost model, signing, event bus, risk manager, and end-to-end pipeline
Backtest to live parity
Same event bus, same strategies, same risk manager. No separate backtest code path.
Lines of Python
Including tests, dashboard, backtester. Modular by design; strategies and data sources are pluggable.
What Polybot is
Polybot is a fully autonomous trading bot for Polymarket US, the CFTC-regulated prediction market. It watches binary YES/NO markets over WebSocket, detects two classes of opportunity (same-market arbitrage and positive-EV mispricing), gates every potential trade through a multi-stage filter and a risk manager, and either submits Ed25519-signed orders to the live CLOB API or simulates fills in paper mode.
I built it because prediction markets fascinate me: they're one of the few places where small, carefully sized edges, executed reliably, compound. And because the discipline of building one is the discipline of building any latency-sensitive automated system: instrumentation, parity, fault tolerance, restraint.
The architecture
WebSocket ──→ EventBus ──→ Arbitrage Detector ──→ Opportunity ──→ Risk ─────→ Execution
  Feed          │          Mispricing Engine        Filter       Manager       Engine
                │                                                  │
                └──→ Market Cache ◄──── REST Snapshots             └──→ Position Manager
                         │
                   CLI Dashboard
Six components, one event bus, no polling. The event bus is the spine. Every component subscribes to specific event types and publishes new ones. The risk manager is the gatekeeper: every trade is sized via quarter-Kelly with drawdown-adaptive scaling, checked against exposure limits, and either approved with a position size or rejected with a reason that gets logged for later review.
The risk manager is the whole game
The strategies will produce signals. Some of them will be wrong. The job of the risk manager is to make sure the wrong ones don't compound into ruin.
Three layers of protection, in order:
Sizing. Quarter-Kelly (Kelly divided by 4). Kelly is theoretically optimal for log-utility but actually optimal for exactly the model you used to compute it, and my model is wrong in ways I don't fully know. Dividing by 4 is the accepted heuristic for "I trust my edge directionally but not its magnitude."
Drawdown adaptation. As realized drawdown grows, the position size multiplier shrinks. A bot that's down 10% should not be sizing the same as one fresh out of the gate.
Kill switch. Hard limit at 20% drawdown. The bot stops trading and waits for me. No "let me try one more thing" logic. Pause, audit, restart only after I've understood what happened.
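The three layers above compose into one sizing function. A minimal sketch: the quarter-Kelly divisor, 5% cap, and 20% kill switch come from the write-up, but the linear form of the drawdown scaling and all names are my assumptions, not Polybot's actual risk manager.

```python
KILL_SWITCH_DRAWDOWN = 0.20   # hard stop from the constraints: pause and audit
MAX_POSITION_FRACTION = 0.05  # 5% per-position cap

def kelly_fraction(p: float, price: float) -> float:
    """Full Kelly for a binary contract bought at `price`, paying 1 if YES.

    Net odds b = (1 - price) / price; f* = (b*p - (1-p)) / b.
    """
    b = (1.0 - price) / price
    return (b * p - (1.0 - p)) / b

def position_fraction(p: float, price: float, drawdown: float) -> float:
    """Layer 1: quarter-Kelly. Layer 2: shrink with drawdown. Layer 3: kill switch."""
    if drawdown >= KILL_SWITCH_DRAWDOWN:
        return 0.0  # bot stops trading and waits for a human
    f = kelly_fraction(p, price) / 4.0             # quarter-Kelly sizing
    f *= 1.0 - drawdown / KILL_SWITCH_DRAWDOWN     # assumed linear drawdown scaling
    return max(0.0, min(f, MAX_POSITION_FRACTION))

# Estimate 0.55 on a contract priced 0.50: full Kelly is 10% of bankroll,
# quarter-Kelly 2.5%; at 10% drawdown that halves, at 20% it goes to zero.
print(position_fraction(0.55, 0.50, 0.00))
print(position_fraction(0.55, 0.50, 0.10))
print(position_fraction(0.55, 0.50, 0.25))  # 0.0
```

The ordering matters: the kill switch is checked first so no amount of apparent edge can size past it.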
Backtest to live parity
This is the part of the build I'm most proud of and the part that took the most discipline.
It would have been faster to write a backtester as a separate program. Read CSV historical data, loop through it, simulate fills, output PnL. Plenty of bots do this. It is also the reason plenty of bots show 40% backtest returns and 5% live returns.
Polybot's backtester is not a separate program. It's the same main.py, the same event bus, the same strategies, the same risk manager, fed by a SyntheticDataFeed instead of a WebSocketFeed. The contract is: any change that affects strategy behavior in live mode will affect it identically in backtest. There is no path for them to diverge silently.
Concretely: I have four built-in scenarios (random_walk, trending, arb_opportunities, volatile) that the same event-driven pipeline runs against. The random_walk scenario is the regression test: under pure Brownian price motion there is no real edge, so the bot should produce essentially zero trades. The day that scenario starts producing trades is the day I broke the filter.
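The seam that makes this parity possible is the feed interface. A sketch of the contract: `SyntheticDataFeed` and `WebSocketFeed` are the names from the write-up, but the interface shape, the `RecordingBus` stand-in, and the event names are assumptions for illustration.

```python
import asyncio
from abc import ABC, abstractmethod

class RecordingBus:
    """Stand-in for the real event bus: records everything published to it."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    async def publish(self, event_type: str, payload: dict) -> None:
        self.events.append((event_type, payload))

class DataFeed(ABC):
    """The only seam between backtest and live: where events come from."""

    def __init__(self, bus) -> None:
        self.bus = bus

    @abstractmethod
    async def run(self) -> None: ...

class SyntheticDataFeed(DataFeed):
    """Backtest mode: replays a scripted scenario into the real pipeline."""

    def __init__(self, bus, ticks: list[dict]) -> None:
        super().__init__(bus)
        self.ticks = ticks

    async def run(self) -> None:
        for tick in self.ticks:
            await self.bus.publish("price_tick", tick)

# A WebSocketFeed(DataFeed) would publish identical "price_tick" events from
# the live socket; downstream strategies cannot tell which feed produced them.
bus = RecordingBus()
asyncio.run(SyntheticDataFeed(bus, [{"price": 0.47}, {"price": 0.48}]).run())
print(len(bus.events))  # 2
```

Because both feeds publish the same event type into the same bus, "no separate backtest code path" is enforced by the type system rather than by discipline alone.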
What surprised me
Two things.
First, the fee model matters more than it looks. Polymarket's fee curve peaks at p=0.50 and falls off at the edges. A mispricing of 2% looks profitable until you net the fees and the slippage from walking the book, at which point a meaningful fraction of "signals" are actually negative-EV. The bot rejects roughly 60% of raw signals at the cost-net stage. The first version of the filter didn't do this carefully enough.
Second, the dashboard surprises me weekly. I built it for myself; it's a terminal-style live readout of system status, portfolio state, latency stats, recent signals and rejections. Every time I sit down to debug something, the dashboard either tells me the answer immediately or tells me what's not the answer. Time spent on observability has the best ROI of anything I've built into this system.
What I'd do differently
I'd build the backtester before the strategies. I built it third: first the data feeds, then the strategies, then the backtester. If I started over, I'd reverse that: backtester scaffold first, then strategies tested against it, then the live data feed connected last. The discipline of "every strategy must run against the backtest scenarios first" would have caught two filter-logic bugs that I instead found by reading paper-mode logs.
I'd also start with one strategy, not two. Building the arbitrage detector and the mispricing engine in parallel meant I had two half-tested systems for a while before either was solid. One done is better than two in flight.
What I learned
- Event-driven everything was the right call. Every time I considered adding a polling loop for 'simplicity' I caught myself. The moment one timer-based component exists, the entire system's latency floor becomes the slowest sleep interval.
- Conservative defaults matter more than aggressive optimization. Quarter-Kelly with a 20% kill switch lets the system run unattended. Full-Kelly would maximize return in expectation and destroy the bankroll in practice.
- Building the backtester second was wrong. I should have built it first. The discipline of 'every change must pass through both paths' would have caught design mistakes that I instead found later by reading my own logs.
Next case study →
An AI content pipeline for a hemispherical planetarium dome
Built a Python tool that generates AI imagery and reprojects it from equirectangular into fisheye dome format using bivariate arctangent coordinate math. Single command, 8K output, batch mode.