Building a Backtesting Engine That Doesn't Lie to You
Every quantitative trader has had this experience: the backtest shows 200% annual returns. Live trading shows -15%.
The problem is almost never the strategy. It's the backtest. Most backtesting engines lie through optimistic assumptions.
The 5 Lies Most Backtests Tell
Lie 1: Perfect Fills
Most engines assume your order fills at the exact price you see. In reality:
- Market orders fill at the ask (buying) or bid (selling), not the mid-price
- Large orders move the market (slippage)
- During volatility, fills can be 5-10 ticks worse than expected
My engine models this:
```python
def simulate_fill(order, market_data):
    spread = market_data.ask - market_data.bid
    slippage = spread * 0.5  # Conservative: half the spread

    if order.side == 'BUY':
        fill_price = market_data.ask + slippage
    else:
        fill_price = market_data.bid - slippage

    return fill_price
```
Lie 2: Unlimited Liquidity
Your backtest buys 10,000 shares instantly. In reality, that order takes minutes to fill and the price moves against you.
I cap position sizes relative to average volume:
```python
max_position = daily_avg_volume * 0.01  # Never more than 1% of daily volume
```
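Here's a minimal sketch of how that cap might be applied when sizing an order (the function name and the 1% figure simply mirror the rule above; real sizing also has to respect capital and risk limits):

```python
def cap_order_size(desired_qty, daily_avg_volume, cap_pct=0.01):
    """Clamp an order so it never exceeds a fixed fraction of average daily volume."""
    max_qty = int(daily_avg_volume * cap_pct)
    return min(desired_qty, max_qty)

# A 10,000-share order against a stock that trades 500,000 shares/day
# gets clipped to 5,000 shares.
qty = cap_order_size(10_000, daily_avg_volume=500_000)
```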
Lie 3: No Transaction Costs
Commissions, exchange fees, SEC fees, and financing costs add up fast. On ES futures, round-trip costs are roughly $4.50 per contract; at 100 trades a day, that's $450 of friction.
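As a quick sanity check on that arithmetic, here's a trivial friction calculator (the $4.50 figure is the approximate ES round-trip cost mentioned above; swap in your own broker's numbers):

```python
ROUND_TRIP_COST = 4.50  # approx. all-in round-trip cost per ES contract ($)

def daily_friction(trades_per_day, contracts_per_trade=1):
    """Total daily cost from commissions, exchange fees, and regulatory fees."""
    return trades_per_day * contracts_per_trade * ROUND_TRIP_COST

print(daily_friction(100))  # 450.0 -- the $450/day figure above
```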
Lie 4: Look-Ahead Bias
The most dangerous lie. If your indicators use tomorrow's data to make today's decision, your backtest will look incredible and your live trading will be random.
I enforce strict temporal ordering: every signal at time T uses only data from T-1 and earlier.
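A minimal illustration of that rule with pandas (the moving-average signal is just a stand-in; the point is the `shift(1)`):

```python
import pandas as pd

def momentum_signal(close: pd.Series, window: int = 20) -> pd.Series:
    """Signal for bar T computed only from closes up to and including bar T-1."""
    ma = close.rolling(window).mean()
    # Shift both inputs by one bar so nothing from bar T leaks into its own signal
    return close.shift(1) > ma.shift(1)
```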
Lie 5: Survivorship Bias
If you're testing stock strategies, you're probably testing on stocks that survived to today. The ones that went bankrupt aren't in your dataset. This inflates returns.
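One way to guard against this, assuming your data vendor provides listing and delisting dates, is to build the tradable universe point-in-time (this helper is illustrative, not part of the engine below):

```python
def point_in_time_universe(securities, as_of_date):
    """Keep only symbols that were actually listed on as_of_date.

    Each security is assumed to carry `listed` and `delisted` dates,
    with `delisted` set to None for survivors.
    """
    return [
        s for s in securities
        if s.listed <= as_of_date and (s.delisted is None or s.delisted > as_of_date)
    ]
```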
The Engine Architecture
```python
class BacktestEngine:
    def __init__(self, strategy, data, config):
        self.strategy = strategy
        self.data = data
        self.broker = SimulatedBroker(config)
        self.risk_manager = RiskManager(config)  # position-sizing component used in run()
        self.portfolio = Portfolio(config.initial_capital)

    def run(self):
        for timestamp, bar in self.data.iterrows():
            # 1. Update portfolio with fills from previous bar
            self.broker.process_fills(bar)

            # 2. Strategy generates signals using PREVIOUS bar data
            signal = self.strategy.on_bar(
                bar=self.data.loc[:timestamp].iloc[:-1],  # Exclude current bar
                portfolio=self.portfolio
            )

            # 3. Convert signals to orders with position sizing
            if signal:
                order = self.risk_manager.size_order(
                    signal, self.portfolio, bar
                )
                self.broker.submit(order)

            # 4. Record state for analysis
            self.portfolio.record_snapshot(timestamp)
```
Key design decisions:
- Event-driven, not vectorized: Each bar is processed sequentially. Slower, but guarantees temporal correctness.
- Strategy only sees past data: The `iloc[:-1]` slice ensures no look-ahead.
- Broker simulates realistic fills: Slippage, commissions, partial fills.
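Wiring it together looks roughly like this (MyStrategy, load_bars, and BacktestConfig are placeholders for your own strategy, data loader, and config object):

```python
config = BacktestConfig(initial_capital=100_000)              # hypothetical config object
data = load_bars("ES", start="2019-01-01", end="2024-12-31")  # your own data loader

engine = BacktestEngine(strategy=MyStrategy(), data=data, config=config)
engine.run()

snapshots = engine.portfolio.snapshots  # whatever record_snapshot() accumulates
```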
Metrics That Matter
I report these metrics using quantstats and empyrical:
| Metric | What It Tells You | Red Flag Threshold |
|---|---|---|
| Sharpe Ratio | Risk-adjusted return | Below 1.0 |
| Max Drawdown | Worst peak-to-trough | Above 25% |
| Win Rate | % of winning trades | Below 40% |
| Profit Factor | Gross profit / Gross loss | Below 1.5 |
| Expectancy | Average $ per trade | Below 0 (obviously) |
| Recovery Factor | Net profit / Max drawdown | Below 3.0 |
If your Sharpe is above 3.0 in a backtest, you're probably overfitting. Real-world Sharpes for systematic strategies are typically 0.8-2.0.
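A rough sketch of pulling these from a daily returns series with quantstats (per-trade dollar P&L is needed separately for expectancy; `returns` and `trade_pnl` stand in for whatever your engine records):

```python
import quantstats as qs

def report(returns, trade_pnl):
    """returns: daily strategy returns (pd.Series); trade_pnl: realized $ per closed trade."""
    print("Sharpe:          ", qs.stats.sharpe(returns))
    print("Max drawdown:    ", qs.stats.max_drawdown(returns))
    print("Win rate:        ", qs.stats.win_rate(returns))
    print("Profit factor:   ", qs.stats.profit_factor(returns))
    print("Recovery factor: ", qs.stats.recovery_factor(returns))
    print("Expectancy ($):  ", trade_pnl.mean())  # average $ per trade
```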
Walk-Forward Optimization
I never optimize parameters on the full dataset. Instead:
- Train on 2019-2021
- Validate on 2022
- Test on 2023
- Re-train on 2020-2022
- Validate on 2023
- Test on 2024
This walk-forward approach ensures the strategy generalizes to unseen data. If it only works on one specific period, it's curve-fit.
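A minimal sketch of generating those rolling windows (three years train, one year validate, one year test, rolled forward a year at a time to match the folds above):

```python
def walk_forward_splits(years, train_len=3, val_len=1, test_len=1):
    """Yield (train, validate, test) year tuples, rolling forward one year at a time."""
    window = train_len + val_len + test_len
    for i in range(len(years) - window + 1):
        yield (years[i:i + train_len],
               years[i + train_len:i + train_len + val_len],
               years[i + train_len + val_len:i + window])

for train, val, test in walk_forward_splits(list(range(2019, 2025))):
    print(f"train {train}  validate {val}  test {test}")
# train [2019, 2020, 2021]  validate [2022]  test [2023]
# train [2020, 2021, 2022]  validate [2023]  test [2024]
```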
The Bottom Line
A good backtesting engine is one that makes your strategies look worse than they are. If your backtest results are conservative and your live trading beats them, you've built a trustworthy system.