Building a Backtesting Engine That Doesn't Lie to You
Every quantitative trader has had this experience: the backtest shows 200% annual returns. Live trading shows -15%.
The problem is almost never the strategy. It's the backtest. Most backtesting engines lie through optimistic assumptions.
The 5 Lies Most Backtests Tell
Lie 1: Perfect Fills
Most engines assume your order fills at the exact price you see. In reality:
- Market orders fill at the ask (buying) or bid (selling), not the mid-price
- Large orders move the market (slippage)
- During volatility, fills can be 5-10 ticks worse than expected
My engine models this:
```python
def simulate_fill(order, market_data):
    spread = market_data.ask - market_data.bid
    slippage = spread * 0.5  # Conservative: half the spread

    if order.side == 'BUY':
        fill_price = market_data.ask + slippage
    else:
        fill_price = market_data.bid - slippage

    return fill_price
```
Lie 2: Unlimited Liquidity
Your backtest buys 10,000 shares instantly. In reality, that order takes minutes to fill and the price moves against you.
I cap position sizes relative to average volume:
```python
max_position = daily_avg_volume * 0.01  # Never more than 1% of daily volume
```
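Here's a minimal sketch of how that cap might be applied when sizing an order (the function name and the 1% figure simply mirror the rule above; real sizing also has to respect capital and risk limits):

```python
def cap_order_size(desired_qty, daily_avg_volume, cap_pct=0.01):
    """Clamp an order so it never exceeds a fixed fraction of average daily volume."""
    max_qty = int(daily_avg_volume * cap_pct)
    return min(desired_qty, max_qty)

# A 10,000-share order against a stock that trades 500,000 shares/day
# gets clipped to 5,000 shares.
qty = cap_order_size(10_000, daily_avg_volume=500_000)
```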
Lie 3: No Transaction Costs
Commissions, exchange fees, SEC fees, and financing costs add up fast. On ES futures, round-trip costs are roughly $4.50 per contract; at 100 trades a day, that's $450 of friction.
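As a quick sanity check on that arithmetic, here's a trivial friction calculator (the $4.50 figure is the approximate ES round-trip cost mentioned above; swap in your own broker's numbers):

```python
ROUND_TRIP_COST = 4.50  # approx. all-in round-trip cost per ES contract ($)

def daily_friction(trades_per_day, contracts_per_trade=1):
    """Total daily cost from commissions, exchange fees, and regulatory fees."""
    return trades_per_day * contracts_per_trade * ROUND_TRIP_COST

print(daily_friction(100))  # 450.0 -- the $450/day figure above
```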
Lie 4: Look-Ahead Bias
The most dangerous lie. If your indicators use tomorrow's data to make today's decision, your backtest will look incredible and your live trading will be random.
I enforce strict temporal ordering: every signal at time T uses only data from T-1 and earlier.
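A minimal illustration of that rule with pandas (the moving-average signal is just a stand-in; the point is the `shift(1)`):

```python
import pandas as pd

def momentum_signal(close: pd.Series, window: int = 20) -> pd.Series:
    """Signal for bar T computed only from closes up to and including bar T-1."""
    ma = close.rolling(window).mean()
    # Shift both inputs by one bar so nothing from bar T leaks into its own signal
    return close.shift(1) > ma.shift(1)
```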
Lie 5: Survivorship Bias
If you're testing stock strategies, you're probably testing on stocks that survived to today. The ones that went bankrupt aren't in your dataset. This inflates returns.
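One way to guard against this, assuming your data vendor provides listing and delisting dates, is to build the tradable universe point-in-time (this helper is illustrative, not part of the engine below):

```python
def point_in_time_universe(securities, as_of_date):
    """Keep only symbols that were actually listed on as_of_date.

    Each security is assumed to carry `listed` and `delisted` dates,
    with `delisted` set to None for survivors.
    """
    return [
        s for s in securities
        if s.listed <= as_of_date and (s.delisted is None or s.delisted > as_of_date)
    ]
```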
The Engine Architecture
```python
class BacktestEngine:
    def __init__(self, strategy, data, config):
        self.strategy = strategy
        self.data = data
        self.broker = SimulatedBroker(config)
        self.risk_manager = RiskManager(config)  # position-sizing component used in run()
        self.portfolio = Portfolio(config.initial_capital)

    def run(self):
        for timestamp, bar in self.data.iterrows():
            # 1. Update portfolio with fills from previous bar
            self.broker.process_fills(bar)

            # 2. Strategy generates signals using PREVIOUS bar data
            signal = self.strategy.on_bar(
                bar=self.data.loc[:timestamp].iloc[:-1],  # Exclude current bar
                portfolio=self.portfolio
            )

            # 3. Convert signals to orders with position sizing
            if signal:
                order = self.risk_manager.size_order(
                    signal, self.portfolio, bar
                )
                self.broker.submit(order)

            # 4. Record state for analysis
            self.portfolio.record_snapshot(timestamp)
```
Key design decisions:
- Event-driven, not vectorized: Each bar is processed sequentially. Slower, but guarantees temporal correctness.
- Strategy only sees past data: The `iloc[:-1]` slice ensures no look-ahead.
- Broker simulates realistic fills: Slippage, commissions, partial fills.
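Wiring it together looks roughly like this (MyStrategy, load_bars, and BacktestConfig are placeholders for your own strategy, data loader, and config object):

```python
config = BacktestConfig(initial_capital=100_000)              # hypothetical config object
data = load_bars("ES", start="2019-01-01", end="2024-12-31")  # your own data loader

engine = BacktestEngine(strategy=MyStrategy(), data=data, config=config)
engine.run()

snapshots = engine.portfolio.snapshots  # whatever record_snapshot() accumulates
```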
Metrics That Matter
I report these metrics using quantstats and empyrical:
| Metric | What It Tells You | Red Flag Threshold |
|---|---|---|
| Sharpe Ratio | Risk-adjusted return | Below 1.0 |
| Max Drawdown | Worst peak-to-trough | Above 25% |
| Win Rate | % of winning trades | Below 40% |
| Profit Factor | Gross profit / Gross loss | Below 1.5 |
| Expectancy | Average $ per trade | Below 0 (obviously) |
| Recovery Factor | Net profit / Max drawdown | Below 3.0 |
If your Sharpe is above 3.0 in a backtest, you're probably overfitting. Real-world Sharpes for systematic strategies are typically 0.8-2.0.
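A rough sketch of pulling these from a daily returns series with quantstats (per-trade dollar P&L is needed separately for expectancy; `returns` and `trade_pnl` stand in for whatever your engine records):

```python
import quantstats as qs

def report(returns, trade_pnl):
    """returns: daily strategy returns (pd.Series); trade_pnl: realized $ per closed trade."""
    print("Sharpe:          ", qs.stats.sharpe(returns))
    print("Max drawdown:    ", qs.stats.max_drawdown(returns))
    print("Win rate:        ", qs.stats.win_rate(returns))
    print("Profit factor:   ", qs.stats.profit_factor(returns))
    print("Recovery factor: ", qs.stats.recovery_factor(returns))
    print("Expectancy ($):  ", trade_pnl.mean())  # average $ per trade
```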
Walk-Forward Optimization
I never optimize parameters on the full dataset. Instead:
- Train on 2019-2021
- Validate on 2022
- Test on 2023
- Re-train on 2020-2022
- Validate on 2023
- Test on 2024
This walk-forward approach ensures the strategy generalizes to unseen data. If it only works on one specific period, it's curve-fit.
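A minimal sketch of generating those rolling windows (three years train, one year validate, one year test, rolled forward a year at a time to match the folds above):

```python
def walk_forward_splits(years, train_len=3, val_len=1, test_len=1):
    """Yield (train, validate, test) year tuples, rolling forward one year at a time."""
    window = train_len + val_len + test_len
    for i in range(len(years) - window + 1):
        yield (years[i:i + train_len],
               years[i + train_len:i + train_len + val_len],
               years[i + train_len + val_len:i + window])

for train, val, test in walk_forward_splits(list(range(2019, 2025))):
    print(f"train {train}  validate {val}  test {test}")
# train [2019, 2020, 2021]  validate [2022]  test [2023]
# train [2020, 2021, 2022]  validate [2023]  test [2024]
```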
The Bottom Line
A good backtesting engine is one that makes your strategies look worse than they are. If your backtest results are conservative and your live trading beats them, you've built a trustworthy system.