Backtesting Methodology: Data and Process

A backtest is only as trustworthy as the data and process behind it. A beautiful equity curve can hide garbage inputs.

Why methodology matters

Two traders can backtest the same strategy and get opposite results. The difference is methodology — data quality, simulation assumptions, sample handling, and execution modeling. Without a disciplined process, a backtest is just a story you tell yourself.

Step 1: Define the period

Choose a backtest window that covers:

At least one full market cycle (bull, bear, chop)
Multiple volatility regimes (high VIX and low VIX)
At least 100 trades for statistical reliability (300+ preferred)

For daily systems, 10+ years is reasonable. For intraday systems, 2–5 years of tick or minute data is typical. Shorter periods yield too few trades and too few regimes, so the results are driven by noise.
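
To make the trade-count guideline concrete, here is a minimal sketch of how sampling noise shrinks as trades accumulate. It assumes trades are independent win/lose outcomes, which real trades only approximate; the function name is illustrative.

```python
import math

def win_rate_standard_error(win_rate: float, n_trades: int) -> float:
    """Standard error of an observed win rate, treating each trade
    as an independent Bernoulli outcome (a simplifying assumption)."""
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)

for n in (30, 100, 300):
    se = win_rate_standard_error(0.5, n)
    # 1.96 standard errors is the usual approximate 95% interval
    print(f"{n:>4} trades: observed 50% win rate, +/- {1.96 * se:.1%}")
```

Under these assumptions, a 50% win rate measured over 100 trades is uncertain by roughly 10 percentage points either way; at 300 trades the band narrows to under 6 points.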

Step 2: Acquire clean data

Source	Quality	Cost
Broker-provided history	Often filtered/spread-adjusted	Free
Tick data vendors (TickData, HistData)	High	Paid
Exchange direct feeds	Highest	Expensive
TradingView/Kinetick	Decent for most uses	Subscription

Look for:

Tick-by-tick or 1-minute OHLC bars minimum
Bid and ask for forex (not just mid)
Volume for futures and stocks (essential for many setups)
Prices adjusted for splits and dividends for equities
No gaps in the series — missing bars skew indicators

Free data is the most common source of phantom backtest edges. If you can afford it, pay for clean tick history.
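
A gap check like the one in the list above can be sketched in a few lines. This is an illustrative helper, not a vendor tool: it assumes bar timestamps are sorted and evenly spaced, and it will also flag session closes and weekends unless you filter those with a trading calendar.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected_step, tolerance=timedelta(0)):
    """Return (previous, current) timestamp pairs whose spacing
    exceeds expected_step + tolerance."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > expected_step + tolerance:
            gaps.append((prev, curr))
    return gaps

# Hypothetical 1-minute bar timestamps with two bars missing
bars = [
    datetime(2024, 1, 2, 9, 30),
    datetime(2024, 1, 2, 9, 31),
    datetime(2024, 1, 2, 9, 34),  # 9:32 and 9:33 are missing
    datetime(2024, 1, 2, 9, 35),
]
gaps = find_gaps(bars, expected_step=timedelta(minutes=1))
for prev, curr in gaps:
    print(f"gap between {prev} and {curr}")
```

Running this over the full history before any strategy code touches it gives a quick count of how many bars are missing and where.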

Step 3: Account for costs

A backtest without costs is fiction. Include:

Spread: use realistic average spread, not best-case
Commission: round-turn fee per lot
Slippage: 1–2 ticks for liquid markets, more for thin ones
Swap/financing: for overnight positions
Stop slippage: in fast markets, stop orders often fill beyond the trigger price

A "1R" winner can become 0.7R after realistic costs. Many systems that look profitable in clean backtests turn negative once costs are applied.
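
The arithmetic behind that drop can be sketched as follows. The tick values are illustrative assumptions, and expressing commission in tick-equivalents is a simplification; substitute your own instrument's numbers.

```python
def net_r_multiple(gross_r, risk_ticks, spread_ticks, slippage_ticks,
                   commission_ticks=0.0):
    """Convert a gross R-multiple into a net one by subtracting
    round-trip costs, expressed as a fraction of the initial risk."""
    if risk_ticks <= 0:
        raise ValueError("risk_ticks must be positive")
    cost_r = (spread_ticks + slippage_ticks + commission_ticks) / risk_ticks
    return gross_r - cost_r

# Assumed example: 20-tick stop, 2 ticks spread, 2 ticks total slippage,
# commission worth about 2 ticks. Total cost is 6/20 = 0.3R per trade.
winner = net_r_multiple(1.0, risk_ticks=20, spread_ticks=2,
                        slippage_ticks=2, commission_ticks=2)
loser = net_r_multiple(-1.0, risk_ticks=20, spread_ticks=2,
                       slippage_ticks=2, commission_ticks=2)
print(f"1R winner nets {winner:.2f}R, 1R loser nets {loser:.2f}R")
```

Note that costs hit both sides: the winner shrinks to 0.7R while the loser grows to 1.3R, so the required win rate rises faster than the cost figure alone suggests. Tighter stops make the same costs a larger fraction of R.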

Step 4: Choose the simulation model

Model	How fills happen	Character
Next bar open	Fill at the next bar's open	Conservative, simple
Stop at trigger	Stop fills exactly at the trigger price	Optimistic; assumes no slippage
Stop at trigger + slippage	Stop fills N ticks beyond the trigger, against the position	Realistic
Tick replay	Each tick simulated	Most accurate, slowest

Always pick the most conservative model that your tools support. Optimistic fills are the second-largest source of phantom edge.
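
As a rough illustration of the "stop at trigger + slippage" model, the sketch below fills a sell stop that protects a long position. The function and its parameters are hypothetical, and the gap handling (filling from the open when the bar opens through the stop) is one reasonable convention rather than a standard.

```python
def long_stop_fill(bar_open, bar_low, stop_price, slippage_ticks, tick_size):
    """Fill price for a sell stop protecting a long position,
    or None if the bar never traded down to the stop."""
    if bar_low > stop_price:
        return None
    # If the bar opens below the stop (a gap), the trigger price was never
    # available, so the fill starts from the open instead.
    trigger = min(bar_open, stop_price)
    return trigger - slippage_ticks * tick_size

print(long_stop_fill(100.0, 98.0, 99.0, slippage_ticks=2, tick_size=0.25))
print(long_stop_fill(97.0, 96.0, 99.0, slippage_ticks=2, tick_size=0.25))
print(long_stop_fill(100.0, 99.5, 99.0, slippage_ticks=2, tick_size=0.25))
```

The first call fills two ticks below the 99.0 trigger, the second fills two ticks below the gapped open, and the third returns None because the stop was never touched. For a short position the logic mirrors: use the bar high, take the maximum of open and stop, and add the slippage.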

Step 5: Define entry, stop, and target before testing

If you tweak these while running the backtest, you are curve-fitting, not testing. Write rules down, code them, and run them unchanged.

Step 6: Run the test and capture metrics

Minimum metrics to record:

Total trades
Win rate
Average win and average loss (in R)
Expectancy (per trade)
Profit factor
Maximum drawdown (in % and R)
Sharpe ratio
Recovery factor (net profit / max drawdown)
Longest losing streak
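
Most of these metrics fall out of a plain list of per-trade R-multiples. The sketch below is a minimal version under that assumption; it leaves out Sharpe ratio and percentage drawdown, which need periodic returns and account equity rather than R-multiples alone.

```python
import math

def summarize(r_multiples):
    """Basic backtest metrics from per-trade results in R, in trade order."""
    if not r_multiples:
        raise ValueError("need at least one trade")
    wins = [r for r in r_multiples if r > 0]
    losses = [r for r in r_multiples if r < 0]
    net = sum(r_multiples)
    gross_win, gross_loss = sum(wins), -sum(losses)

    equity = peak = max_dd = 0.0
    streak = longest = 0
    for r in r_multiples:
        equity += r
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)  # deepest drop from a prior peak
        streak = streak + 1 if r < 0 else 0
        longest = max(longest, streak)

    return {
        "total_trades": len(r_multiples),
        "win_rate": len(wins) / len(r_multiples),
        "avg_win_r": gross_win / len(wins) if wins else 0.0,
        "avg_loss_r": -gross_loss / len(losses) if losses else 0.0,
        "expectancy_r": net / len(r_multiples),
        "profit_factor": gross_win / gross_loss if gross_loss else math.inf,
        "max_drawdown_r": max_dd,
        "recovery_factor": net / max_dd if max_dd else math.inf,
        "longest_losing_streak": longest,
    }

stats = summarize([1.0, -1.0, 2.0, -1.0, -1.0, 1.5])
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this toy sequence the expectancy is 0.25R per trade, the profit factor is 1.5, and the maximum drawdown is 2R. Six trades are far too few to mean anything; the point is only to show how the numbers are derived.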

Step 7: Subsample analysis

Split the backtest into:

First half vs second half: does edge persist or decay?
Bull vs bear vs chop periods: regime sensitivity
High-vol vs low-vol periods: robustness check
Per instrument: if multi-instrument, breakdown by symbol

A robust system shows edge in most subsamples. A fragile one carries its gains in one or two outlier periods.
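
One way to run these splits is to label each trade and compare average R per label. The sketch below assumes trades are stored chronologically as dictionaries with hypothetical "symbol", "regime", and "r" fields; how you assign the regime label is up to you.

```python
from collections import defaultdict

def expectancy_by_group(labelled_results):
    """Average R per label, given an iterable of (label, r) pairs."""
    groups = defaultdict(list)
    for label, r in labelled_results:
        groups[label].append(r)
    return {label: sum(rs) / len(rs) for label, rs in groups.items()}

# Hypothetical trade list in chronological order
trades = [
    {"symbol": "EURUSD", "regime": "bull", "r": 1.0},
    {"symbol": "EURUSD", "regime": "bear", "r": -1.0},
    {"symbol": "GBPUSD", "regime": "bull", "r": 2.0},
    {"symbol": "GBPUSD", "regime": "chop", "r": -1.0},
]

midpoint = len(trades) // 2
by_half = expectancy_by_group(
    ("first" if i < midpoint else "second", t["r"])
    for i, t in enumerate(trades)
)
by_symbol = expectancy_by_group((t["symbol"], t["r"]) for t in trades)
by_regime = expectancy_by_group((t["regime"], t["r"]) for t in trades)
print(by_half, by_symbol, by_regime, sep="\n")
```

Check the trade count in each group as well as the average: a subsample with a handful of trades says little either way.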

Step 8: Walk-forward validation

After a single in-sample test, divide the data into segments and run walk-forward analysis (covered in a separate article). This catches curve-fitting that single-pass tests miss.
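
As a rough idea of what "dividing the data into segments" can look like, the sketch below generates rolling train/test index windows. This is a generic rolling scheme with assumed window sizes; the walk-forward article may use a different layout, such as anchored windows.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_start, train_end, test_start, test_end) index tuples.
    End indices are exclusive; each step rolls forward by test_size."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train_end = start + train_size
        yield (start, train_end, train_end, train_end + test_size)
        start += test_size

windows = list(walk_forward_windows(n_bars=10, train_size=4, test_size=2))
for w in windows:
    print(w)
```

Each test window begins exactly where its training window ends, so no test bar is ever seen during fitting for that step.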

Step 9: Out-of-sample test

Reserve 20–30% of your data and never touch it during development. Run the final rules on this untouched segment. If results collapse, the system was overfit.
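
A minimal sketch of such a split, assuming the rows are in chronological order and that the most recent slice serves as the holdout (a common choice, though the text does not prescribe which segment to reserve):

```python
def chronological_split(rows, holdout_fraction=0.25):
    """Split ordered rows into (development, holdout) without shuffling,
    keeping the final holdout_fraction of rows as the untouched segment."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    cut = int(len(rows) * (1.0 - holdout_fraction))
    return rows[:cut], rows[cut:]

bars = list(range(100))  # stand-in for 100 chronologically ordered bars
development, holdout = chronological_split(bars, holdout_fraction=0.25)
print(len(development), len(holdout))
```

Never shuffle before splitting: a random split leaks future bars into the development set and defeats the purpose of the holdout.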

Common methodology errors

Using a bar's close both to generate the signal and to fill "at the close": the close is not known until the bar has ended, so the fill is impossible in real time
Allowing the entry and stop on the same bar without checking which came first
Ignoring the gap between Friday close and Monday open
Forgetting overnight swap costs on forex positions
Counting partial fills as full fills
Re-running the backtest until results look good (p-hacking)
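
The same-bar problem from the list above can be handled conservatively when only OHLC bars are available. The sketch below covers a long entry via a buy stop and assumes the bar opened below the entry price; the function name and return labels are illustrative.

```python
def same_bar_outcome(bar_high, bar_low, entry_price, stop_price):
    """Classify a bar for a long buy-stop entry with a protective stop.
    OHLC data cannot show whether the low came before or after the high,
    so when both levels are inside the bar we assume the worst case."""
    if bar_high < entry_price:
        return "no_entry"
    if bar_low <= stop_price:
        # Order of events is unknown from the bar alone: assume the stop
        # was hit after entry rather than crediting a surviving position.
        return "entered_and_stopped"
    return "entered"

print(same_bar_outcome(bar_high=101.0, bar_low=98.0, entry_price=100.0, stop_price=99.0))
print(same_bar_outcome(bar_high=101.0, bar_low=99.5, entry_price=100.0, stop_price=99.0))
print(same_bar_outcome(bar_high=99.8, bar_low=98.0, entry_price=100.0, stop_price=99.0))
```

If the worst-case assumption changes the results materially, that is a sign the test needs lower-timeframe or tick data to resolve the ordering properly.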

Tooling checklist

Supports the data frequency you need
Models slippage and commission explicitly
Allows OCO and trailing stops
Exports trade list to CSV for journal review
Supports walk-forward analysis
Lets you lock the final rules before out-of-sample testing

Practical advice

Treat the first backtest as a hypothesis, not a verdict. If results look too good, assume a methodology error and audit data, costs, fills, and rules. Real edge is modest; spectacular returns in backtests usually mean the model is fooling you.

Next: identify the biases that turn mediocre systems into "winners" on paper.