Monte Carlo Simulation Implementation for Trading

Monte Carlo turns a single backtest into a distribution of outcomes. Two methods, bootstrap and parametric, suit different questions; using the wrong one produces confident nonsense.

When to Use Which

Bootstrap (resample trades): when you have a real trade list and want the distribution of equity curves it could have produced. No distributional assumption required.
Parametric (sample from a fitted distribution): to generate synthetic trades beyond the observed sample, e.g., 1-in-1000 events. Requires fitting a distribution to returns.

Use bootstrap by default; parametric only for the far tails when you accept the modeling assumption.

Bootstrap Implementation Pattern

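import numpy as np

def max_drawdown(equity):
    # Largest peak-to-trough drop of a cumulative-R curve that starts at 0
    curve = np.concatenate(([0.0], equity))
    return float(np.max(np.maximum.accumulate(curve) - curve))

def bootstrap_trades(trades, n_sims=10_000, seed=None):
    # trades: array of net R-multiples from the backtest
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    n_trades = len(trades)
    terminal_equities = np.empty(n_sims)
    max_drawdowns = np.empty(n_sims)
    for i in range(n_sims):
        # Resample the trade list with replacement, same length as the original
        sampled = rng.choice(trades, size=n_trades, replace=True)
        equity = np.cumsum(sampled)  # equity in R units (fixed risk per trade)
        terminal_equities[i] = equity[-1]
        max_drawdowns[i] = max_drawdown(equity)
    p5_terminal = np.percentile(terminal_equities, 5)
    p95_drawdown = np.percentile(max_drawdowns, 95)
    return p5_terminal, p95_drawdown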

Report the 5th percentile terminal equity and the 95th percentile drawdown. The mean is misleading; the tails are the decision inputs.
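The bootstrap loop above can also be written in vectorized form, which makes it cheap to report the mean next to the tail statistics and see how far apart they sit. This is a sketch under assumptions: `bootstrap_summary` is a hypothetical helper, equity is cumulative R at fixed risk per trade, and the trade list is made up for illustration.

```python
import numpy as np


def bootstrap_summary(trades, n_sims=10_000, seed=0):
    """Vectorized trade bootstrap: all simulated equity curves in one array.

    trades: 1-D array of net R-multiples (hypothetical input).
    Returns mean and 5th-percentile terminal equity plus the
    95th-percentile max drawdown, all in R units.
    """
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    # Each row is one resampled trade sequence of the original length.
    samples = rng.choice(trades, size=(n_sims, trades.size), replace=True)
    equity = np.cumsum(samples, axis=1)
    # Prepend the starting equity of 0 so an immediate loss counts as drawdown.
    curves = np.hstack([np.zeros((n_sims, 1)), equity])
    drawdowns = (np.maximum.accumulate(curves, axis=1) - curves).max(axis=1)
    return {
        "mean_terminal": equity[:, -1].mean(),
        "p5_terminal": np.percentile(equity[:, -1], 5),
        "p95_drawdown": np.percentile(drawdowns, 95),
    }


# Hypothetical trade list: many 1R losses, a few large winners.
trades = [-1.0] * 60 + [0.5] * 20 + [4.0] * 20
print(bootstrap_summary(trades))
```

Holding every simulation in one `(n_sims, n_trades)` array trades memory for speed; for very long trade lists the loop version is the safer choice.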

Parametric Implementation

Fit a distribution to daily returns. For fat-tailed markets, a Student's t with low degrees of freedom (typically 3-5) usually fits better than a normal. Then sample:

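import numpy as np
from scipy.stats import t
# daily_returns: observed 1-D array of daily simple returns; n_sims, n_days chosen by you
df, loc, scale = t.fit(daily_returns)
simulated = t.rvs(df, loc=loc, scale=scale, size=(n_sims, n_days))
equity_curves = np.cumprod(1 + simulated, axis=1)  # a draw below -1 would make equity negative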

The parametric method lets you simulate longer histories than you observed, useful for rare-event estimation, but the result is only as good as the fit. Always compare the fitted tail to the observed tail; if the fit underestimates observed extremes, the parametric simulation underestimates risk.
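One way to run the tail check is to compare observed left-tail quantiles of the returns with the same quantiles of the fitted t. The sketch below assumes a hypothetical helper `compare_tails` and uses synthetic t-distributed data in place of real returns. Keep in mind that the most extreme observed quantiles rest on very few data points (about 2-3 days beyond the 0.1% quantile in ten years of daily data), so they are noisy.

```python
import numpy as np
from scipy.stats import t


def compare_tails(daily_returns, probs=(0.001, 0.01, 0.05)):
    """Compare observed left-tail quantiles with those of a fitted Student's t.

    If the fitted quantile is less extreme (closer to zero) than the observed
    one, the parametric simulation will understate downside risk there.
    """
    returns = np.asarray(daily_returns, dtype=float)
    df, loc, scale = t.fit(returns)
    rows = []
    for p in probs:
        observed = np.quantile(returns, p)
        fitted = t.ppf(p, df, loc=loc, scale=scale)
        # Flag quantiles where the fit is milder than what actually happened.
        rows.append((p, observed, fitted, fitted > observed))
    return df, rows


# Synthetic stand-in for ~10 years of daily returns (hypothetical data).
rng = np.random.default_rng(1)
sample = t.rvs(4, loc=0.0005, scale=0.01, size=2500, random_state=rng)
df, rows = compare_tails(sample)
for p, obs, fit, understated in rows:
    print(f"p={p}: observed {obs:.4f}, fitted {fit:.4f}, fit understates: {understated}")
```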

Pitfalls That Invalidate Results

Resampling individual trades when trades are autocorrelated. Trend systems produce streaks; resampling trades one at a time destroys the streak structure and underestimates drawdown. Use a block bootstrap with a block length that matches how quickly the autocorrelation decays (typically 5-20 trades).
Ignoring costs. Resampling gross R-multiples ignores slippage and commissions. Resample net R-multiples or subtract costs per trade.
Quoting the mean outcome. A mean terminal equity of +30% with a 5th percentile of -50% signals ruin risk, not a profitable system.
Too few simulations. Below 1,000, the percentile estimates are noisy. Use 10,000 for stable tail estimates.
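For the first pitfall, a moving-block bootstrap keeps runs of consecutive trades together. The sketch below is one standard variant (overlapping blocks with uniformly random starts, trimmed to the original length); `block_bootstrap_paths` and `max_drawdowns` are hypothetical helpers, and the streaky trade record is invented to contrast a block length of 1 (plain resampling) with a block length of 10.

```python
import numpy as np


def block_bootstrap_paths(trades, block_len=10, n_sims=10_000, seed=None):
    """Moving-block bootstrap of a trade sequence (overlapping blocks).

    Consecutive runs of `block_len` trades are kept intact so that winning
    and losing streaks survive resampling. Returns an (n_sims, n_trades)
    array of resampled R-multiple sequences.
    """
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    n = trades.size
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(trades)")
    n_blocks = -(-n // block_len)  # ceiling division
    # Random start index for every block in every simulation.
    starts = rng.integers(0, n - block_len + 1, size=(n_sims, n_blocks))
    # Expand each start into block_len consecutive indices, then trim to n.
    idx = starts[:, :, None] + np.arange(block_len)
    idx = idx.reshape(n_sims, -1)[:, :n]
    return trades[idx]


def max_drawdowns(paths):
    # Max peak-to-trough drop of each cumulative-R path, starting from 0.
    curves = np.hstack([np.zeros((paths.shape[0], 1)), np.cumsum(paths, axis=1)])
    return (np.maximum.accumulate(curves, axis=1) - curves).max(axis=1)


# Hypothetical streaky record: alternating runs of 10 losers and 10 winners.
streaky = np.tile(np.r_[np.full(10, -1.0), np.full(10, 1.5)], 10)
iid_dd = np.percentile(max_drawdowns(block_bootstrap_paths(streaky, 1, seed=0)), 95)
blk_dd = np.percentile(max_drawdowns(block_bootstrap_paths(streaky, 10, seed=0)), 95)
print(f"95th-pct drawdown, single trades: {iid_dd:.1f}R, blocks of 10: {blk_dd:.1f}R")
```

With `block_len=1` this reduces to the ordinary trade bootstrap, so the same helper covers both cases; on a streaky record the block version reports the larger drawdown.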

Applying the Output

Position sizing: set risk per trade so the 95th-percentile drawdown, converted from R to percent of equity at that risk, stays below your tolerable drawdown (e.g., 20% of equity).
System comparison: compare systems by 5th-percentile terminal equity, not means; the system with the better bad-case outcome is more robust.
Kill criterion: if the 95th-percentile drawdown exceeds your survival threshold, the system is untradable at current size.
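The sizing and kill rules reduce to simple arithmetic once the drawdown is in R units. A minimal sketch, assuming the linear approximation that a drawdown of D R at risk fraction f costs about D·f of equity; `max_risk_per_trade` and `is_tradable` are hypothetical helpers, and the 12R figure is illustrative only. Compounding is ignored: for a straight run of 1R losses the actual drawdown 1 - (1 - f)^D is smaller than D·f.

```python
def max_risk_per_trade(p95_drawdown_r, tolerable_drawdown):
    """Largest fraction of equity to risk per trade so that the simulated
    95th-percentile drawdown (in R) stays within the tolerable drawdown.

    Uses the linear approximation drawdown_pct ~= drawdown_R * risk_fraction.
    """
    if p95_drawdown_r <= 0:
        raise ValueError("p95 drawdown must be positive")
    return tolerable_drawdown / p95_drawdown_r


def is_tradable(p95_drawdown_r, risk_per_trade, survival_threshold):
    """Kill criterion: the p95 drawdown at current size must not exceed
    the survival threshold (both as fractions of equity)."""
    return p95_drawdown_r * risk_per_trade <= survival_threshold


# Illustrative: a 12R p95 drawdown with a 20% tolerance allows about 1.67% risk.
print(f"{max_risk_per_trade(12, 0.20):.4f}")
# At 2% risk the same system implies about a 24% drawdown.
print(is_tradable(12, 0.02, 0.30), is_tradable(12, 0.02, 0.20))
```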

Validation

Validate against reality: run the bootstrap on the first half of your trade list, simulating paths as long as the second half, and compare the predicted drawdown distribution to the drawdown actually realized in the second half. If the actual drawdown falls outside the central 90% of simulated outcomes (5th to 95th percentile), the model is wrong; fix it before trusting any output.
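The split-half check can be sketched as follows, assuming an ordinary (single-trade) bootstrap and cumulative-R equity; `split_half_check` is a hypothetical helper, and for streaky systems the resampling line would be swapped for a block bootstrap.

```python
import numpy as np


def split_half_check(trades, n_sims=10_000, seed=None, coverage=0.90):
    """Bootstrap the first half of the trade list, simulate paths as long as
    the second half, and test whether the second half's realized max
    drawdown lands inside the central `coverage` interval of simulations.
    """
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    half = trades.size // 2
    first, second = trades[:half], trades[half:]

    def max_dd(paths):
        # Max peak-to-trough drop per cumulative-R path, starting from 0.
        paths = np.atleast_2d(paths)
        curves = np.hstack([np.zeros((paths.shape[0], 1)), np.cumsum(paths, axis=1)])
        return (np.maximum.accumulate(curves, axis=1) - curves).max(axis=1)

    sims = rng.choice(first, size=(n_sims, second.size), replace=True)
    simulated = max_dd(sims)
    tail = (1 - coverage) / 2 * 100  # 5.0 for a 90% interval
    lo, hi = np.percentile(simulated, [tail, 100 - tail])
    actual = max_dd(second)[0]
    return actual, (lo, hi), lo <= actual <= hi


# Hypothetical trade record: 40% winners at 2R, 60% losers at 1R.
rng = np.random.default_rng(7)
history = rng.choice([-1.0, 2.0], size=200, p=[0.6, 0.4])
print(split_half_check(history, seed=7))
```

A `False` in the last position means the realized drawdown fell outside the simulated band, which is the signal the text describes for rejecting the model.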

Monte Carlo is a sanity check, not a forecast. It reveals how fragile your backtest is to ordering and sampling, and that fragility is the most honest risk number you have.