Trading System Development Lifecycle: Hypothesis to Deployment

Most failed systems share a cause: skipped stages. A system rushed from idea to live money bypasses the gates that filter out curve-fit garbage. The lifecycle below enforces those gates.

Stage 1: Hypothesis

State the edge in one sentence before touching data. "Trend-following on breakouts works because participants under-react to regime shifts." A hypothesis that cannot be stated plainly is not testable. Define the market, timeframe, entry trigger, exit trigger, and risk per trade on paper.

Stage 2: Backtest

Code the rules mechanically, with no discretion. Run them on at least 8 to 10 years of data for the target market. Record:

Net profit, max drawdown, Sharpe, and profit factor.
Trade count: below 100 trades is statistically weak; demand 200+.
Distribution of returns, not just averages.

Gate: if backtest Sharpe is below 0.5 or max drawdown exceeds 30% of equity, kill the idea here.
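
The metrics and gate above can be sketched in a few functions. This is a minimal illustration, not a reference implementation: the function names are hypothetical, drawdown is assumed to be measured as a fraction of the running equity peak, and the Sharpe calculation assumes a zero risk-free rate and 252 return periods per year.

```python
import math
import statistics

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss

def annualized_sharpe(period_returns, periods_per_year=252):
    """Mean over sample standard deviation, annualized.

    Assumes a zero risk-free rate and at least two returns with nonzero variance.
    """
    mean = statistics.mean(period_returns)
    stdev = statistics.stdev(period_returns)
    return mean / stdev * math.sqrt(periods_per_year)

def passes_backtest_gate(sharpe, drawdown, min_sharpe=0.5, max_dd=0.30):
    """Stage 2 gate: kill if Sharpe is below 0.5 or drawdown exceeds 30%."""
    return sharpe >= min_sharpe and drawdown <= max_dd
```

For example, `passes_backtest_gate(0.6, 0.25)` passes, while `passes_backtest_gate(1.0, 0.31)` fails on drawdown alone.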

Stage 3: Robustness

Before optimizing, stress-test the baseline:

Parameter sensitivity: does Sharpe collapse if a parameter shifts 20%? If yes, the edge is fragile.
Monte Carlo trade-order shuffle: recompute drawdown across 1,000 shuffled sequences. If the 95th percentile drawdown exceeds 2x the backtest drawdown, the system is order-dependent and risky.
Out-of-sample test: reserve the most recent 20% of data, never used in development. If out-of-sample Sharpe drops more than 40% from in-sample, the system is overfit.
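
The Monte Carlo trade-order shuffle can be sketched as follows. The names are hypothetical, and the sketch assumes each trade's P&L is a fixed currency amount added to equity (no compounding), with drawdown measured as a fraction of the running peak. A seeded generator keeps the result reproducible.

```python
import random

def max_drawdown_from_pnls(trade_pnls, starting_equity):
    """Max drawdown (fraction of running peak) for trades applied in order."""
    equity = starting_equity
    peak = starting_equity
    worst = 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def monte_carlo_drawdown_p95(trade_pnls, starting_equity, runs=1000, seed=0):
    """95th-percentile max drawdown across `runs` random trade orderings."""
    rng = random.Random(seed)
    shuffled = list(trade_pnls)  # copy so the caller's list is not reordered
    drawdowns = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        drawdowns.append(max_drawdown_from_pnls(shuffled, starting_equity))
    drawdowns.sort()
    index = min(len(drawdowns) - 1, int(0.95 * len(drawdowns)))
    return drawdowns[index]

def is_order_dependent(trade_pnls, starting_equity, runs=1000, seed=0):
    """True if the shuffled 95th-percentile drawdown exceeds 2x the backtest's."""
    baseline = max_drawdown_from_pnls(trade_pnls, starting_equity)
    p95 = monte_carlo_drawdown_p95(trade_pnls, starting_equity, runs, seed)
    return p95 > 2 * baseline
```

A backtest whose losses happen to be evenly spaced between wins is the typical case this catches: shuffling clusters the losses and reveals drawdowns the historical ordering hid.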

Gate: kill if robustness tests fail. Optimization cannot rescue a fragile baseline.
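
The parameter-sensitivity and out-of-sample checks in this stage reduce to simple comparisons once the Sharpe figures exist. The sketch below uses hypothetical names; `sharpe_for` stands in for whatever function reruns the backtest with a given parameter value, and the in-sample Sharpe is assumed positive because the idea already passed the Stage 2 gate. What counts as a "collapse" under a 20% parameter shift is left to the reader, so the function only reports how much of the baseline Sharpe survives.

```python
def split_out_of_sample(rows, oos_fraction=0.20):
    """Split chronologically ordered rows; the most recent 20% is held out."""
    oos_size = round(len(rows) * oos_fraction)
    cut = len(rows) - oos_size
    return rows[:cut], rows[cut:]

def is_overfit(in_sample_sharpe, out_of_sample_sharpe, max_drop=0.40):
    """True if out-of-sample Sharpe drops more than 40% from in-sample."""
    if in_sample_sharpe <= 0:
        raise ValueError("in-sample Sharpe must be positive")
    drop = (in_sample_sharpe - out_of_sample_sharpe) / in_sample_sharpe
    return drop > max_drop

def sharpe_retained_under_shift(sharpe_for, base_value, shift=0.20):
    """Worst-case fraction of baseline Sharpe kept when a parameter moves +/-20%.

    `sharpe_for` is a hypothetical callable: parameter value -> backtest Sharpe.
    """
    base = sharpe_for(base_value)
    low = sharpe_for(base_value * (1 - shift))
    high = sharpe_for(base_value * (1 + shift))
    return min(low, high) / base
```

The held-out slice from `split_out_of_sample` should be touched exactly once, after all development decisions are final.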

Stage 4: Optimization

Only now optimize parameters, and only across a narrow range. Use walk-forward analysis: optimize on a rolling in-sample window, test on the next out-of-sample window, then roll forward and repeat. Require a walk-forward efficiency ratio (out-of-sample profit / in-sample profit) above 50%.
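
A sketch of the walk-forward windowing and the efficiency ratio follows. The names are hypothetical. One assumption goes beyond the text: because in-sample windows are usually longer than out-of-sample windows, the sketch compares profit per bar rather than raw totals, so that a system performing identically in both would score 100%.

```python
def walk_forward_windows(n_bars, in_sample_len, out_sample_len):
    """Yield (is_start, is_end, oos_start, oos_end) bar indices, end-exclusive.

    Each step rolls forward by one out-of-sample window.
    """
    start = 0
    while start + in_sample_len + out_sample_len <= n_bars:
        is_end = start + in_sample_len
        yield start, is_end, is_end, is_end + out_sample_len
        start += out_sample_len

def walk_forward_efficiency(results, in_sample_len, out_sample_len):
    """Out-of-sample profit per bar divided by in-sample profit per bar.

    `results` is a list of (in_sample_profit, out_of_sample_profit) pairs,
    one per window. Per-bar normalization is an assumption of this sketch.
    """
    total_is = sum(is_profit for is_profit, _ in results)
    total_oos = sum(oos_profit for _, oos_profit in results)
    is_rate = total_is / (len(results) * in_sample_len)
    oos_rate = total_oos / (len(results) * out_sample_len)
    return oos_rate / is_rate
```

With 10 bars, a 4-bar in-sample window, and a 2-bar out-of-sample window, `walk_forward_windows(10, 4, 2)` yields three windows: `(0, 4, 4, 6)`, `(2, 6, 6, 8)`, and `(4, 8, 8, 10)`.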

Stage 5: Forward Test (Paper)

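Run the system in real time on a demo account, or a live account at minimal size, for 30-60 trades. Compare the signals it generates with the signals the backtest produces over the same bars. Track actual slippage and fills against the backtest's assumptions.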

Gate: kill if live signal generation diverges materially from backtest, indicating lookahead or unrealistic fill logic.
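
One way to make the comparison concrete is to log every signal by timestamp during the forward test, rerun the backtest over the same period afterwards, and diff the two. The sketch below assumes signals are stored in dictionaries keyed by timestamp and that fills are recorded as (expected price, fill price, side) tuples; these structures and names are illustrative, not prescribed by the lifecycle.

```python
def signal_mismatches(live_signals, backtest_signals):
    """Return (timestamp, live, backtest) for every bar where signals differ.

    A signal missing on one side shows up as None, which flags signals
    the backtest saw but the live system did not (or the reverse).
    """
    timestamps = sorted(set(live_signals) | set(backtest_signals))
    return [
        (ts, live_signals.get(ts), backtest_signals.get(ts))
        for ts in timestamps
        if live_signals.get(ts) != backtest_signals.get(ts)
    ]

def average_slippage(fills):
    """Mean adverse slippage per fill, in price units.

    `fills` holds (expected_price, fill_price, side) with side +1 for buys
    and -1 for sells. Positive results mean fills were worse than assumed.
    """
    costs = [(fill - expected) * side for expected, fill, side in fills]
    return sum(costs) / len(costs)
```

A signal present in the backtest but absent live is the classic symptom of lookahead: the backtest used information that was not yet available when the bar closed.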

Stage 6: Small Live

Deploy at 10-25% of intended risk. Run for 50-100 trades. Track live vs backtest Sharpe, win rate, and average R.

Gate: kill or pause if live results fall more than 1 standard deviation below backtest expectations over 50 trades.
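
The text does not say which standard deviation the gate refers to. One reasonable reading, assumed here, is the standard error of the mean for a sample the size of the live record: the backtest's per-trade standard deviation divided by the square root of the live trade count. The sketch works in R multiples and uses hypothetical names.

```python
import math
import statistics

def live_shortfall_in_sd(live_r, backtest_r):
    """How far the live mean R sits below the backtest mean R.

    Measured in standard errors for a sample of len(live_r) trades,
    using the backtest's per-trade standard deviation (an assumption).
    """
    expected = statistics.mean(backtest_r)
    standard_error = statistics.stdev(backtest_r) / math.sqrt(len(live_r))
    return (expected - statistics.mean(live_r)) / standard_error

def should_pause(live_r, backtest_r, min_trades=50, max_shortfall_sd=1.0):
    """Stage 6 gate: pause if live results lag by more than 1 SD over 50 trades."""
    if len(live_r) < min_trades:
        return False  # not enough live trades to judge yet
    return live_shortfall_in_sd(live_r, backtest_r) > max_shortfall_sd
```

Under this reading a sound system still trips the gate roughly one time in six by chance, which is why the response can be a pause and review rather than an automatic kill.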

Stage 7: Full Deployment and Monitoring

Scale to full size. Monitor rolling 50-trade Sharpe and drawdown. Define decay triggers in advance:

Rolling Sharpe drops below 50% of backtest Sharpe for 2 consecutive months.
Live drawdown exceeds the 95th-percentile Monte Carlo drawdown computed in Stage 3.

When a decay trigger fires, reduce size by 50% and investigate. Do not wait for a full drawdown to act.
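
The monitoring rules can be encoded so the response is mechanical rather than debated mid-drawdown. This sketch uses hypothetical names and assumes the rolling Sharpe is sampled once per month and computed on the same basis (per trade or annualized) as the backtest Sharpe it is compared with.

```python
import statistics

def rolling_sharpe(trade_returns, window=50):
    """Per-trade Sharpe over the most recent `window` trades, or None if
    there are too few trades or the returns have zero variance."""
    recent = trade_returns[-window:]
    if len(recent) < window:
        return None
    stdev = statistics.stdev(recent)
    return None if stdev == 0 else statistics.mean(recent) / stdev

def decay_triggers(monthly_rolling_sharpes, backtest_sharpe,
                   current_drawdown, mc_p95_drawdown):
    """Return the names of the decay triggers that have fired."""
    fired = []
    last_two = monthly_rolling_sharpes[-2:]
    # Trigger 1: rolling Sharpe below 50% of backtest for 2 consecutive months.
    if len(last_two) == 2 and all(s < 0.5 * backtest_sharpe for s in last_two):
        fired.append("sharpe_decay")
    # Trigger 2: drawdown beyond the 95th-percentile Monte Carlo drawdown.
    if current_drawdown > mc_p95_drawdown:
        fired.append("drawdown_breach")
    return fired

def next_position_size(current_size, fired):
    """Halve size when any trigger has fired; otherwise leave it unchanged."""
    return current_size * 0.5 if fired else current_size
```

For instance, monthly readings of 0.9, 0.4, and 0.3 against a backtest Sharpe of 1.0 fire `sharpe_decay`, and `next_position_size(1.0, ["sharpe_decay"])` returns 0.5.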

The Discipline

Every stage has a kill criterion. Systems that pass all gates are rare; that is the point. The lifecycle exists to discard the 90% of ideas that do not survive honest testing, not to justify trading every idea you have.