Expectancy and System Evaluation Metrics

Expectancy alone does not evaluate a system. Two systems with identical expectancy can have wildly different drawdowns, recovery profiles, and risk-adjusted returns. A complete evaluation combines expectancy with return-to-risk metrics that expose what expectancy hides.

Expectancy

Expectancy is the average R-multiple per trade:

Expectancy = (Win% × AvgWin) − (Loss% × AvgLoss)

Here AvgWin and AvgLoss are measured in R, and AvgLoss is taken as a positive magnitude.

A positive expectancy means the system makes money per trade over the long run, but it says nothing about how rough the path is. A system with 0.2R expectancy and a 40% win rate can still draw down severely between winners.

Require expectancy above 0.15R for a tradable system; below that, costs and slippage erase the edge in live trading.
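As a minimal sketch of the formula above, the following computes expectancy from a list of per-trade R-multiples. The function name and the sample trades are hypothetical, and breakeven trades are assumed to count as losses of zero size. The result should match the plain average of the R-multiples.

```python
from statistics import mean

def expectancy(r_multiples):
    """Expectancy in R: (Win% x AvgWin) - (Loss% x AvgLoss)."""
    wins = [r for r in r_multiples if r > 0]
    # Loss magnitudes as positive numbers; breakeven trades count as 0R losses (assumption).
    losses = [-r for r in r_multiples if r <= 0]
    n = len(r_multiples)
    win_rate = len(wins) / n
    loss_rate = len(losses) / n
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return win_rate * avg_win - loss_rate * avg_loss

# Hypothetical sample: 40% win rate, winners of +2R, losers of -1R.
trades = [2.0] * 4 + [-1.0] * 6
print(round(expectancy(trades), 4))  # 0.4 * 2 - 0.6 * 1
print(expectancy(trades) > 0.15)     # compare against the 0.15R floor
```

The sample reproduces the 0.2R, 40% win rate case: positive expectancy, yet six of every ten trades lose.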

Profit Factor

Profit Factor = Gross Profit / Gross Loss

Below 1.0: losing system.
1.2-1.5: marginal, costs will hurt.
1.5-2.0: solid.
Above 2.5: suspiciously good, suspect overfitting.

Profit factor above 2.0 with a small sample is a red flag, not a green one. Demand 200+ trades before trusting any number above 2.0.
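A short sketch of the profit factor calculation and the sample-size check described above. The function names are illustrative, and the 2.0 and 200-trade cutoffs are taken directly from the text rather than from any standard.

```python
def profit_factor(pnl):
    """Gross profit divided by gross loss for a list of per-trade P&L values."""
    gross_profit = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)  # positive magnitude
    if gross_loss == 0:
        # No losing trades: the ratio is undefined (or unbounded if there were wins).
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss

def needs_more_trades(pf, n_trades, pf_cutoff=2.0, min_trades=200):
    """True when a high profit factor rests on too few trades to trust."""
    return pf > pf_cutoff and n_trades < min_trades

# Hypothetical P&L in R: 4 winners of +2R, 6 losers of -1R.
pnl = [2.0] * 4 + [-1.0] * 6
pf = profit_factor(pnl)
print(round(pf, 3), needs_more_trades(pf, len(pnl)))
```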

Sharpe and Sortino

Sharpe = (Mean Return − Risk-Free Rate) / Std Dev of Returns

Sharpe penalizes upside volatility as heavily as downside volatility, even though upside swings are not actually risk. Sortino addresses this by using only downside deviation:

Sortino = (Mean Return − Risk-Free Rate) / Downside Std Dev

For asymmetric systems (trend-following with small losses and large winners), Sortino is the more honest measure. Thresholds for Sharpe:

Sharpe below 0.5: weak.
Sharpe 0.5-1.0: acceptable.
Sharpe 1.0-1.5: strong.
Sharpe above 2.0: question the backtest.
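The two ratios can be sketched as below. Several choices here are assumptions rather than anything the text specifies: returns are per-period, the risk-free rate is per-period and doubles as the Sortino target, downside deviation is averaged over all periods (not only the losing ones), and annualization multiplies by the square root of periods per year.

```python
import math
from statistics import mean, stdev

def sharpe(returns, rf=0.0, periods_per_year=1):
    """(Mean return - risk-free rate) / standard deviation of returns."""
    excess = [r - rf for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)

def sortino(returns, rf=0.0, periods_per_year=1):
    """(Mean return - risk-free rate) / downside deviation."""
    excess = [r - rf for r in returns]
    # Only shortfalls below the target contribute; gains count as zero.
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    if downside == 0:
        return float("inf")  # no downside observations in the sample
    return mean(excess) / downside * math.sqrt(periods_per_year)

# Hypothetical asymmetric profile: many small losses, a few large winners.
asym = [-0.01, -0.01, 0.06, -0.01, 0.05, -0.01]
print(round(sharpe(asym), 3), round(sortino(asym), 3))
```

On this sample Sortino comes out well above Sharpe, because the large winners inflate the standard deviation but not the downside deviation.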

MAR and Calmar

Both measure return per unit of maximum drawdown:

MAR = Annualized Return / Max Drawdown (computed since inception)
Calmar = Annualized Return / Max Drawdown (computed over 36 months)

Below 0.5: poor risk-adjusted return.
0.5-1.0: reasonable.
Above 1.0: strong.
Above 2.0: rare and worth scrutinizing.

MAR and Calmar are the metrics that matter most for capital allocation, because drawdown, not volatility, is what blows up accounts.
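A rough sketch of both ratios from an equity curve. It assumes evenly spaced equity points (monthly by default), at least two of them, compound annualized return, and drawdown measured as a fraction of the running peak. The function names are illustrative.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def annualized_return(equity, periods_per_year):
    """Compound annual growth rate over the whole series (needs 2+ points)."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1

def mar(equity, periods_per_year=12):
    """Annualized return over max drawdown, using the full track record."""
    dd = max_drawdown(equity)
    ret = annualized_return(equity, periods_per_year)
    if dd == 0:
        return float("inf") if ret > 0 else 0.0
    return ret / dd

def calmar(equity, periods_per_year=12, window_months=36):
    """Same ratio restricted to the trailing window (36 months by default)."""
    window_periods = round(window_months / 12 * periods_per_year)
    return mar(equity[-(window_periods + 1):], periods_per_year)

# Hypothetical curve: three periods making up one year.
curve = [100.0, 120.0, 90.0, 150.0]
print(max_drawdown(curve), mar(curve, periods_per_year=3))
```

When the track record is shorter than the window, the two functions return the same value, since the trailing slice is the whole series.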

Maximum Drawdown and Recovery

Report max drawdown alongside recovery time: the number of trades or days needed to reach a new equity high. A 20% drawdown recovered in 30 trades is tolerable; one that takes 2 years to recover traps capital. Track the distribution of drawdowns from Monte Carlo simulation, not just the single drawdown observed in the backtest.
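The sketch below illustrates both ideas. The text does not say which Monte Carlo method to use, so resampling trades with replacement (a bootstrap) is an assumption here, as are the function names, the compounding of per-trade fractional returns, and the simple index-based percentile.

```python
import random

def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def longest_recovery(equity):
    """Most steps taken to get from an equity high back to that level or above."""
    peak, peak_idx, longest, underwater = equity[0], 0, 0, False
    for i, value in enumerate(equity):
        if value >= peak:
            if underwater:
                longest = max(longest, i - peak_idx)
                underwater = False
            peak, peak_idx = value, i
        else:
            underwater = True
    if underwater:
        # Still below the last high: report elapsed steps as a lower bound.
        longest = max(longest, len(equity) - 1 - peak_idx)
    return longest

def mc_drawdowns(trade_returns, n_sims=1000, seed=0):
    """Sorted max drawdowns from bootstrap-resampled trade sequences."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        sample = rng.choices(trade_returns, k=len(trade_returns))
        equity = [1.0]
        for r in sample:
            equity.append(equity[-1] * (1 + r))
        results.append(max_drawdown(equity))
    return sorted(results)

def percentile(sorted_values, q):
    """Value at quantile q (0 to 1) of an already sorted list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]

# Hypothetical per-trade returns as fractions of equity.
trade_returns = [0.02] * 4 + [-0.01] * 6
dds = mc_drawdowns(trade_returns, n_sims=200, seed=1)
print(longest_recovery([100, 120, 90, 110, 130]))  # peak at index 1, new high at index 4
print(percentile(dds, 0.5), percentile(dds, 0.95))
```

Comparing the median and 95th-percentile values against the single backtest drawdown shows how much of the backtest figure depends on the particular order in which trades happened to occur.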

Combining Metrics for Decisions

No single metric decides; use a gate sequence:

Expectancy > 0.15R and profit factor > 1.3; otherwise reject.
Sharpe > 0.6 (or Sortino > 0.8 for asymmetric systems).
MAR > 0.5 and Monte Carlo 95th-percentile drawdown < tolerable threshold.
Trade count > 200 and out-of-sample performance within 50% of in-sample.

A system passing all four gates is tradable. One failing any gate is suspect, regardless of how good another metric looks.
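One way to encode the gate sequence is sketched below. The `Metrics` container, its field names, and the gate labels are hypothetical. Two readings of the text are assumed: asymmetric systems are judged on Sortino instead of Sharpe, and "within 50% of in-sample" means the out-of-sample to in-sample ratio is at least 0.5.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    expectancy_r: float   # expectancy in R
    profit_factor: float
    sharpe: float
    sortino: float
    mar: float
    mc_dd_95: float       # Monte Carlo 95th-percentile drawdown, as a fraction
    trade_count: int
    oos_ratio: float      # out-of-sample / in-sample performance (assumed definition)

def failed_gates(m, max_tolerable_dd, asymmetric=False):
    """Return the names of the gates the system fails; an empty list means tradable."""
    gates = [
        ("edge", m.expectancy_r > 0.15 and m.profit_factor > 1.3),
        ("risk-adjusted return", m.sortino > 0.8 if asymmetric else m.sharpe > 0.6),
        ("drawdown", m.mar > 0.5 and m.mc_dd_95 < max_tolerable_dd),
        ("robustness", m.trade_count > 200 and m.oos_ratio >= 0.5),
    ]
    return [name for name, passed in gates if not passed]

# Hypothetical candidates: identical except for sample size.
candidate = Metrics(0.2, 1.6, 0.9, 1.2, 0.8, 0.18, 350, 0.7)
thin_sample = Metrics(0.2, 1.6, 0.9, 1.2, 0.8, 0.18, 120, 0.7)
print(failed_gates(candidate, max_tolerable_dd=0.25))
print(failed_gates(thin_sample, max_tolerable_dd=0.25))
```

Returning the list of failed gates, not just a single pass or fail flag, keeps the reason for rejection visible.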

The Trap to Avoid

Optimizing for one metric in isolation is how systems get overfit; a system tuned to maximize Sharpe, for example, often ends up fitted to a low-volatility regime. Always evaluate the full metric set together on out-of-sample data, and treat any single exceptional number as a warning, not a selling point.