System Performance: Sharpe, Sortino, Calmar, Profit Factor

Two systems can both return 20% a year. The metrics below tell you which one took sane risk to get there.

Why metrics matter

Return alone is meaningless. A 30% return earned with 50% drawdowns is psychologically untradeable and, per unit of drawdown, far inferior to a 25% return earned with 8% drawdowns (0.6 versus roughly 3.1 units of return per unit of drawdown). Risk-adjusted metrics let you compare systems honestly and pick the one most likely to survive.

1. Profit factor

The simplest and most intuitive metric.

Profit Factor = Gross Profit / Gross Loss

Value	Interpretation
< 1.0	Losing system
1.0–1.2	Marginal
1.2–1.5	Decent
1.5–2.0	Strong
> 2.0	Exceptional (or overfit)

Profit factor is intuitive but ignores how the profits were distributed: it doesn't care whether they came in steady increments or from one lucky trade.
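A minimal sketch of the calculation, assuming trade results are given as a list of profit and loss values in account currency. The function name `profit_factor` and both trade lists are hypothetical and exist only for illustration. The two lists are built to show the limitation described above: very different paths, identical profit factor.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (loss taken as a positive number)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: infinite if there were wins, undefined otherwise.
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


# Hypothetical trade results in account currency.
steady = [100, -50, 100, -50, 100, -50, 100, -50]  # four modest wins
lucky = [400, -50, -50, -50, -50]                  # one outsized win

print(profit_factor(steady))  # 2.0
print(profit_factor(lucky))   # 2.0
```

Both systems score 2.0, which is why profit factor should be read alongside the other metrics rather than on its own.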

2. Sharpe ratio

Measures return per unit of total volatility.

Sharpe = (Mean Return − Risk-Free Rate) / Standard Deviation of Returns

For intraday/daily systems, the risk-free rate is often set to 0.

Annualized Sharpe	Interpretation
< 0.5	Poor
0.5–1.0	Acceptable
1.0–2.0	Good
2.0–3.0	Excellent
> 3.0	Suspicious (overfit or arb)

Limitations: Sharpe penalizes upside volatility equally with downside. A system with occasional large gains looks worse than it should.
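The formula above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: returns are per-period fractions, the sample standard deviation is used, and the result is annualized by multiplying by the square root of the number of periods per year (252 trading days is assumed for daily data). The function name, parameter names, and sample returns are all hypothetical.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Mean excess return over its sample standard deviation, scaled to a year."""
    excess = [r - risk_free_per_period for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if volatility == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical daily returns as fractions (0.01 = 1%).
daily_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002, -0.003, 0.005]
print(round(annualized_sharpe(daily_returns), 2))
```

The square-root scaling assumes returns are roughly independent from one period to the next; strongly autocorrelated returns will make the annualized figure misleading.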

3. Sortino ratio

Like Sharpe, but only penalizes downside volatility.

Sortino = (Mean Return − Risk-Free Rate) / Downside Deviation

Downside deviation considers only returns below a target (often 0 or a minimum acceptable return, MAR).

A trend-following system might have:

Sharpe = 1.1 (penalized for upside bursts)
Sortino = 1.8 (only downside matters)

For asymmetric strategies, Sortino is more honest than Sharpe. A Sortino above 2.0 is solid; above 3.0 is exceptional.
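A rough sketch of the Sortino calculation follows. Conventions vary between sources, so the choices here are assumptions: the same target is used in the numerator and the denominator, shortfalls are averaged over all periods (not only the losing ones), and annualization uses the square root of an assumed 252 periods per year. All names are hypothetical.

```python
import math


def downside_deviation(returns, target=0.0):
    """Root mean square of shortfalls below the target, over all periods."""
    shortfalls = [min(r - target, 0.0) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(returns))


def annualized_sortino(returns, target=0.0, periods_per_year=252):
    """Mean return above the target divided by downside deviation."""
    mean_excess = sum(r - target for r in returns) / len(returns)
    deviation = downside_deviation(returns, target)
    if deviation == 0:
        # No period fell below the target.
        return float("inf") if mean_excess > 0 else float("nan")
    return mean_excess / deviation * math.sqrt(periods_per_year)


# Hypothetical daily returns: small losses, occasional larger gains.
daily_returns = [0.02, -0.01, 0.03, -0.01]
print(round(annualized_sortino(daily_returns), 2))
```

Because gains never enter the denominator, making the winning days larger raises the ratio instead of penalizing it, which is the behavior the text describes for trend-following systems.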

4. Calmar ratio

Measures return per unit of worst drawdown.

Calmar = Annualized Return / Maximum Drawdown

Calmar	Interpretation
< 0.5	Poor
0.5–1.0	Acceptable
1.0–3.0	Good
> 3.0	Exceptional

Calmar answers: "How much return did I earn for each unit of worst-case pain?" A system with 30% annual return and 10% max drawdown has Calmar = 30 / 10 = 3, which is strong.
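As an illustration, Calmar can be computed from an equity curve like this. The sketch assumes a list of strictly positive equity values and a known test length in years, and it uses the compound annual return as the numerator. The function names and the sample curve are hypothetical; the curve is chosen to reproduce the 30% return and 10% drawdown example.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar(equity, years):
    """Compound annual return divided by maximum drawdown."""
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    drawdown = max_drawdown(equity)
    return float("inf") if drawdown == 0 else annual_return / drawdown


# Hypothetical one-year equity curve: +30% overall, 10% dip from 110 to 99.
equity = [100.0, 110.0, 99.0, 120.0, 130.0]
print(round(max_drawdown(equity), 4))      # 0.1
print(round(calmar(equity, years=1.0), 2)) # 3.0
```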

5. Maximum drawdown

Not a ratio but a critical absolute number: the largest peak-to-trough decline in the equity curve.

Express in both % of equity and R multiples
The psychological limit for most traders is 20–25%
Above 30% most traders abandon the system mid-drawdown
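To express drawdown in R multiples, one option is to run the same peak-to-trough logic over cumulative trade results measured in R. The sketch below assumes each trade result is already expressed in R. The conversion to percent additionally assumes a fixed, non-compounding risk per trade, which is a simplification. All names and numbers are hypothetical.

```python
def max_drawdown_r(trade_results_r):
    """Largest decline in cumulative R from any earlier high-water mark."""
    cumulative = 0.0
    peak = 0.0
    worst = 0.0
    for result in trade_results_r:
        cumulative += result
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


# Hypothetical sequence of trade results in R.
trades = [1.0, 2.0, -1.0, -1.0, -1.0, 3.0, -1.0]
drawdown_r = max_drawdown_r(trades)

# Assumption: every trade risks 1% of starting equity, with no compounding.
risk_per_trade_pct = 1.0
print(drawdown_r)                       # 3.0
print(drawdown_r * risk_per_trade_pct)  # 3.0 (approximate % of equity)
```

Reporting both figures helps when position sizing changes: the R figure describes the system, while the percent figure describes what the account actually felt.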

6. Recovery factor

Recovery Factor = Net Profit / Maximum Drawdown

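Similar in spirit to Calmar, but uses net profit over the whole test instead of annualized return, so it grows with the length of the test and should only be compared across tests of similar duration. Measure both terms in the same units (currency or %). Above 5 is excellent.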
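A short sketch of recovery factor, assuming an equity curve in account currency so that net profit and drawdown share the same units. The function name and the sample curve are hypothetical.

```python
def recovery_factor(equity):
    """Net profit divided by the largest peak-to-trough decline, in currency."""
    net_profit = equity[-1] - equity[0]
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return float("inf") if worst == 0 else net_profit / worst


# Hypothetical equity curve: 10,000 net profit, worst dip of 1,000.
equity = [10000.0, 12000.0, 11000.0, 15000.0, 14000.0, 20000.0]
print(recovery_factor(equity))  # 10.0
```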

7. Expectancy per trade

Expectancy = (Win% × Avg Win) − (Loss% × Avg Loss)

Measured in R-multiples (1R = the amount risked on a trade), with Avg Loss taken as a positive number and Loss% = 1 − Win%. Anything above 0.2R per trade is solid for a discretionary system; 0.1R is acceptable for high-frequency.
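The expectancy formula can be sketched as below, assuming each trade result is already expressed in R. Breakeven trades are counted as losses of size zero here, which is one possible convention and not the only one. The function name and the sample trades are hypothetical.

```python
def expectancy_r(trade_results_r):
    """(Win% x Avg Win) - (Loss% x Avg Loss), with results expressed in R."""
    wins = [r for r in trade_results_r if r > 0]
    losses = [-r for r in trade_results_r if r <= 0]  # positive magnitudes
    total = len(trade_results_r)
    win_rate = len(wins) / total
    loss_rate = len(losses) / total
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return win_rate * avg_win - loss_rate * avg_loss


# Hypothetical sample: 4 winners of 2R and 6 losers of 1R.
trades = [2.0, -1.0, -1.0, 2.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0]
print(round(expectancy_r(trades), 2))  # 0.2
```

A 40% win rate is still profitable here because the average win is twice the average loss. The result equals the plain average of the R results, which is a handy cross-check.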

Which metrics to use together

No single metric tells the full story. A minimal set:

Profit factor — profitability in raw terms
Sharpe — risk-adjusted return, penalizes volatility
Sortino — risk-adjusted return, only penalizes downside
Calmar — return per unit of worst drawdown
Maximum drawdown — psychological survivability
Expectancy — quality per trade

A system is "good" if it scores well on most of these simultaneously.

Worked comparison

Metric	System A	System B
Annual return	30%	30%
Max drawdown	25%	8%
Profit factor	1.4	1.6
Sharpe	1.2	1.9
Sortino	1.5	2.4
Calmar	1.2	3.75

Same headline return, very different systems. System B is the clear winner — more profit per unit of risk taken and a drawdown most traders can survive.
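The Calmar row of the table can be checked directly from the return and drawdown rows. The snippet below uses only the figures in the table; the dictionary layout and key names are hypothetical.

```python
# Annual return and max drawdown taken from the comparison table, in percent.
systems = {
    "A": {"annual_return_pct": 30.0, "max_drawdown_pct": 25.0},
    "B": {"annual_return_pct": 30.0, "max_drawdown_pct": 8.0},
}

calmar = {
    name: metrics["annual_return_pct"] / metrics["max_drawdown_pct"]
    for name, metrics in systems.items()
}
print(calmar)  # {'A': 1.2, 'B': 3.75}
```

System B earns roughly three times as much return per unit of worst drawdown, even though the headline return is identical.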

Common pitfalls in metric use

Annualizing short samples: a 1-week Sharpe of 5.0 means nothing — extrapolation lies
Ignoring sample size: 30 trades can't produce stable metrics
Reporting only the best metric: choose the metric before testing, not after
Comparing across frequencies: a daily system's Sharpe can't be directly compared to a monthly one without annualization
Using backtested drawdown as the live expectation: as a rule of thumb, expect live drawdown to reach 1.5–2× the backtested figure
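To illustrate the frequency pitfall, the sketch below puts per-period Sharpe ratios on a common annual scale. It assumes the usual square-root-of-time scaling, which in turn assumes roughly independent returns, and conventional period counts (252 trading days, 52 weeks, 12 months). The function name, the mapping, and the sample values are hypothetical.

```python
import math

# Assumed number of return periods per year for each sampling frequency.
PERIODS_PER_YEAR = {"daily": 252, "weekly": 52, "monthly": 12}


def annualize_sharpe(per_period_sharpe, frequency):
    """Scale a per-period Sharpe ratio to an annual figure."""
    return per_period_sharpe * math.sqrt(PERIODS_PER_YEAR[frequency])


# Hypothetical raw figures: the monthly number looks four times better...
daily_system = annualize_sharpe(0.10, "daily")
monthly_system = annualize_sharpe(0.40, "monthly")

# ...but after annualization the daily system comes out ahead.
print(round(daily_system, 2), round(monthly_system, 2))
```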

Practical thresholds for retail systems

A retail-discretionary or simple systematic system is "good" when:

Profit factor ≥ 1.3
Sharpe ≥ 1.0
Sortino ≥ 1.5
Calmar ≥ 1.0
Max drawdown ≤ 20%
Expectancy ≥ 0.15R per trade

Clearing all of these thresholds at once is rare; failing all of them is unacceptable.
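One way to apply the checklist is to encode the thresholds and report which ones a system misses. The threshold values come from the list above; the dictionary keys, the function name, and the candidate system (including its expectancy figure) are hypothetical.

```python
import operator

# Thresholds from the list above: (comparison that must hold, limit).
THRESHOLDS = {
    "profit_factor": (operator.ge, 1.3),
    "sharpe": (operator.ge, 1.0),
    "sortino": (operator.ge, 1.5),
    "calmar": (operator.ge, 1.0),
    "max_drawdown_pct": (operator.le, 20.0),
    "expectancy_r": (operator.ge, 0.15),
}


def failed_thresholds(metrics):
    """Return the names of the metrics that miss their threshold."""
    return [
        name
        for name, (passes, limit) in THRESHOLDS.items()
        if not passes(metrics[name], limit)
    ]


# Hypothetical candidate system.
candidate = {
    "profit_factor": 1.4,
    "sharpe": 1.2,
    "sortino": 1.5,
    "calmar": 1.2,
    "max_drawdown_pct": 25.0,
    "expectancy_r": 0.18,
}
print(failed_thresholds(candidate))  # ['max_drawdown_pct']
```

Listing the failures, instead of returning a single pass or fail, keeps attention on which risk the system is actually taking.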

Next: zoom in on the most important absolute number — maximum drawdown and recovery.