Hypothesis Testing and Sample Significance
Is your strategy's edge real, or just luck? Hypothesis testing gives you a framework to separate genuine edge from random noise in trading results.
A backtest that made money is not proof of edge. It's a hypothesis. The question is whether the result is significant.
Almost any backtest can be made to look profitable. The job of statistics is to tell you whether that profit could plausibly have come from pure chance. Hypothesis testing is the formal tool for this judgment.
Setting up the test
- Null hypothesis (H0): the strategy has zero edge — observed profit is random noise
- Alternative (H1): the strategy has a real positive edge
- Pick a significance level α (commonly 0.05) — the false-positive rate you'll tolerate
Because the true standard deviation σ is unknown and has to be estimated from the sample, the test statistic is compared against a t-distribution with n − 1 degrees of freedom:
t = (x̄ − μ0) ÷ (s ÷ √n)
Where x̄ is the sample mean return per trade, μ0 is the hypothesized mean (often 0), s the sample standard deviation, and n the number of trades.
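The formula above can be sketched in a few lines of Python using only the standard library. The function name `t_statistic` and the list of per-trade returns are illustrative assumptions, not numbers from a real backtest.

```python
import math
import statistics


def t_statistic(returns, mu0=0.0):
    """One-sample t-statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two trades to estimate s")
    mean = statistics.fmean(returns)
    s = statistics.stdev(returns)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("standard deviation is zero; t is undefined")
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical per-trade returns in percent (made-up numbers).
trades = [1.2, -0.8, 0.5, 2.1, -1.5, 0.9, -0.3, 1.7, -0.6, 0.4]
print(f"n = {len(trades)}, t = {t_statistic(trades):.3f}")
```

The degrees of freedom that go with this statistic are n − 1, so ten trades give 9 degrees of freedom.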
The p-value
The p-value is the probability of seeing a result at least this extreme if H0 were true.
- p < α (for example 0.05) → reject H0: the data are hard to explain by chance alone
- p ≥ α → fail to reject H0: the result could be noise
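As a rough sketch of where the p-value comes from, the snippet below integrates the density of Student's t-distribution numerically (Simpson's rule) to get the one-sided tail probability P(T ≥ t). The function names, the step count, and the example values of t and n are assumptions made for illustration. A statistics library would normally do this for you, and more accurately for very large t.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def one_sided_p_value(t, df, steps=2000):
    """P(T >= t) under H0, via Simpson's rule on the density from 0 to |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3      # P(0 <= T <= |t|)
    upper_tail = 0.5 - area   # P(T >= |t|), using symmetry around zero
    return upper_tail if t >= 0 else 1.0 - upper_tail


# Hypothetical example: t = 2.1 from a backtest with n = 100 trades.
alpha = 0.05
p = one_sided_p_value(2.1, df=100 - 1)
print(f"p = {p:.4f}, reject H0: {p < alpha}")
```

The test is one-sided because H1 is a positive edge. For a two-sided question ("is the mean different from zero in either direction?"), the tail probability of |t| would be doubled.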
But beware: a low p-value is not a guarantee. Run a strategy with no edge on 20 independent markets at α = 0.05 and you should expect about one of them to appear "significant" purely by chance.
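The arithmetic behind that warning is short enough to check directly. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false positive among m tests is 1 − (1 − α)^m. The function name below is an illustrative choice.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent true-null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 5, 20, 50):
    print(f"{m:>3} tests -> {prob_at_least_one_false_positive(m):.1%}")
```

For 20 tests this comes out at roughly 64%, so a single "significant" market out of 20 is closer to the expected outcome than to evidence.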
Sample size matters most
With a small sample, even a strong edge can fail to reach significance. With a huge sample, an edge too small to cover trading costs can still appear "significant", because the standard error s ÷ √n shrinks as n grows. For trading:
- Fewer than 30 trades → almost meaningless; treat with deep skepticism
- 100–300 trades → reasonable sample for most setups
- Thousands of trades → watch out for tiny effects disguised as significance
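To get a feel for these ranges, rearrange the t formula with μ0 = 0: t = (x̄ ÷ s) × √n, so the number of trades at which t reaches a critical value is about (critical value × s ÷ x̄)². The sketch below uses the normal approximation for the one-sided critical value and assumes the observed mean and standard deviation stay the same as the sample grows. It is not a full power calculation, and the name `trades_needed` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def trades_needed(mean_return, std_return, alpha=0.05):
    """Smallest n at which (mean / std) * sqrt(n) reaches the one-sided
    critical value, using the normal approximation to the t-distribution."""
    if mean_return <= 0 or std_return <= 0:
        raise ValueError("mean_return and std_return must be positive")
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return math.ceil((z_crit * std_return / mean_return) ** 2)


# Hypothetical edges, expressed as mean return relative to a std of 1.0.
for edge in (0.5, 0.1, 0.01):
    print(f"mean/std = {edge}: about {trades_needed(edge, 1.0)} trades")
```

Because the normal approximation understates the critical value at small n, the true requirement for strong edges is somewhat higher than this estimate. The very large count for the smallest ratio is the other side of the coin: with enough trades even a tiny per-trade edge clears the threshold, which says nothing about whether it survives costs.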
Common trading mistakes
- Multiple comparisons: testing 50 variations and reporting the best one as if you'd tested one. Correct with Bonferroni or false-discovery-rate adjustments
- Survivorship bias in the test: only testing symbols that still trade today
- Data snooping: the more parameters you tune, the more likely significance is an illusion
- Ignoring dependence: overlapping returns violate the independence assumption and inflate the t-statistic
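The two corrections named under multiple comparisons can be sketched as follows. This is a minimal standard-library version under the usual assumptions (Bonferroni compares each p-value with α ÷ m, and the Benjamini-Hochberg step-up procedure is used for the false-discovery rate). The function names and the example p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for a test only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false-discovery-rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under its threshold.
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    reject = [False] * m
    for idx in order[:max_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values from four variations of the same strategy.
p_values = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Bonferroni is the stricter of the two: it limits the chance of any false positive at all, while Benjamini-Hochberg limits the expected share of false positives among the strategies you accept. Both need the full list of p-values, including the variations you discarded.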
Practical takeaway
A profitable backtest is a hypothesis, not a conclusion. Compute the t-statistic and p-value. Require p < 0.05 after correcting for the number of variations you tested. Then — and only then — trade the strategy with small size and watch whether live results track the backtest.
Summary
Hypothesis testing gives you a disciplined way to ask, "Could this be luck?" It's not perfect, but it's far better than trusting a backtest on faith. Treat every strategy as a hypothesis to be tested, and let statistics — not hope — decide when an edge is real enough to trade.