Hypothesis Testing and Sample Significance

Is your strategy's edge real, or just luck? Hypothesis testing gives you a framework to separate genuine edge from random noise in trading results.

By tradernewbie · Curated for beginners
#statistics #quantitative

A backtest that made money is not proof of edge. It's a hypothesis. The question is whether the result is significant.

A backtest can look profitable for many reasons, and luck is one of them. The job of statistics is to tell you whether that profit could plausibly have come from pure chance. Hypothesis testing is the formal tool for this judgment.

Setting up the test

  1. Null hypothesis (H0): the strategy has zero edge — observed profit is random noise
  2. Alternative (H1): the strategy has a real positive edge
  3. Pick a significance level α (commonly 0.05) — the false-positive rate you'll tolerate

Because the true standard deviation σ of returns is unknown and has to be estimated from the sample, the usual test statistic is the one-sample t-statistic:

t = (x̄ − μ0) ÷ (s ÷ √n)

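Where x̄ is the sample mean return per trade, μ0 is the hypothesized mean (often 0), s is the sample standard deviation, and n is the number of trades. Under H0, and assuming the trades are roughly independent, t follows a t-distribution with n − 1 degrees of freedom.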
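To make the formula concrete, here is a minimal sketch in plain Python. The function name `t_statistic` and the five per-trade returns are invented for illustration; they are not real results.

```python
import math
import statistics


def t_statistic(returns, mu0=0.0):
    """One-sample t-statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two trades to estimate s")
    mean = statistics.fmean(returns)
    s = statistics.stdev(returns)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all returns are identical; t is undefined")
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical per-trade returns in percent, made up for this example.
trades = [1.0, -1.0, 2.0, 0.0, 3.0]
t = t_statistic(trades)
print(round(t, 2))  # 1.41
```

Five trades is far too few to conclude anything; the point is only to show which numbers go where.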

The p-value

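The p-value is the probability of seeing a result at least this extreme if H0 were true. It is not the probability that H0 is true. Because H1 here is one-sided (a positive edge), use the upper-tail p-value.

  • p < α (e.g. 0.05) → reject H0: the result is hard to explain by noise alone
  • p ≥ α → fail to reject H0: the result could be noise, which is not proof that there is no edge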
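If you want to see where the p-value comes from without a stats library, the sketch below integrates the Student's t density numerically with Simpson's rule. This is one possible approach, not the only one; in practice a library routine is the more usual choice. The helper names `t_pdf` and `one_sided_p_value` are hypothetical, and the example t-statistic and degrees of freedom are made up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def one_sided_p_value(t, df, steps=2000):
    """Upper-tail probability P(T >= t), using Simpson's rule on [0, |t|]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 <= T <= |t|)
    upper_tail = 0.5 - area       # P(T >= |t|), by symmetry around zero
    return upper_tail if t >= 0 else 1.0 - upper_tail


# Hypothetical example: t = 2.0 from 100 trades, so df = 99.
alpha = 0.05
p = one_sided_p_value(2.0, df=99)
reject_h0 = p < alpha
```

A negative t-statistic gives a p-value above 0.5 here, which is what you want when the alternative is a positive edge.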

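But beware: a low p-value is not a guarantee. At α = 0.05, a strategy with no edge still passes the test about 1 time in 20. Run it on 20 independent markets and you should expect about one "significant" result purely by chance; the probability of getting at least one is roughly 64%.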
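The arithmetic behind that warning is short enough to check yourself. This sketch assumes every test is independent and that the strategy truly has no edge in any market; both function names are made up for illustration.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha, n_tests):
    """Average number of false positives when every null hypothesis is true."""
    return alpha * n_tests


fwer = family_wise_error_rate(0.05, 20)           # about 0.64
expected = expected_false_positives(0.05, 20)     # 1.0
```

Real markets are correlated, so the exact figure will differ, but the direction of the problem is the same: more tests, more false alarms.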

Sample size matters most

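With a small sample, even a strong edge can fail to reach significance. With a huge sample, an edge too small to cover costs can still be statistically significant, because the test has enough power to detect effects that are economically useless. As rough rules of thumb for trading: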

  • Fewer than 30 trades → almost meaningless; treat with deep skepticism
  • 100–300 trades → reasonable sample for most setups
  • Thousands of trades → watch out for tiny effects disguised as significance
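To get a feel for why these ranges differ so much, you can estimate how many trades a given edge needs before a one-sided test is likely to detect it. The sketch below uses the normal approximation to the t-test, so it is only a ballpark, and it assumes independent trades. The function name `required_trades` and the 80% power default are assumptions for illustration.

```python
import math
from statistics import NormalDist


def required_trades(mean_return, sd_return, alpha=0.05, power=0.80):
    """Approximate trades needed to detect a positive mean (one-sided test)."""
    if mean_return <= 0 or sd_return <= 0:
        raise ValueError("mean_return and sd_return must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)      # extra margin for the desired power
    return math.ceil(((z_alpha + z_power) * sd_return / mean_return) ** 2)


# Hypothetical edge: mean return per trade is one tenth of its standard deviation.
n_needed = required_trades(mean_return=0.1, sd_return=1.0)
```

With these made-up inputs the answer is on the order of 600 trades. Halving the edge roughly quadruples the requirement, because n scales with the square of sd divided by mean.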

Common trading mistakes

  1. Multiple comparisons: testing 50 variations and reporting the best one as if you'd tested one. Correct with Bonferroni or false-discovery-rate adjustments
  2. Survivorship bias in the test: only testing symbols that still trade today
  3. Data snooping: the more parameters you tune, the more likely significance is an illusion
  4. Ignoring dependence: overlapping holding periods make returns correlated, which violates the independence assumption, understates the standard error, and inflates the t-statistic
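The two corrections named in the first item can each be written in a few lines. This is a plain-Python sketch with made-up p-values and hypothetical function names. The false-discovery-rate version shown is the Benjamini-Hochberg procedure, which is usually stated for independent or positively dependent tests.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 only where p <= alpha / m, with m the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false-discovery-rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    # Reject every hypothesis ranked at or below that cutoff.
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[i] = True
    return reject


# Hypothetical p-values from three variations of one strategy.
p_values = [0.01, 0.04, 0.03]
print(bonferroni_reject(p_values))          # [True, False, False]
print(benjamini_hochberg_reject(p_values))  # [True, True, True]
```

Bonferroni is the stricter of the two. Whichever you use, the count m has to include every variation you actually tried, not just the ones you kept.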

Practical takeaway

A profitable backtest is a hypothesis, not a conclusion. Compute the t-statistic and p-value. Require p < 0.05 after correcting for the number of variations you tested. Then — and only then — trade the strategy with small size and watch whether live results track the backtest.

Summary

Hypothesis testing gives you a disciplined way to ask, "Could this be luck?" It's not perfect, but it's far better than trusting a backtest on faith. Treat every strategy as a hypothesis to be tested, and let statistics — not hope — decide when an edge is real enough to trade.

Educational content · Not financial advice · Trade at your own risk