Statistical Significance of Seasonal Patterns
A seasonal average that looks good on a chart can still be pure noise — only a proper significance test tells you whether the pattern has a real edge or is just a random cluster of lucky years.
"The S&P 500 has risen in 7 of the last 10 Marches." That sounds like an edge, until you realize a fair coin lands heads exactly 7 of 10 times about 12% of the time, and at least 7 of 10 times about 17% of the time. Without a significance test, you cannot tell a real seasonal edge from a lucky streak.
Seasonality is one of the most abused fields in trading. Every calendar window shows some average return, and the human brain finds stories in any pattern. Significance testing separates signal from coincidence.
Why eyeballing averages is dangerous
A seasonal table shows the mean return for a month over N years. The mean hides three things:
- Sample size: 10 observations are too few to draw conclusions; even a 70% win rate over 10 years is consistent with chance
- Dispersion: a positive average with huge variance is still a coin flip
- Data mining: scan 12 months × many assets, and something will look good by chance
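A 60% win rate over 10 trials is statistically indistinguishable from chance: a fair coin does at least that well about 38% of the time. A 60% win rate over 1,000 trials is almost impossible to get by luck, so it points to a real edge.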
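The gap between 10 trials and 1,000 trials can be checked with an exact one-sided binomial calculation. The sketch below assumes a fair coin (50% win probability) as the benchmark; the function name `fair_coin_tail` is made up for this illustration.

```python
from math import comb

def fair_coin_tail(wins: int, trials: int) -> float:
    """Probability of at least `wins` heads in `trials` fair-coin flips.

    One-sided tail. For a fair coin the two-sided p-value is twice this
    value, capped at 1.
    """
    favourable = sum(comb(trials, k) for k in range(wins, trials + 1))
    return favourable / 2 ** trials

p_7_of_10 = fair_coin_tail(7, 10)          # "up 7 of the last 10 Marches"
p_6_of_10 = fair_coin_tail(6, 10)          # 60% win rate, 10 trials
p_600_of_1000 = fair_coin_tail(600, 1000)  # 60% win rate, 1,000 trials

print(f"at least 7 of 10:       {p_7_of_10:.4f}")
print(f"at least 6 of 10:       {p_6_of_10:.4f}")
print(f"at least 600 of 1,000:  {p_600_of_1000:.2e}")
```

The first two probabilities are far above any usual significance threshold, while the third is vanishingly small: the win rate is the same, but the sample size changes the conclusion.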
The null hypothesis
Testing starts by assuming the null hypothesis: the seasonal pattern does not exist, and any observed return is random. We then ask — how unlikely is the observed result if the null were true?
| Test | Use case | What it tells you |
|---|---|---|
| t-test | Mean return ≠ 0 for a month | Is the average real or noise? |
| Binomial test | Win rate ≠ 50% | Is "up 7 of 10 years" significant? |
| Chi-square | Distribution across months | Are returns uniform or skewed? |
| Bootstrap | No distribution assumption | Confidence interval on the mean |
| Walk-forward | Out-of-sample validation | Does the edge survive unseen data? |
The p-value is the probability of seeing a result at least this extreme if the null is true. The conventional threshold is p < 0.05, though stricter traders demand p < 0.01 as a rough guard against multiple testing.
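Two of the tests in the table can be sketched with the Python standard library alone: the t-statistic for "mean return ≠ 0" and a percentile bootstrap confidence interval on the mean. The return series below is invented for illustration, and the function names are not from any particular library. Turning the t-statistic into a p-value needs the t-distribution, for example via a t-table or `scipy.stats.ttest_1samp`.

```python
import random
import statistics

def t_statistic(returns):
    """One-sample t-statistic for the null hypothesis: mean return = 0."""
    n = len(returns)
    standard_error = statistics.stdev(returns) / n ** 0.5
    return statistics.fmean(returns) / standard_error

def bootstrap_mean_ci(returns, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean return."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(returns)
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n))  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = means[round((1 - level) / 2 * n_resamples)]
    upper = means[round((1 + level) / 2 * n_resamples) - 1]
    return lower, upper

# Hypothetical March returns in percent for 20 years (made-up numbers).
march_returns = [1.2, -0.8, 2.5, 0.3, -1.9, 3.1, 0.7, -0.4, 1.8, 2.2,
                 -2.6, 0.9, 1.5, -0.2, 2.8, 0.1, -1.1, 1.9, 0.6, 1.0]

t = t_statistic(march_returns)
low, high = bootstrap_mean_ci(march_returns)
print(f"mean = {statistics.fmean(march_returns):.2f}%, t = {t:.2f}")
print(f"95% bootstrap CI for the mean: [{low:.2f}%, {high:.2f}%]")
```

If the interval contains 0, the data are consistent with no seasonal edge, however good the average looks on a chart.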
Multiple testing: the silent killer
If you test 12 months individually and the tests are independent, the chance that at least one shows p < 0.05 purely by luck is 1 − 0.95^12, or roughly 46%, not 5%. This is the multiple comparisons problem, and it is why "I found an edge in April" is meaningless unless April was pre-specified.
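The 46% figure follows from a one-line formula, shown here as a small sketch that assumes the tests are independent. The 120-test case (12 months across 10 assets) is a hypothetical scan added for comparison.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(f"{familywise_error_rate(12):.1%}")   # 12 months, one asset -> 46.0%
print(f"{familywise_error_rate(120):.1%}")  # 12 months x 10 assets -> 99.8%
```

Once the scan covers several assets, finding at least one "significant" month by luck is close to guaranteed.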
Corrections:
- Bonferroni: divide the threshold by the number of tests (12 months → need p < 0.0042)
- False Discovery Rate: controls the expected proportion of false positives among the patterns you accept (for example, the Benjamini-Hochberg procedure)
- Pre-registration: pick the window before looking at the data
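The first two corrections can be sketched in a few lines. The block below uses the Benjamini-Hochberg step-up procedure as one common way to control the false discovery rate; the twelve monthly p-values are invented for illustration and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of the tests accepted by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for January..December (made-up numbers).
monthly_p = [0.001, 0.008, 0.012, 0.20, 0.35, 0.04,
             0.51, 0.62, 0.74, 0.03, 0.88, 0.95]

threshold = bonferroni_threshold(0.05, len(monthly_p))
bonferroni_hits = [i for i, p in enumerate(monthly_p) if p < threshold]
fdr_hits = benjamini_hochberg(monthly_p)

print(f"Bonferroni threshold: {threshold:.4f}")  # 0.0042
print("Bonferroni survivors:", bonferroni_hits)  # [0]
print("FDR survivors:", fdr_hits)                # [0, 1, 2]
```

With these made-up inputs, five months pass the naive p < 0.05 screen, three survive the FDR correction, and only one survives Bonferroni, which is the stricter of the two.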
Effect size matters more than p-values
A pattern can be statistically significant but economically useless — a 0.2% monthly return with p = 0.01 may not cover transaction costs. Always pair significance with effect size (annualized return), Sharpe ratio, and drawdown during adverse years.
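A rough way to check economic usefulness is to subtract costs before computing any statistic. The sketch below makes several simplifying assumptions: one trade per year, a flat round-trip cost, a zero risk-free rate in the Sharpe ratio, simple (non-compounded) annualization, and the worst single year as a crude stand-in for drawdown. The returns and the 0.25% cost are invented so that the gross mean equals the 0.2% from the example.

```python
import statistics

def seasonal_effect(returns, cost_per_trade, trades_per_year=1):
    """Summarize a seasonal trade net of costs (all values in percent)."""
    net = [r - cost_per_trade for r in returns]
    mean_net = statistics.fmean(net)
    volatility = statistics.stdev(net)
    return {
        "mean_net": mean_net,
        "annualized_net": mean_net * trades_per_year,
        # Sharpe with a zero risk-free rate, scaled by sqrt(trades per year).
        "sharpe": mean_net / volatility * trades_per_year ** 0.5,
        "worst_year": min(net),
    }

# Hypothetical gross returns in percent for one calendar window, 10 years.
gross = [0.9, -0.5, 0.6, -0.2, 0.4, 0.1, -0.3, 0.7, 0.2, 0.1]

summary = seasonal_effect(gross, cost_per_trade=0.25)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the gross average of 0.2% turns into a net loss once the assumed cost is deducted, regardless of what the p-value on the gross returns says.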
Out-of-sample: the only honest test
The strongest evidence is walk-forward: fit the rule on years 1–20, test on years 21–25, repeat. A pattern that survives out-of-sample across non-overlapping windows is far more credible than one fit on the full history.
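The windowing scheme can be sketched as a small generator. This version assumes a rolling 20-year training window that advances by the length of the test window, so the test windows never overlap; an expanding training window is an equally common choice. The function name and the 30-year history are hypothetical.

```python
def walk_forward_windows(n_years, train_years=20, test_years=5):
    """Yield (train, test) index ranges with non-overlapping test windows."""
    start = 0
    while start + train_years + test_years <= n_years:
        train = range(start, start + train_years)
        test = range(start + train_years, start + train_years + test_years)
        yield train, test
        start += test_years  # roll forward by one test window

# Hypothetical 30-year history, printed with 1-based year numbers.
for train, test in walk_forward_windows(30):
    print(f"fit on years {train.start + 1}-{train.stop}, "
          f"test on years {test.start + 1}-{test.stop}")
```

In each step the rule is fitted only on the training range and scored only on the test range, so no test-window data leaks into the fit.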
Practical steps
- Demand at least 20–30 years of data before trusting any seasonal average
- Run a t-test or binomial test, not just visual inspection
- Apply Bonferroni or FDR correction if you scanned multiple windows
- Report effect size and Sharpe alongside the p-value
- Validate out-of-sample — a rule that only works in-sample is curve-fitting
Bottom line
Statistical significance is the line between a seasonal edge and a seasonal story. Treat any seasonal claim that isn't backed by a corrected p-value, a meaningful effect size, and out-of-sample validation as entertainment, not a trading signal.
Next: revisit Cycle Overlap and Resonance Analysis and apply these same significance tests before trusting any cycle alignment.