Statistical Significance of Seasonal Patterns

"The S&P 500 has risen in 7 of the last 10 Marches." That sounds like an edge, until you realize that a fair coin lands heads exactly 7 of 10 times about 12% of the time, and 7 or more times about 17% of the time. Without a significance test, you cannot tell a real seasonal edge from a lucky streak.
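The coin-flip figures can be checked with a few lines of standard-library Python. This is a minimal sketch; `binom_pmf` and `binom_tail` are illustrative helper names, not functions from any trading library. A significance test cares about "7 or more", which is why the upper tail is computed alongside the exact count.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes (one-sided upper tail)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


exactly_7 = binom_pmf(7, 10)    # 120 / 1024
at_least_7 = binom_tail(7, 10)  # (120 + 45 + 10 + 1) / 1024 = 176 / 1024
print(f"P(exactly 7 of 10) = {exactly_7:.3f}")   # 0.117
print(f"P(7 or more of 10) = {at_least_7:.3f}")  # 0.172
```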

Seasonality is one of the most abused fields in trading. Every calendar window shows some average return, and the human brain finds stories in any pattern. Significance testing separates signal from coincidence.

Why eyeballing averages is dangerous

A seasonal table shows the mean return for a month over N years. The mean hides three things:

Sample size: 10 observations is not enough to draw conclusions
Dispersion: a positive average with huge variance is still a coin flip
Data mining: scan 12 months × many assets, and something will look good by chance

A 60% win rate over 10 trials is statistically indistinguishable from chance. The same 60% win rate over 1,000 trials is very unlikely to be luck.
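Sample size changes the verdict because the standard error of a win rate shrinks with the square root of the number of trials. The sketch below uses the normal approximation to the binomial, which is rough at n = 10 (an exact binomial test is better there); `win_rate_z` is a hypothetical helper.

```python
from math import sqrt


def win_rate_z(win_rate: float, n: int, null_rate: float = 0.5) -> float:
    """z-score of an observed win rate against a null win rate.

    Normal approximation to the binomial; only a rough guide for small n.
    """
    standard_error = sqrt(null_rate * (1 - null_rate) / n)
    return (win_rate - null_rate) / standard_error


for n in (10, 1000):
    print(f"n = {n:>4}: z = {win_rate_z(0.60, n):.2f}")
```

The same 10-point excess over 50% is about 0.6 standard errors at n = 10, well inside the noise, and about 6.3 standard errors at n = 1,000.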

The null hypothesis

Testing starts by assuming the null hypothesis: the seasonal pattern does not exist, and any observed return is random. We then ask: how unlikely is the observed result if the null were true?

Test	Use case	What it tells you
t-test	Mean return ≠ 0 for a month	Is the average real or noise?
Binomial test	Win rate ≠ 50%	Is "up 7 of 10 years" significant?
Chi-square	Up-month counts across months	Are positive months spread evenly or concentrated?
Bootstrap	No normality assumption	Confidence interval on the mean
Walk-forward	Out-of-sample validation	Does the edge survive unseen data?
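Two of the tests from the table can be sketched in standard-library Python: a one-sample t statistic for "mean return = 0" and a percentile bootstrap interval for the mean. The returns are made-up illustrative numbers (7 up years out of 10), not market data, and the helper names are hypothetical. The standard library has no t distribution, so the sketch reports the statistic rather than a p-value; if SciPy is installed, `scipy.stats.ttest_1samp(returns, 0.0)` returns both.

```python
import random
import statistics
from math import sqrt

# Made-up returns (%) for one calendar month over 10 years: 7 up, 3 down.
# Illustrative numbers only, not real market data.
returns = [1.8, -0.9, 2.4, 0.6, -2.1, 3.0, 0.4, -0.3, 1.5, 0.9]


def t_statistic(xs: list[float]) -> float:
    """One-sample t statistic for H0: mean return = 0 (df = len(xs) - 1)."""
    return statistics.mean(xs) / (statistics.stdev(xs) / sqrt(len(xs)))


def bootstrap_mean_ci(xs: list[float], n_resamples: int = 10_000,
                      alpha: float = 0.05, seed: int = 42) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = random.Random(seed)  # fixed seed only for repeatability
    means = sorted(
        sum(sample) / len(sample)
        for sample in (rng.choices(xs, k=len(xs)) for _ in range(n_resamples))
    )
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


t = t_statistic(returns)
lower, upper = bootstrap_mean_ci(returns)
print(f"mean = {statistics.mean(returns):.2f}%, t = {t:.2f}, df = {len(returns) - 1}")
print(f"95% bootstrap CI for the mean: [{lower:.2f}%, {upper:.2f}%]")
```

On these numbers t is about 1.49 with 9 degrees of freedom, below the two-sided 5% critical value of about 2.26, and the bootstrap interval is wide relative to the 0.73% mean: "up 7 of 10" with this much dispersion does not pass the t-test.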

The p-value is the probability of seeing a result at least this extreme if the null is true. The conventional threshold is p < 0.05, though stricter traders demand p < 0.01 as a partial guard against multiple testing.
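The p-value definition can be checked by brute force: simulate many ten-year histories in which the null is true (each year up with probability 50%) and count how often 7 or more up years appear. A minimal Monte Carlo sketch; `simulated_p_value` is a hypothetical helper, and the fixed seed only makes the run repeatable.

```python
import random


def simulated_p_value(observed_ups: int, n_years: int,
                      n_sims: int = 50_000, seed: int = 7) -> float:
    """Estimate P(up years >= observed_ups) when every year is a fair coin flip."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One simulated history under the null: count the "up" years
        ups = sum(rng.random() < 0.5 for _ in range(n_years))
        if ups >= observed_ups:
            hits += 1
    return hits / n_sims


p = simulated_p_value(observed_ups=7, n_years=10)
print(f"simulated one-sided p-value: {p:.3f}")
```

The estimate should land near the exact binomial value of 176/1024 ≈ 0.17, far above 0.05, so "up 7 of 10 years" does not clear the conventional bar on its own.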

Multiple testing: the silent killer

If you test 12 months individually at p < 0.05, the chance that at least one shows a "significant" result purely by luck is roughly 46% (1 − 0.95^12, assuming independent tests), not 5%. This is the multiple comparisons problem, and it is why "I found an edge in April" is meaningless unless April was pre-specified.
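The 46% figure is the family-wise error rate for independent tests. Returns in different months are not perfectly independent, so treat this as an approximation. A short sketch that also shows what happens when the scan widens to 12 months × 10 assets; `family_wise_error` is an illustrative name.

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# 1 pre-specified month, all 12 months, 12 months x 10 assets
for n_tests in (1, 12, 120):
    print(f"{n_tests:>3} tests: {family_wise_error(0.05, n_tests):.1%}")
```

This prints about 5.0%, 46.0%, and 99.8%: with 120 independent tests, at least one false positive is close to certain.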

Corrections:

Bonferroni: divide the threshold by the number of tests (12 months → need p < 0.0042)
False Discovery Rate (FDR): controls the expected proportion of false positives among the results you declare significant (Benjamini-Hochberg is a standard procedure)
Pre-registration: pick the window before looking at the data
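Bonferroni and one common FDR procedure, Benjamini-Hochberg, can be compared side by side. This is a from-scratch sketch on made-up p-values for 12 months (illustrative only); in practice, statsmodels provides both through `statsmodels.stats.multitest.multipletests` with methods "bonferroni" and "fdr_bh".

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0 only where p <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * q
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values for Jan..Dec, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
p_values = [0.012, 0.45, 0.001, 0.2, 0.09, 0.6,
            0.3, 0.008, 0.75, 0.04, 0.5, 0.9]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
for month, p, keep_bonf, keep_bh in zip(months, p_values, bonf, bh):
    print(f"{month}: p = {p:<6}  Bonferroni: {keep_bonf!s:<5}  BH: {keep_bh}")
```

On these p-values Bonferroni keeps only March (0.001 is the only value below 0.05 / 12), while Benjamini-Hochberg also keeps August and January. FDR control is less strict, at the cost of accepting that some fraction of the kept patterns are false.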

Effect size matters more than p-values

A pattern can be statistically significant but economically useless — a 0.2% monthly return with p = 0.01 may not cover transaction costs. Always pair significance with effect size (annualized return), Sharpe ratio, and drawdown during adverse years.
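Effect size, costs, and risk can be summarized per trade. A minimal sketch, assuming a once-a-year seasonal trade, made-up yearly returns that average 0.2%, and a hypothetical 0.25% round-trip cost. The worst single year is only a simple stand-in for drawdown, which a full backtest would measure properly; `seasonal_trade_report` is an illustrative helper name.

```python
import statistics


def seasonal_trade_report(returns_pct: list[float],
                          round_trip_cost_pct: float) -> dict[str, float]:
    """Effect-size summary for a once-a-year seasonal trade.

    Each element is one year's return for the seasonal window, in percent.
    """
    net = [r - round_trip_cost_pct for r in returns_pct]
    return {
        "gross_mean": statistics.mean(returns_pct),
        "net_mean": statistics.mean(net),
        # One trade per year, so this per-trade ratio is also an annual
        # Sharpe ratio (risk-free rate ignored for simplicity).
        "net_sharpe": statistics.mean(net) / statistics.stdev(net),
        "worst_year": min(net),
    }


# Hypothetical yearly returns (%) averaging 0.2%, not real market data.
window_returns = [0.5, -0.1, 0.4, 0.3, -0.2, 0.6, 0.1, 0.2, -0.3, 0.5]
report = seasonal_trade_report(window_returns, round_trip_cost_pct=0.25)
for key, value in report.items():
    print(f"{key:>10}: {value:+.2f}")
```

Here a +0.20% gross average becomes a small net loss of 0.05% per trade, with a worst year of −0.55%, however small its p-value might be.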

Out-of-sample: the only honest test

The strongest evidence is walk-forward: fit the rule on years 1–20, test it on years 21–25, then roll both windows forward five years and repeat. A pattern that survives out-of-sample across non-overlapping windows is far more credible than one fit on the full history.
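The rolling scheme can be written as a small generator that yields (training years, test years) pairs. A minimal sketch, assuming one observation per year and a fixed-length training window; `walk_forward_splits` is a hypothetical helper, and 1990 to 2024 are placeholder years rather than a real dataset.

```python
from typing import Iterator


def walk_forward_splits(years: list[int], train_len: int = 20,
                        test_len: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_years, test_years) pairs with non-overlapping test windows.

    Both windows roll forward by test_len years at each step.
    """
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len


years = list(range(1990, 2025))  # 35 placeholder years of history
for train, test in walk_forward_splits(years):
    print(f"fit {train[0]}-{train[-1]}, test {test[0]}-{test[-1]}")
```

With 35 years this gives three splits, testing on 2010-2014, 2015-2019, and 2020-2024. An anchored variant keeps the start year fixed and lets the training window grow; in both designs each test window is evaluated once and never used for fitting.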

Practical steps

Demand at least 20–30 years of data before trusting any seasonal average
Run a t-test or binomial test, not just visual inspection
Apply Bonferroni or FDR correction if you scanned multiple windows
Report effect size and Sharpe alongside the p-value
Validate out-of-sample — a rule that only works in-sample is curve-fitting

Bottom line

Statistical significance is the line between a seasonal edge and a seasonal story. Treat any seasonal claim that isn't backed by a corrected p-value, a meaningful effect size, and out-of-sample validation as entertainment, not a trading signal.

Next: revisit Cycle Overlap and Resonance Analysis and apply these same significance tests before trusting any cycle alignment.