Searching for "Similar Markets" to Build Portfolios: Reproducing a Search-Based Asset Allocation Using Market Structure Similarity
Audience and Key Points
Audience: Investors interested in diversification and portfolio theory who intuitively believe that improving risk estimation should improve allocation. No math required — we cover the method's structure and verification results.
Key Points:
- A method was proposed that searches past "similar market environments" to synthesize covariance matrices and improve portfolio allocation
- We independently reproduced the results and confirmed that Risk Parity (RP) portfolios nearly match the paper's reported values
- However, once transaction costs are included, the method does not hold a durable edge over a simple shrinkage baseline (Ledoit-Wolf): RP is no better than LW even before costs, and the MVP advantage reverses at 20bp. This shows quantitatively that better estimation does not automatically mean better returns
What Does This Method Do?
In March 2020, the COVID crash reshuffled inter-industry correlations overnight. Investors who built portfolios from recent data rode into the drawdown assuming "normal diversification" still held. Similar regime breaks had happened before — but there was no systematic way to ask "which past period looked like today?"
Hoshino (2026), in a preprint [1], proposes exactly that: searching through history for periods with "similar market structure" and blending their covariance matrices to build better portfolios.
The bottom line first: the method reproduces the paper's results almost exactly. But once transaction costs are included, it holds no durable edge over a simple shrinkage estimator (Ledoit-Wolf): the RP variant is no better even before costs, and the MVP edge reverses at 20bp. Below, we trace why.
Why Use "Similar Periods"?
When building a portfolio, you need to estimate how much assets move together — the "covariance matrix." This matrix changes dramatically by market regime. Correlation patterns during the Lehman crisis look nothing like those in a calm bull market.
The naive approach estimates from the most recent 250 days, but with 49 industries that means 1,225 parameters (49 × 50 ÷ 2 distinct variances and covariances), too many for the data, which produces noisy estimates. Averaging over 20 years reduces noise but introduces bias from irrelevant regimes.
This method targets the middle ground. It finds the 10 most similar past market structures and synthesizes their covariance matrices via maximum likelihood estimation. By selecting only similar regimes, it should have less bias than the full-period average and less noise than the 250-day window — at least in theory.
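The search-and-blend step can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `blend_similar_covariances`, the array layout, and the synthetic data are our assumptions, and the plain average stands in for the maximum-likelihood synthesis under the simplifying assumption of equal-length windows and a Gaussian model.

```python
import numpy as np

def blend_similar_covariances(features, covariances, query, k=10):
    """Average the covariance matrices of the k past periods closest to `query`.

    features    : (T, d) array, one feature vector per past period
    covariances : (T, n, n) array, one covariance matrix per past period
    query       : (d,) feature vector describing the current period
    """
    # Exact (brute-force) Euclidean distance from the query to every past period.
    dists = np.linalg.norm(features - query, axis=1)
    # Indices of the k nearest periods, closest first.
    nearest = np.argsort(dists)[:k]
    # Assumption: with equal-length windows and a Gaussian model, the pooled
    # maximum-likelihood covariance is the plain average of the selected matrices.
    return covariances[nearest].mean(axis=0), nearest

# Synthetic stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
T, n, d = 120, 5, 5
features = rng.normal(size=(T, d))
A = rng.normal(size=(T, n, n))
covariances = A @ A.transpose(0, 2, 1) / n  # random positive semi-definite matrices

# Query with a point very close to the last period, so that period ranks first.
blended, nearest = blend_similar_covariances(features, covariances, features[-1] + 0.01)
```

In a real backtest the candidate set would be restricted to periods strictly before the evaluation date to avoid look-ahead.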
So how does it define "similar"? This is the core of the method.
How Is "Similarity" Measured?
Four types of features are used to measure market structure similarity, falling into two categories: network-derived and matrix-derived.
Network-based (graph constructed from inter-industry correlations)
- Fiedler vector: The eigenvector of the graph Laplacian that belongs to its second-smallest eigenvalue. Captures whether the market tends to split into two groups
- Closeness centrality: Quantifies how "close" each industry is to others on the network. Represents the distribution of market "connectivity"
Matrix-based (extracted directly from the covariance matrix)
- Leading eigenvector: The dominant direction of variation in the covariance matrix. Shows where overall market risk is concentrated
- Eigenvalue distribution: Whether risk concentrates in a few factors or is dispersed
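To make the four features concrete, here is a rough sketch of how each could be computed from one window of returns. The graph construction is our assumption (a fully connected graph weighted by absolute correlation, with distance sqrt(2(1 − ρ))); the paper may build its network differently, and `market_structure_features` is a hypothetical name.

```python
import numpy as np

def market_structure_features(returns):
    """Return (fiedler, closeness, leading, spectrum) for a (days, assets) array."""
    cov = np.cov(returns, rowvar=False)
    corr = np.corrcoef(returns, rowvar=False)
    n = corr.shape[0]

    # Network view. Assumption: edge weight = |correlation|, no self-loops.
    W = np.abs(corr)
    np.fill_diagonal(W, 0.0)
    laplacian = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; the Fiedler vector is the
    # eigenvector of the second-smallest Laplacian eigenvalue.
    _, lap_vecs = np.linalg.eigh(laplacian)
    fiedler = lap_vecs[:, 1]

    # Closeness centrality. On a fully connected graph with the metric
    # sqrt(2(1 - corr)), the shortest path is the direct edge, so closeness
    # reduces to the inverse of the mean distance to the other nodes.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    closeness = (n - 1) / dist.sum(axis=1)

    # Matrix view: leading eigenvector and normalized eigenvalue spectrum.
    cov_vals, cov_vecs = np.linalg.eigh(cov)
    leading = cov_vecs[:, -1]                 # direction of largest variance
    spectrum = cov_vals[::-1] / cov_vals.sum()  # descending, sums to 1
    return fiedler, closeness, leading, spectrum

# Synthetic returns with one common factor (hypothetical data).
rng = np.random.default_rng(1)
common = rng.normal(size=(250, 1))
demo_returns = 0.01 * (0.5 * common + rng.normal(size=(250, 6)))
fiedler, closeness, leading, spectrum = market_structure_features(demo_returns)
```

With a thresholded or otherwise sparsified graph, closeness would need a true shortest-path computation rather than this shortcut.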
Historical time points where these features are closest to "now" are identified, and their covariance matrices are used to build the portfolio. How much improvement did the paper achieve?
Paper Results
The evaluation uses Fama-French 49 Industry Portfolios [2] (US, from 1926) over January 2006 to December 2025. Benchmarks are the sample covariance and Ledoit-Wolf shrinkage estimation [3]. The metric is the risk-return ratio (annualized return ÷ annualized risk); higher means better risk-adjusted returns.
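For readers who want to see how a covariance matrix turns into the two portfolio types reported below, here is a minimal sketch. It is not the code used in the paper or in our reproduction. Assumptions: the minimum variance weights are the unconstrained closed form (so they can go negative), and risk parity is solved with a simple cyclical coordinate descent; the function names and the 3-asset example are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Unconstrained minimum-variance weights: inv(cov) @ 1, scaled to sum to 1."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

def risk_parity_weights(cov, n_iter=500):
    """Long-only weights whose risk contributions w_i * (cov @ w)_i are equal."""
    n = cov.shape[0]
    b = 1.0 / n                        # equal risk budget per asset
    x = 1.0 / np.sqrt(np.diag(cov))    # inverse-volatility starting point
    for _ in range(n_iter):
        for i in range(n):
            # Coordinate update: positive root of
            # cov_ii * x_i^2 + c * x_i - b = 0, where c = sum_{j != i} cov_ij * x_j
            c = cov[i] @ x - cov[i, i] * x[i]
            x[i] = (-c + np.sqrt(c * c + 4.0 * cov[i, i] * b)) / (2.0 * cov[i, i])
    return x / x.sum()

# Hypothetical 3-asset example: volatilities 10%, 20%, 30%, pairwise correlation 0.3.
vols = np.array([0.10, 0.20, 0.30])
corr = np.array([[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]])
cov = np.outer(vols, vols) * corr
w_mvp = min_variance_weights(cov)
w_rp = risk_parity_weights(cov)
risk_contrib = w_rp * (cov @ w_rp)     # should be equal across assets
```

Both functions take only the covariance matrix as input, which is why the quality of the covariance estimate is the single lever this method pulls.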
Minimum Variance Portfolio (MVP)
| Method | Ann. Return | Ann. Risk | Risk-Return Ratio |
|---|---|---|---|
| Sample covariance | 9.64% | 14.18% | 0.680 |
| Ledoit-Wolf | 9.69% | 14.18% | 0.684 |
| Proposed (Fiedler) | 10.59% | 14.22% | 0.744 |
Risk Parity Portfolio (RP)
| Method | Ann. Return | Ann. Risk | Risk-Return Ratio |
|---|---|---|---|
| Sample covariance | 10.43% | 18.82% | 0.554 |
| Ledoit-Wolf | 10.43% | 18.83% | 0.554 |
| Proposed (Closeness+AIRM) | 10.58% | 18.85% | 0.561 |
MVP's risk-return ratio rises from 0.680 to 0.744 — a relative improvement of about 9% ((0.744−0.680)÷0.680). RP shows only about 1% improvement. The RP margin is notably thin.
These margins can easily collapse with small implementation or evaluation differences. So we attempted an independent reproduction under matched conditions.
Our Reproduction
Setup
| Item | Paper | Our Test |
|---|---|---|
| Data | FF49 Industry Portfolios | Same (public data) |
| Evaluation period | 2006-01 to 2025-12 | Same |
| Similarity search | DuckDB VSS (HNSW, approximate) | Exact brute-force search |
| Annualization | Unspecified (consistent with CAGR) | CAGR |
| Portfolios | MVP / RP | Same |
This is a "faithful reproduction" with matching data and period. Only the search engine differs: the paper uses approximate nearest-neighbor search, while we use exact search.
Benchmark Reproduction (Phase 0)
| Method | Measured | Paper | Gap | Verdict |
|---|---|---|---|---|
| Sample MVP | 0.650 | 0.680 | -0.031 | Directionally consistent |
| LW MVP | 0.662 | 0.684 | -0.021 | Directionally consistent |
| Sample RP | 0.553 | 0.554 | -0.001 | Near-exact match |
| LW RP | 0.559 | 0.554 | +0.006 | Near-exact match |
All values are risk-return ratios, and Gap is measured minus paper. The RP benchmarks match the paper almost exactly. The MVP benchmarks come out 0.02 to 0.03 lower but are directionally consistent.
Main Results (Phase 1)
| Method | Measured | Paper | Gap | Verdict |
|---|---|---|---|---|
| Fiedler MVP | 0.697 | 0.744 (ANN) / 0.707 (AIRM) | -0.010 (vs AIRM) | Near match |
| Closeness+AIRM RP | 0.557 | 0.561 | -0.004 | Near match |
RP differs by just 0.004 from the paper. MVP lands within 0.010 of the paper's re-ranking variant (AIRM, 0.707), though it sits 0.047 below the paper's ANN figure (0.744).
Implementation Pitfalls Found During Reproduction
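The article does not expand "AIRM". Assuming it denotes the affine-invariant Riemannian metric, a standard distance between symmetric positive-definite matrices, a re-ranking step could look roughly like the sketch below. The function names are hypothetical and this is not the paper's implementation.

```python
import numpy as np

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices A and B.

    Equals sqrt(sum(log(lambda_i)^2)), where lambda_i are the eigenvalues
    of inv(A) @ B.
    """
    L = np.linalg.cholesky(A)              # A = L @ L.T
    M = np.linalg.solve(L, B)              # inv(L) @ B
    M = np.linalg.solve(L, M.T).T          # inv(L) @ B @ inv(L).T, same spectrum as inv(A) @ B
    eigvals = np.linalg.eigvalsh((M + M.T) / 2.0)  # symmetrize against round-off
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))

def rerank_by_airm(current_cov, candidate_covs):
    """Order candidate indices from most to least similar to current_cov."""
    dists = [airm_distance(current_cov, c) for c in candidate_covs]
    return np.argsort(dists)

# Tiny illustration with hypothetical 2x2 matrices.
identity = np.eye(2)
candidates = [4.0 * identity, 1.1 * identity, 2.0 * identity]
order = rerank_by_airm(identity, candidates)   # closest candidate first
```

Unlike Euclidean distance on feature vectors, this distance compares the full covariance matrices and is unchanged if both are transformed by the same invertible matrix.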
Two issues emerged that significantly affected results.
First: the annualization method. The paper doesn't specify its formula, but using CAGR (geometric mean annual return) instead of arithmetic mean (daily mean × 252) aligns with the paper's tables. With arithmetic mean, the RP risk-return ratio becomes 0.62 — far from the paper's 0.55. Annualized risk matches while return inflates, which initially made it look like a methodological problem.
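The difference between the two annualization conventions is easy to see in code. The sketch below is illustrative only: `annualize` is a hypothetical helper, the 252-day year is a common convention we assume, and the return series is synthetic.

```python
import numpy as np

def annualize(daily_returns, periods_per_year=252):
    """Return (cagr, arithmetic_return, risk) for a series of simple daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    years = len(r) / periods_per_year
    cagr = np.prod(1.0 + r) ** (1.0 / years) - 1.0   # geometric (compounded) return
    arithmetic = r.mean() * periods_per_year          # daily mean x 252
    risk = r.std(ddof=1) * np.sqrt(periods_per_year)  # annualized volatility
    return cagr, arithmetic, risk

# Synthetic daily returns (hypothetical): 20 years of noisy positive drift.
rng = np.random.default_rng(2)
daily = rng.normal(loc=0.0005, scale=0.012, size=252 * 20)
cagr, arithmetic, risk = annualize(daily)
ratio_cagr = cagr / risk
ratio_arithmetic = arithmetic / risk
```

As a rough cross-check under a lognormal approximation (our assumption, not something the paper states), the arithmetic return is about ln(1 + CAGR) + σ² ÷ 2. Plugging in the paper's Sample RP row gives ln(1.1043) + 0.1882² ÷ 2 ≈ 0.099 + 0.018 = 0.117, and 0.117 ÷ 0.1882 ≈ 0.62, in line with the inflated ratio described above.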
Second: Fiedler vector sign ambiguity. The Fiedler vector is an eigenvector, so mathematically v and -v are both valid solutions, and a numerical solver may return either one. When comparing adjacent months' Fiedler vectors, signs flip 42% of the time. Under Euclidean distance, a flipped sign makes the most similar period appear maximally distant. Resolving the sign ambiguity before computing distances was enough on its own to raise the Fiedler MVP risk-return ratio from 0.667 to 0.697.
The Real Question: Is It Usable?
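Two common ways to neutralize eigenvector sign ambiguity are sketched below. Which one (if either) matches the fix used in the reproduction is not stated here, so treat both as illustrations; the function names are hypothetical.

```python
import numpy as np

def canonical_sign(v):
    """Flip v so that its largest-magnitude entry is positive."""
    v = np.asarray(v, dtype=float)
    return v if v[np.argmax(np.abs(v))] >= 0 else -v

def sign_invariant_distance(u, v):
    """Euclidean distance that treats v and -v as the same eigenvector."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return min(np.linalg.norm(u - v), np.linalg.norm(u + v))

# The same unit vector returned with opposite signs by a solver.
u = np.array([0.6, -0.8])
v = -u
naive = np.linalg.norm(u - v)             # treats identical structure as far apart
fixed = sign_invariant_distance(u, v)      # recognizes them as identical
same = np.allclose(canonical_sign(u), canonical_sign(v))
```

Canonicalizing the sign up front works with any off-the-shelf vector index, but it can be unstable when the two largest entries have nearly equal magnitude. The sign-invariant distance avoids that at the cost of needing a custom metric.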
The reproduction went well. But in investing, the real question isn't "can we reproduce the paper" — it's "can we use this in practice?"
We compared the proposed method against Ledoit-Wolf (LW) as baseline, evaluating cost-adjusted performance.
MVP: Proposed Method vs LW
| Metric | Ledoit-Wolf | Proposed (Fiedler) | Difference |
|---|---|---|---|
| Risk-return ratio | 0.662 | 0.697 | +0.035 |
| Risk-return ratio (10bp costs) | 0.641 | 0.655 | +0.014 |
| Risk-return ratio (20bp costs) | 0.619 | 0.613 | -0.006 |
| Max drawdown | 40.71% | 37.84% | -2.87pp |
| Annual turnover | 1.41 | 2.86 | +1.45 |
At zero cost, +0.035 improvement. At 10bp (0.1% one-way), +0.014 still survives. Max drawdown improves by 2.87 points.
But at 20bp, it reverses. And turnover doubles. Monthly rebalancing produces large weight changes, making costs heavy.
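The mechanics of cost adjustment, and the approximate break-even implied by the table, can be sketched as follows. Assumptions: costs are proportional to traded notional, weight drift between rebalances is ignored, and the advantage is interpolated linearly between the 10bp and 20bp rows. The function names are hypothetical, and the exact turnover convention behind the table is not specified.

```python
import numpy as np

def net_returns(gross_returns, weights, cost_per_side):
    """Subtract proportional trading costs from per-period portfolio returns.

    gross_returns : (T,) portfolio return in each period
    weights       : (T, n) target weights held in each period
    cost_per_side : cost rate on traded notional, e.g. 0.001 for 10bp
    """
    gross = np.asarray(gross_returns, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Previous weights; the first period is treated as building from cash.
    prev = np.vstack([np.zeros((1, w.shape[1])), w[:-1]])
    traded = np.abs(w - prev).sum(axis=1)   # traded notional per rebalance
    return gross - cost_per_side * traded, traded

def break_even_cost(cost_lo, edge_lo, cost_hi, edge_hi):
    """Linearly interpolate the cost level at which the edge over the baseline is zero."""
    return cost_lo + (cost_hi - cost_lo) * edge_lo / (edge_lo - edge_hi)

# Two-period toy example: build a 50/50 portfolio, then shift to 60/40.
net, traded = net_returns([0.01, 0.01], [[0.5, 0.5], [0.6, 0.4]], cost_per_side=0.001)

# MVP edge over LW from the table: +0.014 at 10bp, -0.006 at 20bp.
mvp_break_even_bp = break_even_cost(10, 0.014, 20, -0.006)
```

Under that linear interpolation, the MVP advantage over LW disappears at roughly 17bp one-way.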
RP: Proposed Method vs LW
| Metric | Ledoit-Wolf | Proposed (Closeness+AIRM) | Difference |
|---|---|---|---|
| Risk-return ratio | 0.559 | 0.557 | -0.002 |
| Risk-return ratio (10bp costs) | 0.557 | 0.555 | -0.002 |
| Max drawdown | 54.65% | 54.10% | -0.55pp |
| Annual turnover | 0.17 | 0.21 | +0.04 |
For RP, the proposed method underperforms LW even before costs. The improvement is not just zero — it's negative.
Verdict
| Portfolio | Decision | Reason |
|---|---|---|
| RP (Risk Parity) | Rejected | Equal or worse than LW. Complexity not justified |
| MVP (Min Variance) | On hold | Advantage up to 10bp, MDD improvement. But reverses at 20bp, double turnover |
RP succeeded as a reproduction but proved unnecessary as an operational tool — LW suffices. MVP shows conditional promise, but full adoption carries turnover risk. We position it as an experimental "optional feature."
Why Doesn't "Better Estimation" Mean "Better Performance"?
This result may seem counterintuitive. If covariance estimation improves, shouldn't portfolios improve too?
Two reasons.
First, LW is already remarkably good. Ledoit-Wolf shrinkage is a single function call, runs instantly, needs no parameter tuning, yet reliably improves on sample covariance. The search-based method must beat this baseline to be worthwhile — and for RP, it couldn't.
Second, estimation improvement gets offset by costs and turnover. Better covariance estimation moves portfolio weights in the "correct direction." But moving weights means rebalancing costs. When the estimation improvement is small, the incremental rebalancing cost consumes it entirely.
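To show how little code the baseline takes, here is a sketch using scikit-learn's `LedoitWolf` estimator. The library choice and the synthetic data are our assumptions; the article does not say which implementation was used.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# Synthetic stand-in for 250 days of returns on 49 industries (hypothetical data).
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=(250, 49))

sample_cov = np.cov(returns, rowvar=False)

# The shrinkage estimate: fit once, read the covariance.
lw = LedoitWolf().fit(returns)
lw_cov = lw.covariance_
shrinkage = lw.shrinkage_   # weight placed on the scaled-identity target, in [0, 1]

# Shrinking toward a scaled identity pulls extreme eigenvalues together,
# so the estimate is better conditioned than the sample covariance.
cond_sample = np.linalg.cond(sample_cov)
cond_lw = np.linalg.cond(lw_cov)
```

The shrinkage intensity is chosen by the estimator itself from the data, which is why no parameter tuning is needed.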
Comparing with a different quant strategy — the US-Japan sector lead-lag strategy — reveals the pattern:
| | Lead-lag strategy | Search-based allocation |
|---|---|---|
| Edge source | Time-zone information lag | Covariance estimation improvement |
| Edge magnitude | 26% annualized (pre-cost) | Risk-return ratio +0.035 (pre-cost) |
| Execution frequency | Daily | Monthly |
| Cost sensitivity | Extremely high | Moderate |
| Cause of death | Thin margins × daily costs | Thin improvement × turnover |
Both follow the pattern of "the map is correct, but operating costs make it unprofitable." But they die differently. Lead-lag gets killed by daily transaction costs. Search-based allocation can't clear the hurdle of LW as a strong baseline.
Lessons from This Verification
Don't underestimate baseline strength. Ledoit-Wolf is a one-liner but an extremely strong estimator. When evaluating new methods, "better than sample covariance" is trivial — "better than LW" is the real bar.
Reproduction and operational judgment are separate questions. Our reproduction closely matched the paper's values. But "reproducible" and "usable" are different. RP succeeded in reproduction but was rejected operationally. MVP's reproduction was partial, yet it remains a conditional candidate. Reproduction accuracy and adoption decisions don't necessarily correlate.
Precisely identify the source of edge. This method's edge comes from "improved risk estimation," not "market prediction." Risk estimation improvements structurally generate smaller profits than return predictions. Whether that thin profit survives as a cost-adjusted margin over the baseline determines the final adoption decision.
In investing, "theoretically correct improvement" and "practically surviving improvement" are different things. Measuring that gap quantitatively is as important in quantitative investing as the theory itself. But one more thing — the experience of stepping on implementation pitfalls through reproduction sharpens judgment when evaluating the next method. This hands-on experiential knowledge, unavailable from reading papers alone, may be the most reproducible asset in quantitative investing.
[1] Hoshino (2026). Search-based asset allocation using market structure similarity. Preprint.
[2] French, K. R. Fama-French 49 Industry Portfolios. Dartmouth College.
[3] Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.