Searching for "Similar Markets" to Build Portfolios: Reproducing a Search-Based Asset Allocation Using Market Structure Similarity

Who This Is For / Key Points

Audience: Investors interested in diversification and portfolio theory who intuitively believe that improving risk estimation should improve allocation. No math required — we cover the method's structure and verification results.

Key Points:

A method was proposed that searches past "similar market environments" to synthesize covariance matrices and improve portfolio allocation
We independently reproduced the results and confirmed that Risk Parity (RP) portfolios nearly match the paper's reported values
However, once transaction costs are counted, the method offers no reliable edge over a simple, standard shrinkage estimator (Ledoit-Wolf). The RP edge is nil even before costs, and the MVP edge reverses at 20bp. This quantitatively demonstrates that better estimation does not automatically mean better returns

Search for "similar markets": synthesize covariance matrices to improve portfolio allocation
Reproduction matches the paper: RP gap of just 0.004; MVP within 0.010 of the paper's AIRM variant
After costs, no reliable edge over LW: better estimation ≠ better returns, confirmed quantitatively

What Does This Method Do?

In March 2020, the COVID crash reshuffled inter-industry correlations overnight. Investors who built portfolios from recent data rode into the drawdown assuming "normal diversification" still held. Similar regime breaks had happened before, but standard covariance estimation offers no systematic way to ask "which past period looked like today?"

Hoshino (2026), in a preprint¹, proposes exactly that: searching through history for periods with "similar market structure" and blending their covariance matrices to build better portfolios.

The bottom line first: the method reproduces the paper's results closely, and almost exactly for RP. But after transaction costs, it offers no reliable edge over a simple, widely used shrinkage estimator (Ledoit-Wolf). Below, we trace why.

Why Use "Similar Periods"?

When building a portfolio, you need to estimate how much assets move together — the "covariance matrix." This matrix changes dramatically by market regime. Correlation patterns during the Lehman crisis look nothing like those in a calm bull market.

The naive approach estimates from the most recent 250 days, but with 49 industries that means 1,225 parameters — too many for the data, producing noisy estimates. Averaging over 20 years reduces noise but introduces bias from irrelevant regimes.
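The following small Python sketch (NumPy assumed) makes the parameter count and the noise problem concrete. It counts the free entries of a 49 × 49 covariance matrix. Then, under the illustrative assumption of i.i.d. Gaussian returns whose true covariance is the identity, it shows how widely the eigenvalues of a 250-day sample estimate scatter around the true value of 1.0. The Marchenko-Pastur band, roughly 0.31 to 2.08 for q = 49/250, is the spread that sampling noise alone is expected to produce. The simulated returns are not market data.

```python
import numpy as np


def n_cov_params(n_assets: int) -> int:
    """Free parameters in an N x N covariance matrix: N variances + N(N-1)/2 covariances."""
    return n_assets * (n_assets + 1) // 2


N_ASSETS, N_DAYS = 49, 250
print(n_cov_params(N_ASSETS))  # 1225

# Illustrative assumption: i.i.d. Gaussian returns whose TRUE covariance is the identity,
# so every true eigenvalue equals 1.0.
rng = np.random.default_rng(0)
returns = rng.standard_normal((N_DAYS, N_ASSETS))
eigs = np.linalg.eigvalsh(np.cov(returns, rowvar=False))
print(f"estimated eigenvalues range from {eigs.min():.2f} to {eigs.max():.2f}")

# Marchenko-Pastur edges: the spread expected from sampling noise alone when q = N/T.
q = N_ASSETS / N_DAYS
print(f"noise-only band: {(1 - q**0.5) ** 2:.2f} to {(1 + q**0.5) ** 2:.2f}")
```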

This method targets the middle ground. It finds the 10 most similar past market structures and synthesizes their covariance matrices via maximum likelihood estimation. By selecting only similar regimes, it should have less bias than the full-period average and less noise than the 250-day window — at least in theory.
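The article does not spell out the synthesis formula, so what follows is only a sketch under one explicit assumption. Each of the k neighbor windows is taken to have its own mean, while all of them share a single covariance matrix. Under a Gaussian model, the maximum likelihood estimate of that shared covariance is then the observation-weighted average of each window's 1/n sample covariance. The function names and the random toy data are hypothetical, and Hoshino (2026) may weight or combine the neighbors differently.

```python
import numpy as np


def mle_cov(window: np.ndarray) -> np.ndarray:
    """Gaussian MLE covariance of one window (rows = days), normalized by 1/n."""
    centered = window - window.mean(axis=0)
    return centered.T @ centered / window.shape[0]


def synthesize_neighbors(windows: list) -> np.ndarray:
    """Hypothetical synthesis step: MLE of one covariance shared by all neighbor windows.

    With a separate mean per window and a common covariance, the Gaussian MLE is the
    observation-weighted average of each window's MLE covariance.
    """
    n_total = sum(w.shape[0] for w in windows)
    return sum(w.shape[0] * mle_cov(w) for w in windows) / n_total


# Toy usage: 10 neighbor windows of 250 days x 49 industries (random data, illustration only).
rng = np.random.default_rng(1)
neighbors = [rng.standard_normal((250, 49)) * 0.01 for _ in range(10)]
sigma_hat = synthesize_neighbors(neighbors)
print(sigma_hat.shape)
```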

So how does it define "similar"? This is the core of the method.

How Is "Similarity" Measured?

Four types of features are used to measure market structure similarity, falling into two categories: network-derived and matrix-derived.

Network-based (graph constructed from inter-industry correlations)

Fiedler vector: The eigenvector of the graph Laplacian associated with its second-smallest eigenvalue. Its sign pattern captures whether, and how, the market tends to split into two groups
Closeness centrality: Quantifies how "close" each industry is to all others on the network (the inverse of its average shortest-path distance). Taken across all 49 industries, it describes how market "connectivity" is distributed
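The two network features can be sketched in a few lines of NumPy. The graph construction here is an assumption, not the paper's specification. Edge weights for the Laplacian are absolute correlations. Distances for closeness follow the common correlation distance d = sqrt(2(1 − ρ)), with shortest paths found by Floyd-Warshall. On a toy four-industry market made of two correlated pairs, the Fiedler vector's signs separate the pairs, and closeness is equal for all nodes by symmetry.

```python
import numpy as np


def fiedler_vector(corr: np.ndarray) -> np.ndarray:
    """Eigenvector of the graph Laplacian for its second-smallest eigenvalue.

    Assumption: edge weights are absolute correlations, with no self-loops.
    """
    weights = np.abs(corr)                      # new array, so corr is not modified
    np.fill_diagonal(weights, 0.0)
    laplacian = np.diag(weights.sum(axis=1)) - weights
    _, vecs = np.linalg.eigh(laplacian)         # eigenvalues come back in ascending order
    return vecs[:, 1]


def closeness_centrality(corr: np.ndarray) -> np.ndarray:
    """(n - 1) / sum of shortest-path distances from each node.

    Assumption: distance d_ij = sqrt(2 * (1 - rho_ij)); shortest paths via Floyd-Warshall.
    """
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    n = dist.shape[0]
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return (n - 1) / dist.sum(axis=1)


# Toy market: two blocks of two industries, strongly correlated within each block.
corr = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])
print(np.sign(fiedler_vector(corr)))  # one sign for industries 0-1, the opposite for 2-3
print(closeness_centrality(corr))
```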

Matrix-based (extracted directly from the covariance matrix)

Leading eigenvector: The dominant direction of variation in the covariance matrix. Shows where overall market risk is concentrated
Eigenvalue distribution: Whether risk concentrates in a few factors or is dispersed
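The next sketch covers the two matrix-based features. Like the previous one, it is an illustration rather than the paper's code: it eigendecomposes the covariance matrix, keeps the leading eigenvector, and expresses each eigenvalue as a share of total variance. Eigenvectors are only defined up to sign, so the sketch adds a hypothetical convention that flips the vector until its loadings sum to a positive number. In the toy one-factor market below, the first eigenvalue carries 60% of total variance.

```python
import numpy as np


def matrix_features(cov: np.ndarray):
    """Return (leading eigenvector, eigenvalue shares) for a covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)             # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # largest eigenvalue first
    lead = vecs[:, 0]
    if lead.sum() < 0:                           # illustrative sign convention
        lead = -lead
    shares = vals / vals.sum()                   # fraction of total variance per direction
    return lead, shares


# Toy one-factor market: 5 assets, unit variances, pairwise correlation 0.5.
cov = 0.5 * np.ones((5, 5)) + 0.5 * np.eye(5)
lead, shares = matrix_features(cov)
print(lead.round(3))    # equal loadings: risk is spread evenly along the market direction
print(shares.round(3))  # first share 0.6: one factor carries 60% of the variance
```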

Historical time points where these features are closest to "now" are identified, and their covariance matrices are used to build the portfolio. How much improvement did the paper achieve?
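Finding the closest time points is a k-nearest-neighbor search over feature vectors. The paper uses an approximate index (DuckDB VSS with HNSW, per the setup table). Our test instead used exact brute-force search, which a short NumPy function can express. This sketch assumes Euclidean distance. It also assumes the caller passes only feature vectors observable before the query date, so no future information leaks into the search.

```python
import numpy as np


def nearest_periods(history: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact brute-force k-nearest-neighbor search by Euclidean distance.

    history: (n_periods, n_features) feature vectors of past periods only
             (the caller must exclude anything not known at the query date).
    query:   (n_features,) feature vector describing "now".
    Returns row indices of the k closest periods, nearest first.
    """
    dists = np.linalg.norm(history - query, axis=1)
    k = min(k, len(dists))
    idx = np.argpartition(dists, k - 1)[:k]   # the k smallest distances, unordered
    return idx[np.argsort(dists[idx])]        # sort those k by distance


# Toy usage with one-dimensional "features" 0, 1, ..., 19.
history = np.arange(20, dtype=float).reshape(-1, 1)
print(nearest_periods(history, np.array([7.2]), k=3))  # nearest first: 7, 8, 6
```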

Paper Results

The evaluation uses Fama-French 49 Industry Portfolios² (US, from 1926) over January 2006 to December 2025. Benchmarks are the sample covariance and Ledoit-Wolf shrinkage estimation³. The metric is the risk-return ratio (annualized return ÷ annualized risk) — higher means better risk-adjusted returns.

Minimum Variance Portfolio (MVP)

Method	Ann. Return	Ann. Risk	Risk-Return Ratio
Sample covariance	9.64%	14.18%	0.680
Ledoit-Wolf	9.69%	14.18%	0.684
Proposed (Fiedler)	10.59%	14.22%	0.744
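The mechanics behind the MVP rows rest on the textbook closed form for the global minimum-variance portfolio, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). The sketch below implements that formula. Whether the paper adds a long-only or other constraint is not stated here, so the unconstrained version is an assumption.

```python
import numpy as np


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights w = inv(S) 1 / (1' inv(S) 1), summing to 1.

    Assumption: no long-only or other constraints (short positions allowed).
    """
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)   # solve instead of forming the inverse explicitly
    return raw / raw.sum()


# Two uncorrelated assets with 20% and 10% volatility.
w = min_variance_weights(np.diag([0.04, 0.01]))
print(w)  # the lower-volatility asset gets the larger weight
```

Here the weights come out as 0.2 and 0.8, proportional to the inverse variances.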

Risk Parity Portfolio (RP)

Method	Ann. Return	Ann. Risk	Risk-Return Ratio
Sample covariance	10.43%	18.82%	0.554
Ledoit-Wolf	10.43%	18.83%	0.554
Proposed (Closeness+AIRM)	10.58%	18.85%	0.561
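Risk parity instead asks each asset to contribute the same share of portfolio variance. One standard way to compute such weights is cyclical coordinate descent on the convex problem min ½wᵀΣw − Σᵢ bᵢ log wᵢ. Its solution gives risk contributions proportional to the budgets bᵢ. The sketch uses equal budgets, and the solver choice is ours; the paper's may differ.

```python
import numpy as np


def risk_parity_weights(cov: np.ndarray, n_iter: int = 1000, tol: float = 1e-12) -> np.ndarray:
    """Equal-risk-contribution weights via cyclical coordinate descent (long-only)."""
    n = cov.shape[0]
    budgets = np.full(n, 1.0 / n)          # equal risk budgets b_i
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        w_prev = w.copy()
        for i in range(n):
            # c = sum over j != i of cov[i, j] * w[j]
            c = cov[i] @ w - cov[i, i] * w[i]
            # positive root of cov[i, i] * w_i^2 + c * w_i - b_i = 0
            w[i] = (-c + np.sqrt(c * c + 4.0 * cov[i, i] * budgets[i])) / (2.0 * cov[i, i])
        if np.max(np.abs(w - w_prev)) < tol:
            break
    return w / w.sum()


def risk_contributions(w: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Each asset's share of portfolio variance; the shares sum to 1."""
    return w * (cov @ w) / (w @ cov @ w)


cov = np.array([[0.04, 0.006], [0.006, 0.01]])
w = risk_parity_weights(cov)
print(w.round(3), risk_contributions(w, cov).round(3))  # contributions both about 0.5
```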

Against the sample covariance, MVP's risk-return ratio rises from 0.680 to 0.744, a relative improvement of about 9% ((0.744−0.680)÷0.680). RP improves by only about 1% (0.554 to 0.561). The RP margin is notably thin.

These margins can easily collapse with small implementation or evaluation differences. So we attempted an independent reproduction under matched conditions.

Our Reproduction

Setup

Item	Paper	Our Test
Data	FF49 Industry Portfolios	Same (public data)
Evaluation period	2006-01 to 2025-12	Same
Neighbor search	DuckDB VSS (HNSW, approximate)	Exact brute-force search (substitute)
Annualization	Unspecified (consistent with CAGR)	CAGR
Portfolios	MVP / RP	Same

This is a faithful reproduction: data and evaluation period match. Only the search engine differs (the paper uses approximate nearest-neighbor search; we use exact search).

Benchmark Reproduction (Phase 0)

Method	Measured	Paper	Gap	Verdict
Sample MVP	0.650	0.680	-0.031	Directionally consistent
LW MVP	0.662	0.684	-0.021	Directionally consistent
Sample RP	0.553	0.554	-0.001	Near-exact match
LW RP	0.559	0.554	+0.006	Near-exact match

RP benchmarks nearly perfectly match the paper. MVP is slightly lower but directionally consistent.

Main Results (Phase 1)

Method	Measured	Paper	Gap	Verdict
Fiedler MVP	0.697	0.744 (ANN) / 0.707 (AIRM)	-0.010 (vs AIRM)	Near match
Closeness+AIRM RP	0.557	0.561	-0.004	Near match

RP differs from the paper by just 0.004. MVP comes within 0.010 of the paper's re-ranking variant (AIRM), though it remains 0.047 below the paper's approximate-nearest-neighbor (ANN) figure of 0.744.
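This sketch assumes "AIRM" refers to the affine-invariant Riemannian metric commonly used to compare covariance matrices. Under that metric, the distance between two positive-definite matrices A and B is ‖log(A^{-1/2} B A^{-1/2})‖_F, the square root of the sum of squared logs of the generalized eigenvalues. The code computes it with NumPy and uses it in a hypothetical re-ranking step that keeps the k candidates closest to the current covariance. The paper's actual re-ranking rule may differ.

```python
import numpy as np


def airm_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Affine-invariant Riemannian distance between symmetric positive-definite matrices."""
    vals, vecs = np.linalg.eigh(a)
    a_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T   # A^(-1/2)
    m = a_inv_sqrt @ b @ a_inv_sqrt
    lam = np.linalg.eigvalsh((m + m.T) / 2.0)   # symmetrize to absorb round-off
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def rerank_by_airm(current_cov: np.ndarray, candidate_covs, k: int = 10) -> list:
    """Hypothetical re-ranking: indices of the k candidates closest to current_cov."""
    dists = [airm_distance(current_cov, c) for c in candidate_covs]
    return [int(i) for i in np.argsort(dists)[:k]]


candidates = [2.0 * np.eye(3), 1.1 * np.eye(3), 5.0 * np.eye(3)]
print(rerank_by_airm(np.eye(3), candidates, k=2))  # [1, 0]
```

Unlike Euclidean distance on raw entries, this metric is unchanged when both matrices are transformed as GAGᵀ and GBGᵀ. That property is one reason it is popular for comparing covariance matrices.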

Implementation Pitfalls Found During Reproduction

Two issues emerged that significantly affected results.

First: the annualization method. The paper doesn't specify its formula, but using CAGR (compound annual growth rate, i.e., the geometric-mean annual return) instead of the arithmetic mean (daily mean × 252) aligns with the paper's tables. With the arithmetic mean, the RP risk-return ratio becomes 0.62, far from the paper's 0.55. Because annualized risk matched while annualized return was inflated, the gap initially looked like a flaw in the method itself.
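The gap between the two annualizations is easy to see on synthetic data. This sketch assumes 252 trading days per year. A year of alternating +2% and −2% days then has an arithmetic annual return of zero, while compounding produces a loss. Annualized risk is identical in both cases, so only the numerator of the risk-return ratio changes. The toy series is not FF49 data.

```python
import numpy as np


def annualized_stats(daily_returns, periods_per_year: int = 252):
    """Return (CAGR, arithmetic annual return, annualized volatility)."""
    r = np.asarray(daily_returns, dtype=float)
    years = len(r) / periods_per_year
    cagr = np.prod(1.0 + r) ** (1.0 / years) - 1.0    # compound (geometric) growth rate
    arithmetic = r.mean() * periods_per_year            # daily mean x 252
    vol = r.std(ddof=1) * np.sqrt(periods_per_year)     # same risk figure for both ratios
    return cagr, arithmetic, vol


# One year of alternating +2% / -2% days: the arithmetic mean is zero,
# but compounding loses money (volatility drag).
r = np.tile([0.02, -0.02], 126)
cagr, arithmetic, vol = annualized_stats(r)
print(f"CAGR {cagr:.2%}, arithmetic {arithmetic:.2%}, risk {vol:.2%}")
print(f"ratio with CAGR {cagr / vol:.3f} vs arithmetic {arithmetic / vol:.3f}")
```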

Second: Fiedler vector sign ambiguity. The Fiedler vector is an eigenvector, so v and -v are equally valid solutions. Comparing adjacent months' Fiedler vectors, we found the sign flipped 42% of the time. When Euclidean distance is used to find "similar periods," a flipped sign makes the most similar period appear maximally distant: for unit vectors, ‖v - (-v)‖ = 2, the largest possible distance. Correcting for this sign ambiguity alone raised the Fiedler MVP risk-return ratio from 0.667 to 0.697.
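Two common ways to neutralize the sign ambiguity are sketched below. The first flips each vector so that it points the same way as a reference, for example the previous month's vector. The second uses a distance that treats v and −v as identical. Both are illustrative options with hypothetical names, not necessarily the exact correction used in our reproduction.

```python
import numpy as np


def align_sign(v: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Flip v if it points away from the reference vector."""
    return -v if np.dot(v, reference) < 0 else v


def sign_invariant_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance that treats a and -a as the same eigenvector."""
    return float(min(np.linalg.norm(a - b), np.linalg.norm(a + b)))


v = np.array([0.6, 0.8])       # a unit-length "Fiedler vector"
flipped = -v                   # the same solution, returned with the opposite sign
print(np.linalg.norm(v - flipped))            # naive distance: about 2, the maximum for unit vectors
print(sign_invariant_distance(v, flipped))    # zero: recognized as the same vector
print(align_sign(flipped, reference=v))       # back to the reference orientation
```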

The Real Question: Is It Usable?

The reproduction went well. But in investing, the real question isn't "can we reproduce the paper" — it's "can we use this in practice?"

We compared the proposed method against Ledoit-Wolf (LW) as the baseline, evaluating performance both before and after transaction costs.
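Cost adjustment can be sketched as a proportional charge on traded weight. The conventions below are assumptions, not the article's exact accounting. The one-way cost in basis points is charged on the sum of absolute weight changes at each rebalance, covering both buys and sells. The initial build is free, and drift between rebalances is ignored. Turnover conventions also differ across studies (some report half of this sum), so the annual-turnover line shows only one possible definition.

```python
import numpy as np


def traded_fraction(weights: np.ndarray) -> np.ndarray:
    """Sum of |change in weight| at each rebalance (the initial build counts as 0 here)."""
    w = np.asarray(weights, dtype=float)
    return np.abs(np.diff(w, axis=0, prepend=w[:1])).sum(axis=1)


def net_returns(gross_returns, weights, cost_bp: float) -> np.ndarray:
    """Deduct a proportional one-way cost, in basis points, on every unit of weight traded.

    Simplifying assumptions: costs hit the return of the period in which the trade is made,
    and weight drift between rebalances is ignored.
    """
    return np.asarray(gross_returns, dtype=float) - traded_fraction(weights) * cost_bp / 1e4


# Three monthly rebalances; the last one moves 50% of capital from asset 2 to asset 1.
weights = np.array([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]])
gross = np.array([0.01, 0.01, 0.01])
print(net_returns(gross, weights, cost_bp=10))  # only the last month pays 1.0 x 10bp
# Annual turnover under this convention: mean traded fraction x 12 monthly rebalances.
print(traded_fraction(weights).mean() * 12)
```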

MVP: Proposed Method vs LW

Metric	Ledoit-Wolf	Proposed (Fiedler)	Difference
Risk-return ratio (no costs)	0.662	0.697	+0.035
Risk-return ratio (10bp costs)	0.641	0.655	+0.014
Risk-return ratio (20bp costs)	0.619	0.613	-0.006
Max drawdown	40.71%	37.84%	-2.87pp
Annual turnover	1.41	2.86	+1.45
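Max drawdown, reported in the table, is the largest peak-to-trough fall in cumulative wealth. A minimal NumPy sketch follows, assuming simple (not log) returns and a starting wealth of 1.

```python
import numpy as np


def max_drawdown(returns) -> float:
    """Largest peak-to-trough loss of cumulative wealth, as a fraction (0.4 = 40%)."""
    # Prepend the starting wealth of 1 so a loss in the very first period is counted.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))


# +10%, then -50%, then +20%: the trough sits about 50% below the earlier peak.
print(max_drawdown([0.10, -0.50, 0.20]))
```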

At zero cost, the improvement is +0.035. At 10bp (0.1% one-way), +0.014 still survives, and max drawdown improves by 2.87 percentage points.

But at 20bp, the advantage reverses. Turnover also doubles (2.86 vs 1.41): monthly rebalancing produces large weight changes, so costs weigh heavily.
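The three MVP cost points in the table above (+0.035, +0.014, −0.006) are enough for a rough break-even estimate. Linear interpolation puts the break-even cost at roughly 17bp one-way. With only three data points and an assumed straight line between 10bp and 20bp, this is a rough estimate rather than a measured result.

```python
import numpy as np

# MVP risk-return ratio gap (proposed minus LW) at each one-way cost level, from the table.
cost_bp = np.array([0.0, 10.0, 20.0])
ratio_gap = np.array([0.035, 0.014, -0.006])


def break_even_cost(costs: np.ndarray, gaps: np.ndarray):
    """Cost at which the gap first crosses zero, by linear interpolation (None if it never does)."""
    for c0, c1, g0, g1 in zip(costs[:-1], costs[1:], gaps[:-1], gaps[1:]):
        if g0 > 0 >= g1:
            return float(c0 + (c1 - c0) * g0 / (g0 - g1))
    return None


print(break_even_cost(cost_bp, ratio_gap))  # roughly 17 (bp)
```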

RP: Proposed Method vs LW

Metric	Ledoit-Wolf	Proposed (Closeness+AIRM)	Difference
Risk-return ratio (no costs)	0.559	0.557	-0.002
Risk-return ratio (10bp costs)	0.557	0.555	-0.002
Max drawdown	54.65%	54.10%	-0.55pp
Annual turnover	0.17	0.21	+0.04

For RP, the proposed method underperforms LW even before costs. The improvement is not merely zero; it is slightly negative (-0.002).

Verdict

Portfolio	Decision	Reason
RP (Risk Parity)	Rejected	Equal or worse than LW. Complexity not justified
MVP (Min Variance)	On hold	Advantage holds up to 10bp and max drawdown improves, but it reverses at 20bp and turnover doubles

RP succeeded as a reproduction but proved unnecessary as an operational tool — LW suffices. MVP shows conditional promise, but full adoption carries turnover risk. We position it as an experimental "optional feature."

Why Doesn't "Better Estimation" Mean "Better Performance"?

This result may seem counterintuitive. If covariance estimation improves, shouldn't portfolios improve too?

Two reasons.

First, LW is already remarkably good. Ledoit-Wolf shrinkage is a single function call, runs instantly, needs no parameter tuning, yet reliably improves on sample covariance. The search-based method must beat this baseline to be worthwhile — and for RP, it couldn't.
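To show how little machinery the baseline needs, here is a from-scratch NumPy sketch of the Ledoit-Wolf (2004) estimator. It shrinks the sample covariance toward a scaled identity matrix with a data-driven intensity. scikit-learn's `sklearn.covariance.LedoitWolf` implements the same estimator in a single call. The hand-written version only makes the formula visible, and the random data are illustrative.

```python
import numpy as np


def ledoit_wolf(x: np.ndarray):
    """Ledoit-Wolf (2004) shrinkage toward mu * I. x: (n_obs, p). Returns (cov, intensity)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / n                                    # sample covariance, 1/n normalization
    mu = np.trace(s) / p                                 # scale of the identity target
    delta2 = np.sum((s - mu * np.eye(p)) ** 2) / p       # distance of S from the target
    # Estimated sampling error of S, from how far each outer product x x' strays from S.
    beta_bar2 = sum(np.sum((np.outer(row, row) - s) ** 2) for row in xc) / (n**2 * p)
    beta2 = min(beta_bar2, delta2)
    intensity = beta2 / delta2 if delta2 > 0 else 1.0
    return intensity * mu * np.eye(p) + (1.0 - intensity) * s, intensity


rng = np.random.default_rng(0)
x = rng.standard_normal((250, 49)) * 0.01   # toy daily returns, 250 days x 49 industries
shrunk, intensity = ledoit_wolf(x)
print(f"shrinkage intensity {intensity:.2f}")
print(np.linalg.cond(shrunk) < np.linalg.cond(np.cov(x, rowvar=False)))  # True: better conditioned
```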

Second, estimation improvement gets offset by costs and turnover. Better covariance estimation moves portfolio weights in the "correct direction." But moving weights means rebalancing costs. When the estimation improvement is small, the incremental rebalancing cost consumes it entirely.

Comparing with a different quant strategy — the US-Japan sector lead-lag strategy — reveals the pattern:

Aspect	Lead-lag strategy	Search-based allocation
Edge source	Time-zone information lag	Covariance estimation improvement
Edge magnitude	26% annualized (pre-cost)	Risk-return ratio +0.035 (pre-cost)
Execution frequency	Daily	Monthly
Cost sensitivity	Extremely high	Moderate
Cause of death	Thin margins × daily costs	Thin improvement × turnover

Both follow the pattern of "the map is correct, but operating costs make it unprofitable." But they die differently. Lead-lag is killed by daily transaction costs. Search-based allocation cannot clear the hurdle set by LW, a strong baseline, once its extra turnover is paid for.

Lessons from This Verification

Don't underestimate baseline strength. Ledoit-Wolf is a one-liner but an extremely strong estimator. When evaluating new methods, "better than sample covariance" is trivial — "better than LW" is the real bar.

Reproduction and operational judgment are separate questions. Our reproduction closely matched the paper's values. But "reproducible" and "usable" are different. RP succeeded in reproduction but was rejected operationally. MVP's reproduction was partial, yet it remains a conditional candidate. Reproduction accuracy and adoption decisions don't necessarily correlate.

Precisely identify the source of edge. This method's edge comes from "improved risk estimation," not "market prediction." Improvements in risk estimation tend to generate smaller profits than return predictions do. Whether that thin profit survives as a cost-adjusted margin over the baseline determines the final adoption decision.

In investing, "theoretically correct improvement" and "practically surviving improvement" are different things. Measuring that gap quantitatively is as important in quantitative investing as the theory itself. One more point: stepping on implementation pitfalls during a reproduction sharpens judgment when evaluating the next method. This hands-on knowledge, unavailable from reading papers alone, may be the most reproducible asset in quantitative investing.

What Happens to Japanese Stocks the Morning After US Markets Move — Reproducing a US-Japan Sector Lead-Lag Strategy

¹ Hoshino (2026). Search-based asset allocation using market structure similarity. Preprint.
² French, K. R. Fama-French 49 Industry Portfolios. Dartmouth College.
³ Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.