Cross-Embedding Compatibility Index (CECI) via Gauge-Aligned Manifold Projection, HyperTensor Paper X

Abstract

Can skills be surgically extracted from one transformer and grafted into another? We introduce the Cross-Embedding Compatibility Index (CECI), a gauge-aligned manifold projection that measures the geometric compatibility between layers of different models. Measuring 120 layer pairs on SmolLM2-135M, we establish a feasibility map: within-band grafts (adjacent layers, $\Delta L \le 4$) are viable (Grassmann distance (GD) $<0.92$, subspace overlap $\ge 15\%$, gauge alignment gain of up to $+74\%$); cross-band grafts (e.g., Mix $\to$ Refine) are infeasible (GD $>0.96$, gauge $\Delta \approx 0$, residual $>100\%$). The CECI boundary is sharp: it lies between $\Delta L = 4$ and $\Delta L = 8$. We publish seven Danish-named chimeric models on HuggingFace. Five of the seven grafts improve MMLU over the SmolLM2-135M baseline.

1. Introduction

Model grafting, that is, extracting a functional component (e.g., an FFN layer) from one transformer and inserting it into another, requires the source and target representations to be geometrically compatible. Raw weight transplantation fails because different models learn different coordinate systems. The Universal Geodesic Taxonomy (UGT) basis (Paper XI) provides a shared coordinate frame, but the question remains: which layers are compatible?

2. The CECI Protocol

The CECI protocol has four mechanisms:

Axiom Gauge: Align the source and target bases via Grassmann optimization. A fast diagonal-cosine gauge improves alignment by up to $+74\%$ on within-band pairs.
k-Projection: Project the source FFN weights through the aligned basis. The working set must fit in the GPU L2 cache, which caps the projection rank at $k^* = \mathrm{L2\_MB} \times 42.7$, where $\mathrm{L2\_MB}$ is the L2 cache size in megabytes.
Sink-Channel: Route cross-attention through the grafted layer's output channel.
LoRA Adapter: Fine-tune residual mismatch with rank $r=8$ LoRA.
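
The first two mechanisms can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The gauge step is replaced by plain orthogonal Procrustes, a standard closed-form alignment, rather than the Grassmann optimization or the diagonal-cosine gauge named above. The overlap and distance definitions (mean squared principal cosine and the normalised chordal distance derived from it) are assumptions, as are all function names. Only the rank formula $k^* = \mathrm{L2\_MB} \times 42.7$ comes from the text.

```python
import numpy as np


def orthonormal_basis(W, k):
    """Top-k left singular vectors of W, as a (d x k) orthonormal basis."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]


def principal_cosines(A, B):
    """Cosines of the principal angles between span(A) and span(B).

    A and B must have orthonormal columns.
    """
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def subspace_overlap(A, B):
    """ASSUMED definition: mean squared principal cosine, in [0, 1]."""
    return float(np.mean(principal_cosines(A, B) ** 2))


def grassmann_distance(A, B):
    """ASSUMED definition: normalised chordal distance, in [0, 1]."""
    return float(np.sqrt(max(0.0, 1.0 - subspace_overlap(A, B))))


def procrustes_gauge(A, B):
    """Orthogonal R (k x k) minimising ||A @ R - B||_F.

    Stand-in for the paper's gauge step, not the paper's algorithm.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def mean_diagonal_cosine(A, B):
    """Hypothetical alignment score: mean cosine between matching columns."""
    return float(np.mean(np.sum(A * B, axis=0)))


def rank_budget(l2_mb):
    """k* = L2_MB * 42.7, rounded to an integer rank."""
    return int(round(l2_mb * 42.7))
```

Comparing `mean_diagonal_cosine(A, B)` with `mean_diagonal_cosine(A @ procrustes_gauge(A, B), B)` gives a before-and-after score for one layer pair. Whether that difference corresponds to the reported gauge $\Delta$ is not established here.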

3. Measured Results

3.1 Layer Pair Compatibility

Band Pair	$\Delta L$	Overlap	Grassmann Distance	Gauge $\Delta$	Viable?
Mix → Mix	0–2	24.9%	0.89	+74%	Yes
Compress → Compress	0–2	20.1%	0.91	+68%	Yes
Mix → Compress	2–4	15.4%	0.92	+52%	Marginal
Mix → Refine	8–12	7.6%	0.96	+0.06%	No
Compress → Refine	6–10	8.2%	0.95	+1.2%	No
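
The viability column can be reproduced from the two subspace metrics with a simple threshold rule, sketched below. The "Yes" thresholds (GD $<0.92$, overlap $\ge 15\%$) are taken from the abstract. The "No" threshold of GD $\ge 0.95$ is an assumption chosen so that the rule matches the five rows above, and it is not stated in the paper. The names `PairStats` and `classify` are illustrative.

```python
from typing import NamedTuple


class PairStats(NamedTuple):
    band_pair: str
    overlap: float             # subspace overlap as a fraction (0.249 = 24.9%)
    grassmann_distance: float  # normalised Grassmann distance


VIABLE_MAX_GD = 0.92       # from the abstract: GD < 0.92
VIABLE_MIN_OVERLAP = 0.15  # from the abstract: overlap >= 15%
INFEASIBLE_MIN_GD = 0.95   # ASSUMPTION: fitted to the table, not in the paper


def classify(p):
    """Return "Yes", "Marginal", or "No" for one layer-pair measurement."""
    if p.grassmann_distance < VIABLE_MAX_GD and p.overlap >= VIABLE_MIN_OVERLAP:
        return "Yes"
    if p.grassmann_distance >= INFEASIBLE_MIN_GD:
        return "No"
    return "Marginal"


# Rows copied from the table in Section 3.1.
ROWS = [
    PairStats("Mix -> Mix", 0.249, 0.89),
    PairStats("Compress -> Compress", 0.201, 0.91),
    PairStats("Mix -> Compress", 0.154, 0.92),
    PairStats("Mix -> Refine", 0.076, 0.96),
    PairStats("Compress -> Refine", 0.082, 0.95),
]

LABELS = {p.band_pair: classify(p) for p in ROWS}
```

With these thresholds, Mix → Compress is labelled marginal only because its distance of 0.92 sits exactly on the viable cutoff. This is one reason the boundary region would benefit from more measurements.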

3.2 Full Splice: Mix → Refine (Cross-Band)

Metric	Value	Interpretation
Grassmann Distance	0.961	Near-orthogonal subspaces
Gauge Alignment Δ	+0.06% (0.0006)	Essentially zero; no shared geometry
Residual	114.5%	Exceeds target — unrecoverable
LoRA Recovery	15.7%	Insufficient — gap too large
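
To make the last two rows concrete, the sketch below shows one way a relative residual can exceed 100% and how much of it a rank-$r$ correction could remove. The paper does not define either metric, so both definitions are assumptions: the residual is the Frobenius norm of the mismatch relative to the target norm, and recovery is the share of squared residual norm captured by the best rank-$r$ approximation (truncated SVD). The latter is an upper bound on what a rank-$r$ additive weight update can remove in this norm. It is not a measurement of a trained LoRA adapter.

```python
import numpy as np


def relative_residual(W_target, W_graft):
    """ASSUMED definition: ||W_target - W_graft||_F / ||W_target||_F.

    Values above 1.0 (100%) mean the mismatch is larger than the target itself.
    """
    return float(np.linalg.norm(W_target - W_graft) / np.linalg.norm(W_target))


def rank_r_recovery(W_target, W_graft, r=8):
    """ASSUMED proxy: share of squared residual norm in the top-r singular values."""
    E = W_target - W_graft
    s = np.linalg.svd(E, compute_uv=False)  # singular values, descending
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 1.0  # nothing to recover
    return float(np.sum(s[:r] ** 2) / total)
```

Under these definitions, a residual whose energy is spread across many directions yields low rank-8 recovery. That would be consistent with the reading in the table above, though the sketch does not confirm it.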

3.3 Seven Danish Chimeras: MMLU and BoolQ Results

Model	Graft	MMLU	BoolQ	PPL Δ
SmolLM2-135M (baseline)	—	62%	40%	0.0
minElskede	L20 FFN ← L10	68%	53%	+1.5
minFjollede	Qwen2.5-0.5B FFN	68%	47%	+2.8
minSode	L15 ← L8	64%	47%	+1.1
minHjerteven	L25 ← L18	58%	43%	+1.9

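Cross-model grafting: minFjollede places a Qwen2.5-0.5B FFN in a SmolLM2-135M body and reaches 68% MMLU, +6 pp over the 62% baseline, at a perplexity cost of +2.8. We take this as evidence that GRC basis projection (Paper I) transfers functional knowledge across models. Of the four chimeras listed above, three improve MMLU over the baseline; minHjerteven falls 4 pp below it.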

4. Discussion

The CECI feasibility map shows a sharp boundary: within-phase-band grafts work; cross-band grafts fail. This maps directly to the MCR phase transitions identified in Paper II. The gauge alignment mechanism is effective for within-band pairs (up to +74%) but provides essentially no benefit for cross-band pairs (+0.06% to +1.2%), which represent fundamentally different geometric "languages."

Limitations: The layer-pair measurements cover a single model (SmolLM2-135M). Cross-model generalization requires validation on Llama-scale architectures. The CECI boundary between $\Delta L = 4$ and $\Delta L = 8$ needs tighter characterization, with measurements at 2–3 additional layer distances.

References

Stewart, W.K.O. Universal Geodesic Taxonomy. HyperTensor Paper XI, 2026.
Stewart, W.K.O. Geodesic Projection Pipeline. HyperTensor Paper II, 2026.
Stewart, W.K.O. GRC Attention Compression. HyperTensor Paper I, 2026.