Abstract
Can skills be surgically extracted from one transformer and grafted into another? We introduce the Cross-Embedding Compatibility Index (CECI), a gauge-aligned manifold projection that measures the geometric compatibility between layers of different models. Measuring 120 layer pairs on SmolLM2-135M, we establish a feasibility map: within-band grafts (nearby layers, $\Delta L \le 4$) are viable (Grassmann distance $<0.92$, subspace overlap $\ge 15\%$, gauge alignment gain of up to $+74\%$); cross-band grafts (e.g., Mix $\to$ Refine) are infeasible (GD $\ge 0.95$, gauge $\Delta \approx 0$, and a residual above $100\%$ for the full Mix $\to$ Refine splice). The CECI boundary is sharp: it lies between $\Delta L = 4$ and $\Delta L = 8$. We publish seven Danish-named chimeric models on HuggingFace. Five of the seven grafts improve MMLU over the SmolLM2-135M baseline.
1. Introduction
Model grafting — extracting a functional component (e.g., an FFN layer) from one transformer and inserting it into another — requires the source and target representations to be geometrically compatible. Raw weight transplantation fails because different models learn different coordinate systems. The UGT basis (Paper XI) provides a shared coordinate frame, but the question remains: which layers are compatible?
2. The CECI Protocol
The CECI protocol has four mechanisms:
- Axiom Gauge: Align source and target bases via Grassmann optimization. A fast diagonal-cosine gauge improves alignment by up to $+74\%$ on within-band pairs.
- k-Projection: Project source FFN weights through the aligned basis. The working set must fit in GPU L2 cache, which sets the projection rank: $k^* = 42.7 \times \mathrm{L2\_MB}$, where $\mathrm{L2\_MB}$ is the L2 cache size in MB.
- Sink-Channel: Route cross-attention through the grafted layer's output channel.
- LoRA Adapter: Fine-tune residual mismatch with rank $r=8$ LoRA.
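The first two mechanisms can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the protocol's implementation: orthogonal Procrustes stands in for the Grassmann optimization and the diagonal-cosine gauge (neither is specified here), the bases are taken from an SVD of activations, and all function names, the flooring rule in `k_star`, and the 40 MB cache size are hypothetical.

```python
import math
import numpy as np

def k_star(l2_mb: float) -> int:
    """Projection rank from the rule k* = 42.7 x L2_MB."""
    # Flooring (with a tiny epsilon against float round-off) is an assumption.
    return math.floor(l2_mb * 42.7 + 1e-9)

def orthonormal_basis(acts: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of an (n_samples, d) activation matrix -> (d, k)."""
    _, _, vt = np.linalg.svd(acts, full_matrices=False)
    return vt[:k].T

def procrustes_gauge(b_src: np.ndarray, b_tgt: np.ndarray) -> np.ndarray:
    """k x k orthogonal R minimising ||b_src @ R - b_tgt||_F (stand-in gauge)."""
    u, _, vt = np.linalg.svd(b_src.T @ b_tgt)
    return u @ vt

def project_weights(w_src, b_src, b_tgt, rot):
    """Map target-frame inputs into the aligned source frame, then apply w_src."""
    # x_tgt -> coefficients b_tgt.T @ x_tgt -> source vector b_src @ rot @ coeffs
    return w_src @ b_src @ rot @ b_tgt.T

# Toy demonstration on random data (no real model weights involved).
rng = np.random.default_rng(0)
d, k = 16, 4
b_src = orthonormal_basis(rng.normal(size=(64, d)), k)
q, _ = np.linalg.qr(rng.normal(size=(k, k)))   # a known rotation of the basis
b_tgt = b_src @ q                              # same subspace, different gauge
rot = procrustes_gauge(b_src, b_tgt)           # should recover q
w_src = rng.normal(size=(32, d))               # toy "FFN" weight matrix
w_graft = project_weights(w_src, b_src, b_tgt, rot)
budget = k_star(40)                            # hypothetical 40 MB L2 cache
```

In this toy case the target basis is an exact rotation of the source basis, so the recovered gauge matches `q` and the grafted weights reduce to the source weights projected onto the shared subspace. Real layer pairs would only be approximately aligned, and the LoRA step would then absorb the remaining mismatch.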
3. Measured Results
3.1 Layer Pair Compatibility
| Band Pair | $\Delta L$ | Overlap | Grassmann D | Gauge $\Delta$ | Viable? |
|---|---|---|---|---|---|
| Mix → Mix | 0–2 | 24.9% | 0.89 | +74% | Yes |
| Compress → Compress | 0–2 | 20.1% | 0.91 | +68% | Yes |
| Mix → Compress | 2–4 | 15.4% | 0.92 | +52% | Marginal |
| Mix → Refine | 8–12 | 7.6% | 0.96 | +0.06% | No |
| Compress → Refine | 6–10 | 8.2% | 0.95 | +1.2% | No |
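The verdict column can be reproduced from the overlap and Grassmann-distance columns with a simple rule. The sketch below is one hypothetical reading of the thresholds quoted in the abstract (overlap of at least 15%, Grassmann distance below 0.92); the rule for "Marginal" and the names `PairStats` and `classify` are assumptions introduced for illustration.

```python
from typing import NamedTuple

class PairStats(NamedTuple):
    name: str
    overlap_pct: float   # subspace overlap, in percent
    grassmann: float     # Grassmann distance

def classify(p: PairStats, gd_max: float = 0.92, overlap_min: float = 15.0) -> str:
    """Hypothetical decision rule that matches the 'Viable?' column."""
    if p.overlap_pct < overlap_min:
        return "No"          # too little shared subspace
    if p.grassmann < gd_max:
        return "Yes"         # both thresholds satisfied
    return "Marginal"        # enough overlap, but distance sits at the threshold

# Rows copied from the table above.
ROWS = [
    PairStats("Mix -> Mix", 24.9, 0.89),
    PairStats("Compress -> Compress", 20.1, 0.91),
    PairStats("Mix -> Compress", 15.4, 0.92),
    PairStats("Mix -> Refine", 7.6, 0.96),
    PairStats("Compress -> Refine", 8.2, 0.95),
]

verdicts = {p.name: classify(p) for p in ROWS}
for name, verdict in verdicts.items():
    print(f"{name:22s} {verdict}")
```

Treating the distance threshold as strict is what separates Mix → Compress (0.92, Marginal) from the two within-band rows.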
3.2 Full Splice: Mix → Refine (Cross-Band)
| Metric | Value | Interpretation |
|---|---|---|
| Grassmann Distance | 0.961 | Near-orthogonal subspaces |
| Gauge Alignment Δ | 0.0006 (+0.06%) | Essentially zero: no shared geometry |
| Residual | 114.5% | Exceeds target: unrecoverable |
| LoRA Recovery | 15.7% | Insufficient: gap too large |
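For readers who want to compute comparable quantities, the sketch below derives subspace overlap and a normalized Grassmann distance from principal angles, plus a relative residual. The exact definitions used for the table are not given here, so the normalizations (overlap as the mean squared principal cosine, distance as $\sqrt{1 - \text{overlap}}$, residual as a Frobenius-norm ratio) and all function names are assumptions.

```python
import numpy as np

def principal_cosines(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between the column spans of a and b."""
    qa, _ = np.linalg.qr(a)   # orthonormalise each (d, k) basis
    qb, _ = np.linalg.qr(b)
    s = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)   # guard against round-off slightly above 1

def subspace_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared principal cosine, in [0, 1] (assumed definition)."""
    return float(np.mean(principal_cosines(a, b) ** 2))

def grassmann_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Normalised chordal distance in [0, 1] (assumed definition)."""
    return float(np.sqrt(1.0 - subspace_overlap(a, b)))

def relative_residual(w_target: np.ndarray, w_graft: np.ndarray) -> float:
    """||w_target - w_graft||_F / ||w_target||_F (assumed definition)."""
    return float(np.linalg.norm(w_target - w_graft) / np.linalg.norm(w_target))

# Toy check on coordinate subspaces of R^6.
eye = np.eye(6)
span_a = eye[:, :2]
span_orth = eye[:, 2:4]                  # orthogonal to span_a
d_same = grassmann_distance(span_a, span_a)
d_orth = grassmann_distance(span_a, span_orth)
w = np.ones((3, 3))
r_zero = relative_residual(w, np.zeros((3, 3)))   # all-zero graft
```

Under these assumed definitions, identical subspaces give distance 0 and orthogonal ones give distance 1, and a residual above 100% means the grafted weights are farther from the target than an all-zero matrix would be.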
3.3 Seven Danish Chimeras: MMLU Results
| Model | Graft | MMLU | BoolQ | PPL Δ |
|---|---|---|---|---|
| SmolLM2-135M (baseline) | — | 62% | 40% | 0.0 |
| minElskede | L20 FFN ← L10 | 68% | 53% | +1.5 |
| minFjollede | Qwen2.5-0.5B FFN | 68% | 47% | +2.8 |
| minSode | L15 ← L8 | 64% | 47% | +1.1 |
| minHjerteven | L25 ← L18 | 58% | 43% | +1.9 |
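Cross-model grafting confirmed: minFjollede uses a Qwen2.5-0.5B FFN in a SmolLM2-135M body and gains 6 pp on MMLU (62% → 68%) and 7 pp on BoolQ (40% → 47%), at a perplexity cost of +2.8. We take this as evidence that GRC basis projection transfers functional knowledge across architectures. Of the four chimeras listed above, three improve MMLU over the baseline; minHjerteven loses 4 pp (62% → 58%).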
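The per-model deltas against the baseline follow directly from the table. The short script below recomputes them; the values are copied from the rows above, and the dictionary layout is just an illustrative choice.

```python
# Scores copied from the table above (percent); PPL delta is relative to baseline.
BASELINE = {"mmlu": 62, "boolq": 40}
GRAFTS = {
    "minElskede":   {"mmlu": 68, "boolq": 53, "ppl_delta": 1.5},
    "minFjollede":  {"mmlu": 68, "boolq": 47, "ppl_delta": 2.8},
    "minSode":      {"mmlu": 64, "boolq": 47, "ppl_delta": 1.1},
    "minHjerteven": {"mmlu": 58, "boolq": 43, "ppl_delta": 1.9},
}

# Percentage-point change versus the SmolLM2-135M baseline.
mmlu_delta = {name: r["mmlu"] - BASELINE["mmlu"] for name, r in GRAFTS.items()}
boolq_delta = {name: r["boolq"] - BASELINE["boolq"] for name, r in GRAFTS.items()}
improved = [name for name, delta in mmlu_delta.items() if delta > 0]

for name in GRAFTS:
    print(f"{name:13s} MMLU {mmlu_delta[name]:+d} pp  BoolQ {boolq_delta[name]:+d} pp")
```

Every listed graft raises BoolQ and perplexity together, so the MMLU gains should be read alongside the PPL Δ column rather than in isolation.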
4. Discussion
The CECI feasibility map shows a sharp boundary: within-phase-band grafts work; cross-band grafts fail. This maps directly to the MCR phase transitions identified in Paper II. The gauge alignment mechanism is effective for within-band pairs (up to +74%) but provides almost no benefit for cross-band pairs (+0.06% to +1.2%), which represent fundamentally different geometric "languages."
Limitations: Layer-pair compatibility was measured on a single model (SmolLM2-135M), and Section 3.3 reports only one cross-model graft (minFjollede). Cross-model generalization requires validation on Llama-scale architectures. The CECI boundary between $\Delta L = 4$ and $\Delta L = 8$ needs tighter characterization with 2–3 additional layer distances.
References
- Stewart, W.K.O. Universal Geodesic Taxonomy. HyperTensor Paper XI, 2026.
- Stewart, W.K.O. Geodesic Projection Pipeline. HyperTensor Paper II, 2026.
- Stewart, W.K.O. GRC Attention Compression. HyperTensor Paper I, 2026.