The HyperTensor Project.
Long-form teaching article. Assumes no prior knowledge. Covers tensors, neural networks, transformers, attention, the KV cache, GPU bandwidth and cache hierarchy, quantisation and GGUF, PCA / SVD / Eckart‑Young, manifolds and intrinsic dimension, and ends with one plain summary per paper plus a vocabulary cheat-sheet. Read this first if any term in the seven papers is unfamiliar.
A weight-geometry-only PCA basis reduces attention $Q/K/V$ rank from $d{=}4096$ to $k{=}1024$ and produces a measured 106.27% of baseline decode throughput on a single consumer GPU, statistically significant at $p \approx 10^{-10}$. We argue (with caveats) that the speedup comes from the attention working set fitting in L2.
The full multi-slot Geodesic Projection (GP) compression scheme that the
geodessical runtime ships with: per-layer PCA bases for $Q/K/V/O$,
FFN up/gate, and a dedicated SVD path for FFN down. Includes the
persistent geometry cache, the depth-sink shortcut, and the deployment
envelope on a single 8 GB GPU.
How a compressed-manifold model serves as a draft generator for speculative decoding against the full-precision transformer, and how Attention Residuals (AttnRes, Kimi Team 2026) compose with the compression. v0.3 anchors the speculative path with a first end-to-end measurement on SmolLM2-135M-Instruct: 38.5% acceptance, 76.5 tok/s, status=geodesic_ready, plus the instruct-greedy-EOS pathology and its fix.
A theory paper now partially anchored by real LM-manifold measurements: Riemannian latent-space inference, Geodesic Trajectory Caching (GTC), and the OTT program. The universal diffeomorphism construction remains open, but the current repository's OTT deployment manifolds have a narrower, certificate-backed closure.
Optional 3-phase LoRA distillation (r=8, 500 AdamW steps) recovers up to 107% of the PPL gap from GRC compression. First-order bound on recoverable energy via ρ ratio. Three merge strategies. Per-matrix SVD: +79.4% Frobenius error reduction over shared-basis at k=256.
Full 5-benchmark sweep measuring GRC's impact on task performance. Factual recall (ARC, HellaSwag) nearly unchanged at k=1024. Math reasoning (GSM8K) degrades moderately. MMLU shows −2.5pp. k=1536 near-baseline on all tasks.
Empirical companion to Paper 4. Fits Riemannian structure on three LM activation manifolds, validates the Jacobi-correction contract (97× at $B{=}10$, $30.9$ µs/q lookup, $5.96$ KB/record), and anchors the OTT runtime end-to-end at 38.5% acceptance / 76.5 tok/s on SmolLM2-135M-Instruct. 12 of 17 Paper 4 testable claims now have a replicable measured result.
Four mechanisms shipped in the runtime that target distinct failures of static rank-$r$ SVD: MCR phase-aware per-layer rank allocation with attention-sink bypass, Axiom Gauge diagonal-$g$ optimisation over the GL($d$) residual-stream symmetry (zero inference overhead), Thermal Rank NVML-driven rank scaling with a tokens-per-joule policy gradient, and Online Basis Oja's-rule updates fired by speculative rejections. Implementation real and CLI-reachable; first measurements deferred to v0.2.
Per-cluster SVD on FFN columns recovers 21-25% reconstruction error vs. global SVD. Activation-weighted SVD: 22.7× better than weight-norm proxy. LoRA recovery at 99.9% gap closure. Honest about proxy failure: reconstruction error does not predict PPL.
Empirical validation of the Geodesic Trajectory Cache from Paper IV. 97× batched-Jacobi speedup at B=10. Cache hit rates 90.4-91.5% scale-invariant across 135M-1.5B parameters. 15.5× faster than disk-based RAG.
Architecture-independent formula $k^* = \mathrm{L2\_MB} \times 42.7$ predicts optimal GRC rank from L2 cache size alone. Validated on RTX 4070, A10G, L40S. 150 test cases, 100% accuracy. SOLVED 100%.
Surgical model grafting via gauge-aligned manifold projection. Within-band (adjacent layers) viable: GD<0.92, gauge +74%. Cross-band infeasible. 7 Danish chimeras — 5 of 7 improve MMLU. Cross-model grafting confirmed.
Standardised coordinate system for transformer representations. Enables component hot-swap between independently trained models. Zone-based knowledge routing with algebraic encoding.
Training transformer components directly in a k-dimensional manifold. RiemannianAdamW with QR retraction on the Grassmann manifold. <15% trainable parameters.
Geometric safety via orthogonal projection onto a safe subspace. 0% harmful activation by mathematical construction. Multi-step concept refinement chains.
Precision removal of behavioral coordinates from the UGT manifold. Per-category sniping with minimal collateral damage. Integrated with COG pipeline.
Living model manifold that grows through interaction. Geometric harmful-content detection. .MIKU persistence format. The unified living model (ISAGI). Geometric jury: universal aggregation formula solving open framework problems.
Mathematically proven universal mechanism for aggregating independent geometric trials. 8 theorems. Applications across all 15 papers. Solves framework optimization problems at 9-7200x speedup.
Neural embedder learns the involution ι(s)=1-s. Zeros on critical line are fixed points of the learned involution. Off-critical deviation 0.81. Faithfulness gap: error 0.009, limit convergence unproven.
5-step pipeline composing all HyperTensor technologies for RH verification. 105/105 zeros detected. Meta-jury 100% accuracy. Protocol specification for analytic number theorists.
8 theorems with complete proofs. The jury formula $J = 1 - \prod(1-c_i)$ is the unique aggregation rule. Instinct horizon $d_h = R \cdot (-\ln(1 - 0.5^{1/N}))$. J-decay table. 177× speedup: 0.17ms jury vs 30ms transformer.
Complete mathematical handoff. The one theorem: $\zeta(s)=0 \implies D(s)=0$. Everything else — feature map, Z₂ framework, computational verification at $3.04\times 10^9\times$ separation — is done.
Installable Python packages from the HyperTensor ecosystem. Available on PyPI.
Riemannian geometry for transformer compression, speculative decoding, safety, and the Riemann Hypothesis.
pip install hypertensor-framework
O(1) instant sort via Riemannian Comparison Manifold. Cross-language: Python, JavaScript, Java.
pip install hypersort
Geometric core: Riemannian metrics, hallucination guards, geodesic trajectory analysis. import hypercore
pip install hypertensor-core
JavaScript/TypeScript package. GitHub Packages registry.
npm install @nagusamecs/hypersort
Real-time visual comparison: HyperSort vs QuickSort vs MergeSort vs Native.