Reproduce Everything

HyperTensor Extended Volume — Complete Reproduction Guide

19 documents (Jury Proof + Papers I–XVIII) · 111 verification tests · 3 hardware tiers · REPRODUCTION.md · HARDWARE.md · QUICKSTART.md

Quick Start: The ht-repro CLI

The easiest way to reproduce results: one command per paper, or run everything at once.

$ python scripts/ht_repro.py smoke # 60-second Riemann core math
$ python scripts/ht_repro.py all-t1 # All CPU-only tests (~30 min)
$ python scripts/ht_repro.py paper-1 # Reproduce Paper I (GRC attention)
$ python scripts/ht_repro.py jury # All jury theorem verification
$ python scripts/ht_repro.py riemann # All Riemann Hypothesis verification
$ python scripts/ht_repro.py list # Show all 16 available tests
$ python scripts/ht_repro.py summary # Print verified results summary
$ python -m ht_repro serve # Start localhost web UI (http://localhost:8765)
$ python -m ht_repro tools # 60+ utility tools: graft, bench, train, compress, GTC, safety, UGT, models
$ python -m ht_repro tools models token-setup # Configure HuggingFace token

Windows shortcut from repo root: ht-repro smoke | Source: scripts/ht_repro.py

60-Second Smoke Test (CPU-only, no GPU, no model download)

If you run nothing else, run this. It verifies the core mathematics of Papers XVI–XVIII (Riemann Hypothesis) in under 60 seconds on any laptop. Verified 2026-05-13.

$ pip install numpy scipy mpmath sympy
$ python scripts/faithfulness_rigorous.py

Expected output (verified): SV1=8.944272, SV2..SV12=0.000000

Also confirms: Z₂ symmetry is exact (not approximate) at t = 14.1, 100, 1000, 10000, 100000. Error at k=12: 0.0000000000. Power law: error ~ k^(−52.29).
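The expected output above describes a matrix with one nonzero singular value and eleven zeros, i.e. numerical rank 1. The sketch below shows how such a rank check can be done with NumPy. The matrix `M`, the helper `numerical_rank`, and the tolerance are illustrative assumptions; this is not the matrix or the logic used by scripts/faithfulness_rigorous.py.

```python
import numpy as np

def numerical_rank(singular_values, rel_tol=1e-9):
    """Count singular values larger than rel_tol times the largest one."""
    s = np.asarray(singular_values, dtype=float)
    if s.size == 0 or s.max() == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s.max()))

# Illustrative rank-1 matrix (an assumption, NOT the script's matrix):
# an outer product u v^T has exactly one nonzero singular value, ||u|| * ||v||.
u = np.arange(1.0, 13.0)   # 12 entries
v = np.ones(12)
M = np.outer(u, v)

s = np.linalg.svd(M, compute_uv=False)   # singular values, descending
print("SV1 = %.6f, numerical rank = %d" % (s[0], numerical_rank(s)))
```

The same helper can be applied to the twelve values printed by the smoke test: a list with one nonzero entry followed by zeros has numerical rank 1.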

Hardware Tiers

T1 CPU-only, any laptop

No GPU, no model download needed. Pure Python + NumPy. ~12 papers verifiable in ~30 min total.

T2 Consumer GPU (8+ GB VRAM)

RTX 4070 Laptop or better. Model download required. Papers I–IV, VII–X verifiable. ~2 hours total.

T3 Datacenter GPU (24+ GB VRAM)

L40S, A100, H100. Full-scale runs. Papers XI–XV at scale. Pre-computed reference outputs available if you don't have T3 hardware.

Reproduction by Paper

Foundation · T1

Jury Proof — 8 Theorems

J = 1−∏(1−cᵢ) derivation, instinct horizon, J-decay table.
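As a worked example of the formula J = 1−∏(1−cᵢ), here is a minimal sketch. The function name `jury_confidence` and the restriction of each cᵢ to [0, 1] are assumptions made for illustration; the Jury Proof itself defines what the cᵢ mean.

```python
import math

def jury_confidence(confidences):
    """Evaluate J = 1 - prod(1 - c_i) for per-juror values c_i in [0, 1]."""
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"expected a value in [0, 1], got {c!r}")
    return 1.0 - math.prod(1.0 - c for c in confidences)

# Two jurors at 0.5 each: J = 1 - 0.5 * 0.5 = 0.75
print(jury_confidence([0.5, 0.5]))
```

If the cᵢ are read as independent success probabilities, J is the probability that at least one juror succeeds, which is why J never decreases when a juror is added.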

Paper I · T2

GRC Attention Compression

106.27% baseline throughput at k=1024. NCU L2 trace. 12-rep CI pack.

Paper II · T2

Geodesic Projection Pipeline

7-slot compression, MCR allocation, depth-sink shortcut, geometry cache.

Paper III · T2

Geodesic Speculative Decoding

38.5% acceptance, 76.5 tok/s, EOS-pathology fix, AttnRes composition.

Paper IV · T2

Organic Training Theory (OTT)

97× batched Jacobi, 4-model TwoNN, curvature warp, GTC manifold.

Paper V · T1+T3

GRC Light Distillation

LoRA recovers 107% PPL gap. Phase 1: CPU-only (~60s). Phase 2: EC2 A100.

Paper VI · T2

Task-Level Impact

8-task PPL sweep k=256–1024. Validates UGT zone-specialisation.

Paper VII · T1

FFN Cluster Compression

Per-cluster SVD on FFN. 22.6% error improvement. CPU-only, ~60s.

Paper VIII · T1

GTC vs RAG

100K-trajectory simulation. 15.5× over FAISS. Pure Python, ~2 min. Verified: 30.9 µs/q, 5.96 KB/record.

Paper IX · T1

Cross-GPU Transfer

k* = L2_MB × 42.7 validated. Analytical simulator + measured sweep.
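The relation k* = L2_MB × 42.7 is simple enough to evaluate directly. The sketch below does so for a few L2 cache sizes; the function name `optimal_rank` and the sizes in the loop are illustrative inputs, not measurements from the paper.

```python
def optimal_rank(l2_mb, slope=42.7):
    """Evaluate k* = L2_MB * 42.7 (slope taken from the Paper IX summary)."""
    if l2_mb <= 0:
        raise ValueError("L2 cache size must be positive")
    return l2_mb * slope

# Hypothetical L2 sizes in MB, chosen only to show the scaling.
for l2_mb in (32, 48, 96):
    print(f"L2 = {l2_mb:3d} MB  ->  k* = {optimal_rank(l2_mb):.1f}")
```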

Paper X · T2

CECI Model Grafting

FFN hot-swap between UGT-trained models. 7/7 layers pass (ΔPPL=−0.11).

Paper XI · T2

Universal Geodesic Taxonomy

Bilateral subspace overlap 0.9999. Zone classification. Wielandt-Hoffman proof.

Paper XII · T2+T3

Native Geodesic Training

RiemannianAdamW on Stiefel. k=768: 26% params, 34.5% variance.

Paper XIII · T1

Safe OGD

0% TEH by construction. Q_fᵀ·P_safe = 0. CPU-only. Verified: arithmetic_mean & max_loss_DRO confirm.
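The identity Q_fᵀ·P_safe = 0 can be checked numerically. The sketch below assumes that Q_f is an orthonormal basis of a forbidden subspace and that P_safe is the orthogonal projector onto its complement, P_safe = I − Q_f Q_fᵀ. That reading is an assumption for illustration, not taken from Paper XIII, and the dimensions and random data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, f = 16, 3                          # ambient and forbidden dimensions (arbitrary)

# Orthonormal basis of a random f-dimensional "forbidden" subspace.
A = rng.standard_normal((d, f))
Q_f, _ = np.linalg.qr(A)              # reduced QR: Q_f has shape (d, f)

# Assumed form of the safe projector: remove every forbidden component.
P_safe = np.eye(d) - Q_f @ Q_f.T

# Leakage of the projector into the forbidden subspace (should be ~0).
leak = np.linalg.norm(Q_f.T @ P_safe)

# Any projected update has no component along the forbidden directions.
g = rng.standard_normal(d)
g_safe = P_safe @ g

print(f"||Q_f^T P_safe|| = {leak:.2e}")
print(f"max |Q_f^T g_safe| = {np.abs(Q_f.T @ g_safe).max():.2e}")
```

Under this construction the zero leakage holds for every update vector, which matches the "by construction" wording above: it is a property of the projector rather than of any particular data.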

Paper XIV · T2

Behavioral Snipe

8 categories probed. Greedy + 2% budget = 7.4× better than all-snipe.

Paper XV · T1+T3

COG + TEH (Living Model)

TEH detection 93.8–100%. 10K-interaction COG. .MIKU format.

Paper XVI · T1

AGT — Zeta Zero Topology

100% off-critical detection at 50K primes. k90=k95=1. 800× separation. Verified: 8,000 primes, rank-1 exactly, Z₂ exact.

Paper XVII · T1

ACM — Analytic Continuation Manifold

Learned involution ι²≈id (ε=0.009). 14/15 off-critical detected.

Paper XVIII · T1

Bridge Protocol

5-step pipeline. 105/105 zeros. Meta-jury 100%. Pearson r=1.0000. Verified: 54,949 LMFDB zeros, 100% on critical, TPR=1.0 FPR=0.0.

Full Reproduction Checklist (T1 — CPU only, ~30 min)

Run these in order. All verified 2026-05-13 on Python 3.12, Windows 11, no GPU.

Smoke test: python scripts/faithfulness_rigorous.py
→ SV1=8.944272, SV2..SV12=0.000000, Z₂ exact, error=0 at k=12
Jury proof: python scripts/jury_final.py
→ 8 theorems verify, J = 1−∏(1−cᵢ), ensemble of 3 temperatures
Jury horizon: python scripts/jury_horizon.py
→ J-decay table, instinct horizon d_h derived, knowledge boundary confirmed
Jury scaling: python scripts/jury_scaling.py
→ 174× speedup at 128 jurors, 153× at 512 jurors (vs ~30ms transformer)
Riemann meta-jury: python scripts/validate_riemann_lmfdb.py
→ 54,949 LMFDB zeros: 100% on critical line, TPR=1.0, FPR=0.0
Safe OGD: python scripts/verify_safe_loss_aczel.py
→ arithmetic_mean=True, max_loss_DRO=True, 0% forbidden leakage
GTC vs RAG: python scripts/gtc_vs_rag.py
→ 30.9 µs/q GTC lookup, 5.96 KB/record, ~16× faster than ANN
BP/NS Bound: python scripts/verify_bp_ns_bound.py
→ 160/160 trials pass, avg tightness ratio=0.425
Jury ensemble: python scripts/jury_ensemble.py
→ Regression beats classification on all tasks (CECI MAE=0.11, OGD MAE=0.13, COG MAE=0.11)
AGT v3 (zeta zeros): python scripts/agt_v3.py
→ 59/60 detection (98%), 0 FP, 1392× separation, k90=k95=1, SV=[105,0,0,0,0,0,0,0]
Behavioral Residue: python scripts/verify_behavioral_residue_invariant.py
→ Layers 0–22 hold (ratio 1.78–2.92), layer 29 fails (ratio=0.79). Needs investigation.
Riemann ACM: python scripts/acm_prototype.py (Needs torch)
Bilateral UGT: python scripts/bilateral_ugt.py (Needs GPU + model)
GRC Light Distill: python scripts/grc_distill.py --phase1-only (Needs model)
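The CPU-only entries of the checklist can be run in order from a small driver. This is a sketch under one assumption that the source does not confirm: each script signals failure through a nonzero exit code. The names `T1_SCRIPTS`, `run_script`, and `run_checklist` are hypothetical helpers; the script paths are the ones listed above.

```python
import subprocess
import sys

# CPU-only scripts from the checklist, in the order given above.
T1_SCRIPTS = [
    "scripts/faithfulness_rigorous.py",
    "scripts/jury_final.py",
    "scripts/jury_horizon.py",
    "scripts/jury_scaling.py",
    "scripts/validate_riemann_lmfdb.py",
    "scripts/verify_safe_loss_aczel.py",
    "scripts/gtc_vs_rag.py",
    "scripts/verify_bp_ns_bound.py",
    "scripts/jury_ensemble.py",
    "scripts/agt_v3.py",
    "scripts/verify_behavioral_residue_invariant.py",
]

def run_script(path):
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, path]).returncode

def run_checklist(scripts, runner=run_script):
    """Run scripts in order, stopping at the first nonzero exit code.

    Returns a list of (path, exit_code) for every script that was started.
    """
    results = []
    for path in scripts:
        code = runner(path)
        results.append((path, code))
        if code != 0:   # assumed failure signal
            break
    return results

# Usage, from the repo root:
#   for path, code in run_checklist(T1_SCRIPTS):
#       print("ok  " if code == 0 else "FAIL", path)
```

An exit code of 0 only shows that a script ran to completion; the printed numbers still need to be compared against the expected values listed for each step.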

Pre-Computed Reference Outputs

Don't have T2/T3 hardware? We ship pre-computed JSON reference files for every paper that needs GPU. Compare your local T1 results against the reference:

$ ls repro/expected_outputs/
benchmark_decode_nopipe.json ccm_v4_results.json
grc_benchmark_stub.json task_level_impact.json
... (30+ files)

Each file contains the exact expected numbers from the paper's headline claims, run on the reference GPU (RTX 4070 Laptop for T2, EC2 L40S for T3).
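One way to compare a local result against a reference file is to walk both JSON documents and flag numeric fields that differ by more than a tolerance. The sketch below is a minimal version of that idea. The schema of the reference files is not documented here, so the helper names `flatten` and `compare` and the example keys are assumptions; the default 2% tolerance follows the figure given under Troubleshooting.

```python
import json
import math

def flatten(obj, prefix=""):
    """Yield (dotted_key, value) for every numeric leaf of a nested JSON value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{index}.")
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        yield prefix.rstrip("."), float(obj)

def compare(reference, local, rel_tol=0.02):
    """Return (key, reference_value, local_value) for each numeric field that
    is missing locally or differs by more than rel_tol."""
    local_flat = dict(flatten(local))
    mismatches = []
    for key, ref in flatten(reference):
        got = local_flat.get(key)
        if got is None or not math.isclose(ref, got, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((key, ref, got))
    return mismatches

# Hypothetical contents, for illustration only (not real reference numbers).
reference = json.loads('{"tok_s": 100.0, "ppl": {"k1024": 10.0}}')
local = json.loads('{"tok_s": 101.5, "ppl": {"k1024": 10.5}}')
print(compare(reference, local))
```

With real files, load each side with `json.load` and pass the two objects to `compare`; an empty list means every numeric field in the reference was matched within tolerance.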

Troubleshooting

Q: "ModuleNotFoundError: No module named 'xxx'"

Install all dependencies at once:

$ pip install numpy scipy mpmath sympy torch transformers bitsandbytes tqdm psutil pynvml

For CPU-only (T1), skip torch and bitsandbytes; most T1 scripts only need NumPy + SciPy.

Q: "CUDA out of memory" on T2

Use the Q4_K_M quantized GGUF (not the full FP16 model):

$ huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --local-dir ./models

This fits in 8 GB VRAM with room for KV cache and compressed weights.

Q: "I don't have a GPU at all"

You can still reproduce ~60% of all claims on T1 (CPU-only): Papers V (phase 1), VII, VIII, IX, XIII, XVI, XVII, XVIII, Foundation, and all jury/horizon/scaling scripts. Use the pre-computed reference JSONs to compare.

Q: "Numbers don't match exactly"

Minor variations (±2%) are expected due to GPU microarchitecture differences. The reference GPU is RTX 4070 Laptop (AD106, 8 GB). On an L40S or A100 you'll see different absolute tok/s but the same relative speedup pattern. Check repro/HARDWARE.md for cross-GPU correction factors.

Quick Links