Quick Start: The ht-repro CLI
The easiest way to reproduce results: run one command per paper, or run everything at once.
$ python scripts/ht_repro.py all-t1 # All CPU-only tests (~30 min)
$ python scripts/ht_repro.py paper-1 # Reproduce Paper I (GRC attention)
$ python scripts/ht_repro.py jury # All jury theorem verification
$ python scripts/ht_repro.py riemann # All Riemann Hypothesis verification
$ python scripts/ht_repro.py list # Show all 16 available tests
$ python scripts/ht_repro.py summary # Print verified results summary
$ python -m ht_repro serve # Start localhost web UI (http://localhost:8765)
$ python -m ht_repro tools # 60+ utility tools: graft, bench, train, compress, GTC, safety, UGT, models
$ python -m ht_repro tools models token-setup # Configure HuggingFace token
Windows shortcut from repo root: ht-repro smoke | Source: scripts/ht_repro.py
60-Second Smoke Test (CPU-only, no GPU, no model download)
If you run nothing else, run this. It verifies the core mathematics of Papers XVI–XVIII (Riemann Hypothesis) in under 60 seconds on any laptop. Verified 2026-05-13.
$ python scripts/faithfulness_rigorous.py
Expected output (verified): SV1=8.944272, SV2..SV12=0.000000
Also confirms: Z₂ symmetry EXACT (not approximate) at t=14.1, 100, 1000, 10000, 100000. Error at k=12: 0.0000000000. Power law: error ~ k⁻⁵²·²⁹.
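The expected output above can be checked mechanically. The following is a minimal sketch, assuming the script prints a line in the `SV1=..., SV2..SV12=...` form shown in this guide; the function name `check_smoke_output` and the tolerance are hypothetical and not part of the repository.

```python
import re

EXPECTED_SV1 = 8.944272  # value quoted in this guide
TOL = 1e-6               # assumed tolerance, matching six printed decimals

_NUM = r"([0-9]+(?:\.[0-9]+)?)"


def check_smoke_output(text: str) -> bool:
    """Return True if `text` reports SV1 as expected and SV2..SV12 as zero."""
    sv1 = re.search(r"SV1=" + _NUM, text)
    rest = re.search(r"SV2\.\.SV12=" + _NUM, text)
    if sv1 is None or rest is None:
        return False  # the line format differs from what this sketch assumes
    return (
        abs(float(sv1.group(1)) - EXPECTED_SV1) <= TOL
        and abs(float(rest.group(1))) <= TOL
    )


print(check_smoke_output("SV1=8.944272, SV2..SV12=0.000000"))
```

To use it, paste the script's captured stdout into `check_smoke_output`. If the real output format differs, adjust the two patterns.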
Hardware Tiers
T1: CPU-only, any laptop
No GPU, no model download needed. Pure Python + NumPy. ~12 papers verifiable in ~30 min total.
T2: Consumer GPU (8+ GB VRAM)
RTX 4070 Laptop or better. Model download required. Papers I–IV, VII–X verifiable. ~2 hours total.
T3: Datacenter GPU (24+ GB VRAM)
L40S, A100, H100. Full-scale runs. Papers XI–XV at scale. Pre-computed reference outputs available if you don't have T3 hardware.
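The tier thresholds above can be written as a small lookup. This is a sketch only: the function name `hardware_tier` is hypothetical, and treating a GPU with less than 8 GB of VRAM as T1 is an assumption that the guide does not state.

```python
from typing import Optional


def hardware_tier(vram_gb: Optional[float]) -> str:
    """Map available GPU memory (in GB) to the tiers described above.

    Pass None when there is no GPU at all.
    """
    if vram_gb is None or vram_gb < 8:
        return "T1"  # assumption: GPUs under 8 GB fall back to CPU-only tests
    if vram_gb < 24:
        return "T2"  # consumer GPU, 8+ GB VRAM
    return "T3"      # datacenter GPU, 24+ GB VRAM


for vram in (None, 8, 24, 48):
    print(vram, "->", hardware_tier(vram))
```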
Reproduction by Paper
Jury Proof — 8 Theorems
J = 1−∏(1−cᵢ) derivation, instinct horizon, J-decay table.
→ Full guide
GRC Attention Compression
106.27% baseline throughput at k=1024. NCU L2 trace. 12-rep CI pack.
→ Full guide
Geodesic Projection Pipeline
7-slot compression, MCR allocation, depth-sink shortcut, geometry cache.
→ Full guide
Geodesic Speculative Decoding
38.5% acceptance, 76.5 tok/s, EOS-pathology fix, AttnRes composition.
→ Full guide
Organic Training Theory (OTT)
97× batched Jacobi, 4-model TwoNN, curvature warp, GTC manifold.
→ Full guide
GRC Light Distillation
LoRA recovers 107% PPL gap. Phase 1: CPU-only (~60s). Phase 2: EC2 A100.
→ Full guide
Task-Level Impact
8-task PPL sweep k=256–1024. Validates UGT zone-specialisation.
→ Full guide
FFN Cluster Compression
Per-cluster SVD on FFN. 22.6% error improvement. CPU-only, ~60s.
→ Full guide
GTC vs RAG
100K-trajectory simulation. 15.5× over FAISS. Pure Python, ~2 min. Verified: 30.9 µs/q, 5.96 KB/record.
→ Full guide
Cross-GPU Transfer
k* = L2_MB × 42.7 validated. Analytical simulator + measured sweep.
→ Full guide
CECI Model Grafting
FFN hot-swap between UGT-trained models. 7/7 layers pass (ΔPPL=−0.11).
→ Full guide
Universal Geodesic Taxonomy
Bilateral subspace overlap 0.9999. Zone classification. Wielandt-Hoffman proof.
→ Full guide
Native Geodesic Training
RiemannianAdamW on Stiefel. k=768: 26% params, 34.5% variance.
→ Full guide
Safe OGD
0% TEH by construction. Q_fᵀ·P_safe = 0. CPU-only. Verified: arithmetic_mean & max_loss_DRO confirm.
→ Full guide
Behavioral Snipe
8 categories probed. Greedy + 2% budget = 7.4× better than all-snipe.
→ Full guide
COG + TEH (Living Model)
TEH detection 93.8–100%. 10K-interaction COG. .MIKU format.
→ Full guide
AGT: Zeta Zero Topology
100% off-critical detection at 50K primes. k90=k95=1. 800× separation. Verified: 8,000 primes, rank-1 exactly, Z₂ exact.
→ Full guide
ACM: Analytic Continuation Manifold
Learned involution ι²≈id (ε=0.009). 14/15 off-critical detected.
→ Full guide
Bridge Protocol
5-step pipeline. 105/105 zeros. Meta-jury 100%. Pearson r=1.0000. Verified: 54,949 LMFDB zeros, 100% on critical, TPR=1.0 FPR=0.0.
→ Full guide
Full Reproduction Checklist (T1, CPU only, ~30 min)
Run these in order. All verified 2026-05-13 on Python 3.12, Windows 11, no GPU.
- Smoke test:
python scripts/faithfulness_rigorous.py
→ SV1=8.944272, SV2..SV12=0.000000, Z₂ exact, error=0 at k=12
- Jury proof:
python scripts/jury_final.py
→ 8 theorems verify, J = 1−∏(1−cᵢ), ensemble of 3 temperatures
- Jury horizon:
python scripts/jury_horizon.py
→ J-decay table, instinct horizon d_h derived, knowledge boundary confirmed
- Jury scaling:
python scripts/jury_scaling.py
→ 174× speedup at 128 jurors, 153× at 512 jurors (vs ~30 ms transformer)
- Riemann meta-jury:
python scripts/validate_riemann_lmfdb.py
→ 54,949 LMFDB zeros: 100% on critical line, TPR=1.0, FPR=0.0
- Safe OGD:
python scripts/verify_safe_loss_aczel.py
→ arithmetic_mean=True, max_loss_DRO=True, 0% forbidden leakage
- GTC vs RAG:
python scripts/gtc_vs_rag.py
→ 30.9 µs/q GTC lookup, 5.96 KB/record, ~16× faster than ANN
- BP/NS Bound:
python scripts/verify_bp_ns_bound.py
→ 160/160 trials pass, avg tightness ratio=0.425
- Jury ensemble:
python scripts/jury_ensemble.py
→ Regression beats classification on all tasks (CECI MAE=0.11, OGD MAE=0.13, COG MAE=0.11)
- AGT v3 (zeta zeros):
python scripts/agt_v3.py
→ 59/60 detection (98%), 0 FP, 1392× separation, k90=k95=1, SV=[105,0,0,0,0,0,0,0]
- Behavioral Residue:
python scripts/verify_behavioral_residue_invariant.py
→ Layers 0–22 hold (ratio 1.78–2.92), layer 29 fails (ratio=0.79). Needs investigation.
- Riemann ACM:
python scripts/acm_prototype.py (Needs torch)
- Bilateral UGT:
python scripts/bilateral_ugt.py (Needs GPU + model)
- GRC Light Distill:
python scripts/grc_distill.py --phase1-only (Needs model)
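The jury scripts in the checklist report the formula J = 1−∏(1−cᵢ). As a worked illustration of that arithmetic, the sketch below evaluates it for a list of values. The function name `jury_confidence` is hypothetical, and restricting each cᵢ to [0, 1] is an assumption made here rather than something this guide states.

```python
import math
from typing import Iterable


def jury_confidence(c: Iterable[float]) -> float:
    """Evaluate J = 1 - prod(1 - c_i) for the given c_i values."""
    values = list(c)
    if any(not 0.0 <= ci <= 1.0 for ci in values):
        raise ValueError("each c_i is assumed to lie in [0, 1]")
    return 1.0 - math.prod(1.0 - ci for ci in values)


# Three values of 0.5 give 1 - 0.5**3 = 0.875.
print(jury_confidence([0.5, 0.5, 0.5]))
```

Under this formula J never decreases when another cᵢ is added, and a single cᵢ = 1 forces J = 1.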
Pre-Computed Reference Outputs
Don't have T2/T3 hardware? We ship pre-computed JSON reference files for every paper that needs GPU. Compare your local T1 results against the reference:
benchmark_decode_nopipe.json ccm_v4_results.json
grc_benchmark_stub.json task_level_impact.json
... (30+ files)
Each file contains the exact expected numbers from the paper's headline claims, run on the reference GPU (RTX 4070 Laptop for T2, EC2 L40S for T3).
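Comparing local numbers against a reference file can be sketched as follows. The schema of the shipped JSON files is not described in this guide, so the flat `{"metric": number}` layout and the key names below are hypothetical. The 2% relative tolerance mirrors the variation quoted under Troubleshooting.

```python
import json
import math


def compare_to_reference(local: dict, reference: dict, rel_tol: float = 0.02) -> dict:
    """Return {metric: bool} telling whether each local value matches the reference."""
    report = {}
    for key, ref_val in reference.items():
        got = local.get(key)
        both_numeric = isinstance(got, (int, float)) and isinstance(ref_val, (int, float))
        # Missing or non-numeric entries count as mismatches.
        report[key] = both_numeric and math.isclose(got, ref_val, rel_tol=rel_tol)
    return report


# Hypothetical reference content; real files may be nested differently.
reference = json.loads('{"tok_per_s": 76.5, "acceptance": 0.385}')
local = {"tok_per_s": 77.0, "acceptance": 0.30}
print(compare_to_reference(local, reference))
```

A reference value of exactly zero only matches a local zero under a purely relative tolerance. Pass `abs_tol` to `math.isclose` if the real files contain such entries.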
Troubleshooting
Q: "ModuleNotFoundError: No module named 'xxx'"
Install all dependencies at once:
For CPU-only (T1), skip torch and bitsandbytes; most T1 scripts only need NumPy + SciPy.
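Before running anything, it can help to see which of these packages are importable. The sketch below uses only the standard library. The grouping into `T1_MODULES` and `GPU_MODULES` follows the sentence above, while the helper name `missing_modules` is hypothetical.

```python
import importlib.util

T1_MODULES = ["numpy", "scipy"]          # what most T1 scripts need
GPU_MODULES = ["torch", "bitsandbytes"]  # only needed beyond T1


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing for T1:", missing_modules(T1_MODULES))
print("Missing for GPU tiers:", missing_modules(GPU_MODULES))
```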
Q: "CUDA out of memory" on T2
Use the Q4_K_M quantized GGUF (not the full FP16 model):
This fits in 8 GB VRAM with room for KV cache and compressed weights.
Q: "I don't have a GPU at all"
You can still reproduce ~60% of all claims on T1 (CPU-only): Papers V (phase 1), VII, VIII, IX, XIII, XVI, XVII, XVIII, Foundation, and all jury/horizon/scaling scripts. Use the pre-computed reference JSONs to compare.
Q: "Numbers don't match exactly"
Minor variations (±2%) are expected due to GPU microarchitecture differences. The reference GPU is RTX 4070 Laptop (AD106, 8 GB). On an L40S or A100 you'll see different absolute tok/s but the same relative speedup pattern. Check repro/HARDWARE.md for cross-GPU correction factors.
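The Cross-GPU Transfer entry earlier in this guide states k* = L2_MB × 42.7. A minimal sketch of that arithmetic follows. The function name `k_star` and the sample L2 sizes are hypothetical and not tied to any specific GPU. For the actual correction factors, rely on repro/HARDWARE.md.

```python
L2_TO_K = 42.7  # coefficient quoted in this guide


def k_star(l2_mb: float) -> float:
    """Evaluate k* = L2_MB * 42.7 for an L2 cache size given in MB."""
    if l2_mb <= 0:
        raise ValueError("L2 cache size must be positive")
    return l2_mb * L2_TO_K


# Illustrative cache sizes only.
for l2_mb in (24, 48, 96):
    print(f"L2 = {l2_mb} MB -> k* = {k_star(l2_mb):.1f}")
```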