Reproduce Paper B: Geodesic Projection pipeline, HyperTensor Research

Scope

Reproduces the five-arm ablation, the per-layer rank sweep and the MCR null re-run. The full pack is most reliably run on a g6e.xlarge L40S instance for the FFN-down SVD step, which exceeds the consumer GPU memory budget.

Hardware target

Recommended: g6e.xlarge (L40S, 48 GB VRAM, about 2.25 USD per hour for the headline 70-minute run).
Local headline rows only: any 8 GB CUDA GPU.

Prerequisites

Same toolchain as Paper A (Zig CC, CUDA 12.x, PowerShell or bash).
Model file Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf.
For EC2: AWS credentials, the launch script under scripts/ec2_paperB_ablations/launch_detached.ps1.

1. Local: MCR null re-run

.\.venv\Scripts\python.exe scripts\bench_mcr_ablation.py *> docs\figures\paper-b\mcr_ablation_run2_log.txt
Get-Content docs\figures\paper-b\mcr_ablation_summary.json

Expected: baseline 39.28 plus or minus 0.05 tok/s, MCR 39.23 plus or minus 0.05 tok/s (delta -0.13 percent, 1x variance ratio). If the variance ratio is much greater than 5x or absolute throughput is below 38 tok/s, you have background-load contamination. Close any co-resident GPU processes and rerun.

2. EC2: full five-arm ablation

cd <repo_root>
.\scripts\ec2_paperB_ablations\launch_detached.ps1 `
    -InstanceType g6e.xlarge `
    -MaxRuntimeMinutes 120

The launcher is disconnect-resilient. It uploads the model, configures the L40S, runs all five arms (baseline, compress, compress_gauge, compress_online, compress_spec), and tears the instance down on completion. Total wall time is about 70 minutes; total cost is about 2.25 USD.

3. Outputs

docs/figures/paper-b/mcr_ablation.csv (clean run).
docs/figures/paper-b/mcr_ablation_run1_contaminated.csv (preserved for audit).
benchmarks/paperB_ablation_l40s_* (EC2 pack).

Tolerances

Throughput: plus or minus 5 percent.
PPL: deterministic to four decimal places.
The 21x variance ratio in the original contaminated run was the contamination signal. A clean re-run sits at 1x to 1.5x.

What can go wrong

Co-resident GPU workload eats about 3.5 percent throughput on both arms and inflates variance.
EC2 launch fails if the AMI ID has rotated; check the launcher for the current pinned AMI.