Scope
Reproduces the five-arm ablation, the per-layer rank sweep, and the MCR null re-run. The full pack runs most reliably on a g6e.xlarge (L40S) instance, because the FFN-down SVD step exceeds the memory budget of consumer GPUs.
Hardware target
- Recommended: g6e.xlarge (L40S, 48 GB VRAM; about 2.25 USD in total for the headline 70-minute run).
- Local headline rows only: any 8 GB CUDA GPU.
Prerequisites
- Same toolchain as Paper A (Zig CC, CUDA 12.x, PowerShell or bash).
- Model file: Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf.
- For EC2: AWS credentials and the launch script scripts/ec2_paperB_ablations/launch_detached.ps1.
1. Local: MCR null re-run
.\.venv\Scripts\python.exe scripts\bench_mcr_ablation.py *> docs\figures\paper-b\mcr_ablation_run2_log.txt
Get-Content docs\figures\paper-b\mcr_ablation_summary.json
Expected: baseline 39.28 plus or minus 0.05 tok/s, MCR 39.23 plus or minus 0.05 tok/s (delta -0.13 percent, 1x variance ratio). If the variance ratio exceeds 5x or absolute throughput falls below 38 tok/s, the run is contaminated by background load. Close any co-resident GPU processes and rerun.
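The contamination check can be sketched in Python. This is a minimal sketch and not the logic inside bench_mcr_ablation.py. It assumes that "variance ratio" means the larger of the two arms' sample variances divided by the smaller (the text does not define it), and that the per-run tok/s samples are available as plain lists. The function name and the sample values are hypothetical; the 5x and 38 tok/s thresholds are the ones stated above.

```python
from statistics import mean, variance

def check_contamination(baseline, mcr, max_ratio=5.0, min_tok_s=38.0):
    """Flag a baseline/MCR run pair as contaminated.

    Assumption: the variance ratio is the larger sample variance of the
    two arms divided by the smaller one. Each arm needs >= 2 samples.
    """
    base_mean, mcr_mean = mean(baseline), mean(mcr)
    low, high = sorted((variance(baseline), variance(mcr)))
    if high == 0:
        ratio = 1.0            # both arms perfectly stable
    elif low == 0:
        ratio = float("inf")   # one arm stable, the other not
    else:
        ratio = high / low
    # Relative throughput change of MCR against baseline, in percent.
    delta_pct = 100.0 * (mcr_mean - base_mean) / base_mean
    contaminated = ratio > max_ratio or min(base_mean, mcr_mean) < min_tok_s
    return {
        "delta_pct": delta_pct,
        "variance_ratio": ratio,
        "contaminated": contaminated,
    }

# Hypothetical samples shaped like the expected clean result.
clean = check_contamination([39.23, 39.28, 39.33], [39.18, 39.23, 39.28])
print(clean)
```

Checking the slower arm's mean against the 38 tok/s floor is a design choice in this sketch: a co-resident workload slows both arms, so either arm falling below the floor is treated as a signal.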
2. EC2: full five-arm ablation
cd <repo_root>
.\scripts\ec2_paperB_ablations\launch_detached.ps1 `
-InstanceType g6e.xlarge `
-MaxRuntimeMinutes 120
The launcher is disconnect-resilient. It uploads the model, configures the L40S, runs all five arms (baseline, compress, compress_gauge, compress_online, compress_spec), and tears the instance down on completion. Total wall time is about 70 minutes; total cost is about 2.25 USD.
3. Outputs
- docs/figures/paper-b/mcr_ablation.csv (clean run).
- docs/figures/paper-b/mcr_ablation_run1_contaminated.csv (preserved for audit).
- benchmarks/paperB_ablation_l40s_* (EC2 pack).
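To confirm that a run produced the outputs listed above, a small presence check can help. This is a hedged sketch: the paths are copied from the list, the trailing * is treated as a glob pattern, and the helper name missing_outputs is hypothetical rather than part of the repository.

```python
from pathlib import Path

# Paths copied from the Outputs list; the trailing * is a glob pattern.
EXPECTED_OUTPUTS = [
    "docs/figures/paper-b/mcr_ablation.csv",
    "docs/figures/paper-b/mcr_ablation_run1_contaminated.csv",
    "benchmarks/paperB_ablation_l40s_*",
]

def missing_outputs(repo_root, patterns=EXPECTED_OUTPUTS):
    """Return the patterns that match no file or directory under repo_root."""
    root = Path(repo_root)
    return [p for p in patterns if not any(root.glob(p))]

if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_outputs(".")
    if missing:
        print("Missing outputs:")
        for pattern in missing:
            print("  " + pattern)
    else:
        print("All expected outputs are present.")
```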
Tolerances
- Throughput: plus or minus 5 percent.
- PPL: deterministic to four decimal places.
- Variance ratio: a clean re-run sits at 1x to 1.5x. The 21x ratio in the original run was the contamination signal.
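The throughput and PPL tolerances above can be applied mechanically. The sketch below assumes that "plus or minus 5 percent" is relative to the expected throughput and that "deterministic to four decimal places" means the two PPL values agree after rounding to four decimals. The function name is hypothetical, and the PPL values in the usage lines are placeholders, not results from the paper.

```python
def within_tolerance(measured_tok_s, expected_tok_s,
                     measured_ppl, expected_ppl,
                     throughput_tol=0.05, ppl_decimals=4):
    """Return (throughput_ok, ppl_ok) for one reproduced row.

    Assumptions: the throughput tolerance is relative to the expected
    value, and PPL must match after rounding to ppl_decimals places.
    """
    allowed = throughput_tol * expected_tok_s
    throughput_ok = abs(measured_tok_s - expected_tok_s) <= allowed
    ppl_ok = round(measured_ppl, ppl_decimals) == round(expected_ppl, ppl_decimals)
    return throughput_ok, ppl_ok

# Placeholder PPL values for illustration only.
print(within_tolerance(38.9, 39.28, 7.12341, 7.12344))
print(within_tolerance(37.0, 39.28, 7.12341, 7.12344))
```

Returning the two checks separately makes it clear which tolerance a row failed, which matters because a throughput miss points to load on the machine while a PPL mismatch points to a change in the model or the code path.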
What can go wrong
- Co-resident GPU workload eats about 3.5 percent throughput on both arms and inflates variance.
- EC2 launch fails if the AMI ID has rotated; check the launcher for the current pinned AMI.