Scope
Reproduces the OTT-aware verifier, the EOS-aware acceptance protocol and the partial T_V(k) sweep. The full T_V(k) sweep is EC2-bound because the cold weight-PCA cache for k below 256 exceeds the RTX 4070 Laptop wall-time budget on Llama-3.1-8B.
Hardware target
- Local headline acceptance rate: any 8 GB CUDA GPU.
- Full T_V(k) sweep: g6e.xlarge L40S, about 90 minutes wall.
Prerequisites
- Same toolchain as Paper A.
- Pre-warmed weight-PCA cache for the rank set you intend to sweep. The cache is regenerated per (model, k) pair and re-used. Cold-PCA at k=256 takes about 15 minutes on the reference GPU.
1. Headline acceptance rate
.\build_host\geodessical.exe $model `
--axex-spec --axex-spec-tree-k 4 --axex-spec-target-acc 0.4 `
-p "Write a sorting algorithm in Python" -n 256 --temp 0
Expected: about 38.5 percent acceptance under the EOS-aware schedule.
2. T_V(k) partial sweep
python scripts\bench_tv_of_k.py
The script writes docs/figures/paper-c/tv_of_k.csv. The
local run produces a partial curve; alpha collapses to 0 percent at k
at or below 256 because of cold-cache PCA. This is reported as a
hardware-specific artefact in section 6 of the paper, not as a
mechanism finding.
3. EC2: full T_V(k) sweep
python scripts/bench_tv_of_k.py
On the L40S the cold-PCA cost drops below the timeout, so the k=128 and k=256 points populate. Expected total wall time is about 90 minutes.
Outputs
docs/figures/paper-c/tv_of_k.csvdocs/figures/paper-c/spec_acceptance_pack/
Tolerances
- Acceptance rate: plus or minus 1.5 percentage points run-over-run on identical seeds.
- T_V(k) shape: monotone decreasing in k for k at or above 512 on Llama-3.1-8B.