Reproduce C · OTT-Decode

Reproduce Paper C: Geodesic speculative decoding

William Ken Ohara Stewart (NagusameCS Independent Research)

HyperTensor Project · April 2026

Scope

Reproduces the OTT-aware verifier, the EOS-aware acceptance protocol, and the partial T_V(k) sweep. The full T_V(k) sweep runs on EC2 because building the cold weight-PCA cache for k at or below 256 exceeds the wall-time budget of the RTX 4070 Laptop on Llama-3.1-8B.

Hardware target

Prerequisites

1. Headline acceptance rate

.\build_host\geodessical.exe $model `
    --axex-spec --axex-spec-tree-k 4 --axex-spec-target-acc 0.4 `
    -p "Write a sorting algorithm in Python" -n 256 --temp 0

Expected: about 38.5 percent acceptance under the EOS-aware schedule.
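
As a rough sanity check on the headline number, the acceptance rate can be recomputed by hand. The sketch below assumes the conventional definition (accepted draft tokens divided by proposed draft tokens) and assumes you have extracted those two counts from the run yourself. The function names and the default tolerance are hypothetical placeholders, not part of geodessical.exe or the paper.

```python
def acceptance_rate(accepted: int, proposed: int) -> float:
    """Fraction of drafted tokens that the verifier accepted.

    Assumes the conventional definition: accepted / proposed.
    """
    if proposed <= 0:
        raise ValueError("proposed must be positive")
    if not 0 <= accepted <= proposed:
        raise ValueError("accepted must lie in [0, proposed]")
    return accepted / proposed


def within_tolerance(rate: float, expected: float = 0.385, tol: float = 0.05) -> bool:
    """True if rate lies within an absolute tolerance of the expected value.

    expected defaults to the 38.5 percent headline figure; tol is a
    placeholder and should be replaced with the tolerance you trust.
    """
    return abs(rate - expected) <= tol
```

A call such as within_tolerance(acceptance_rate(accepted, proposed)) then gives a quick pass or fail for a single run.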

2. T_V(k) partial sweep

python scripts\bench_tv_of_k.py

The script writes docs/figures/paper-c/tv_of_k.csv. The local run produces only a partial curve: alpha collapses to 0 percent for k at or below 256 because the cold-cache PCA does not finish within the timeout. Section 6 of the paper reports this as a hardware-specific artefact, not as a mechanism finding.
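
To see which points of the partial curve collapsed, the CSV can be scanned for rows where alpha is zero. This is a minimal sketch: the column names k and alpha are assumptions about the file layout, not confirmed by the script, so adjust them to match the actual header.

```python
import csv
import io


def collapsed_ks(csv_text: str, k_col: str = "k", alpha_col: str = "alpha") -> list[int]:
    """Return the sorted k values whose alpha is exactly zero.

    k_col and alpha_col are assumed column names; pass the real ones
    if the CSV header differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        int(row[k_col]) for row in reader if float(row[alpha_col]) == 0.0
    )


# Usage (assumes the sweep has already written the file):
# with open("docs/figures/paper-c/tv_of_k.csv", newline="") as f:
#     print(collapsed_ks(f.read()))
```

On the local run this list would be expected to contain the k values at or below 256; on a complete sweep it should be empty for those points.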

3. EC2: full T_V(k) sweep

python scripts/bench_tv_of_k.py

On the L40S the cold-PCA cost drops below the timeout, so the k=128 and k=256 points populate. Expected total wall time is about 90 minutes.

Outputs

Tolerances