Reproduce Paper E: Light Distillation, HyperTensor Research

Status

Paper E is a scaffold companion to Paper A. The Phase 1 reference implementation (calibration-free GRC projection in numpy) is functional and ships as scripts/grc_distill.py. Phase 2 (fitting LoRA residuals on Q/K/V against a teacher-logit MSE loss) requires a PyTorch + GPU runtime and does not yet run end to end. Phase 3 (re-quantising and shipping a single GGUF) delegates to llama.cpp.
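To make "calibration-free projection" concrete, assume that the GRC projection maps a weight matrix onto a rank-k factorisation, which is what the --rank flag suggests. The exact GRC construction is specified in Paper A and implemented in scripts/grc_distill.py. The sketch below substitutes plain truncated SVD, the Eckart-Young optimum in Frobenius norm, purely as an illustration, and the function name rank_k_project is hypothetical. It uses no data besides the weights, which is what calibration-free means here.

```python
import numpy as np


def rank_k_project(weight: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return factors (left, right) with left @ right the best rank-k
    approximation of `weight` in Frobenius norm (Eckart-Young).

    Stand-in only: truncated SVD, NOT the GRC basis construction from Paper A.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :k] * s[:k]  # (out, k): leading columns scaled by singular values
    right = vt[:k, :]        # (k, in): leading right singular vectors
    return left, right


# Toy weight matrix; no activations or calibration text are involved.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 512)).astype(np.float32)
left, right = rank_k_project(w, k=64)
approx = left @ right
rel_err = np.linalg.norm(w - approx) / np.linalg.norm(w)
print(left.shape, right.shape, f"relative error {rel_err:.3f}")
```

Truncated SVD only minimises weight-space error and ignores activation statistics. A calibration-permitted stage can add exactly that information.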

Hardware target

Phase 1: any x86_64 host with Python 3.11+ and 16 GB RAM. CPU only. Runs in 60-120 s for an 8B model.
Phase 2: one A100, L40S, or H100 with at least 24 GB VRAM, plus PyTorch 2.4. Estimated wall-clock time for 500 steps at LoRA r=8: 30-60 min on an A100.
Phase 3: the same host as Phase 2, plus a working llama.cpp build that includes the quantisation tool (llama-quantize; named quantize in older builds).
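The LoRA adapters themselves are a small share of the Phase 2 memory budget. The arithmetic below makes two assumptions: that adapters sit on the Q, K, and V projections of every layer at r=8, and that the public Llama-3.1-8B attention shapes apply (hidden size 4096, 32 layers, 32 query heads and 8 KV heads of dimension 128). On those assumptions the trainable state is under 5 M parameters. The 24 GB requirement is therefore driven by the frozen weights and activations, not by the adapters. The helper names are illustrative.

```python
# Assumed Llama-3.1-8B attention shapes (from the public model config).
HIDDEN = 4096
LAYERS = 32
HEAD_DIM = 128
N_HEADS = 32
N_KV_HEADS = 8


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A LoRA residual B @ A adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def qkv_lora_params(r: int) -> int:
    """Total adapter parameters for Q, K and V in every layer."""
    q = lora_params(HIDDEN, N_HEADS * HEAD_DIM, r)     # 4096 -> 4096
    k = lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, r)  # 4096 -> 1024 (GQA)
    v = lora_params(HIDDEN, N_KV_HEADS * HEAD_DIM, r)  # 4096 -> 1024 (GQA)
    return LAYERS * (q + k + v)


n = qkv_lora_params(r=8)
print(n, f"{n * 4 / 2**20:.1f} MiB in fp32")  # 4718592 18.0 MiB in fp32
```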

Quick start (Phase 1, CPU-only)

# 1. From the repo root
python -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate (and use ^ instead of \ for line continuation below)
pip install numpy

# 2. Run the calibration-free reference (k=1024, no sink-channel)
python scripts/grc_distill.py \
  --model models/Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --rank 1024 \
  --out distill_out/

# 3. Inspect the manifest
cat distill_out/distill_manifest.json    # Windows: type distill_out\distill_manifest.json

Phase 2 (GPU, calibration-permitted)

# Requires PyTorch + transformers; runs on EC2 A100/L40S
python scripts/grc_distill.py \
  --model models/Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --rank 1024 \
  --distill \
  --corpus calibration/wikitext_calib_2k.txt \
  --lora-rank 8 \
  --steps 500 \
  --device cuda \
  --out distill_out/
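Phase 2 fits low-rank residuals on Q/K/V against teacher logits, which needs the full model and torch. The numpy sketch below shrinks the idea to a single linear layer. The compressed weight stays frozen, and a rank-r residual B @ A starts at zero. Gradient descent then trains that residual to match the teacher layer's outputs on stand-in activations. Layer-output MSE is used here as a substitute for the teacher-logit MSE the paper describes. The name fit_lora_residual, the toy sizes, and the learning rate are all hypothetical.

```python
import numpy as np


def fit_lora_residual(w_teacher, w_compressed, x, r=8, steps=500, lr=0.1, seed=0):
    """Fit B @ A so that x @ (w_compressed + B @ A).T approximates x @ w_teacher.T.

    Loss: 0.5 * mean over samples of the squared output error (a per-layer
    stand-in for the teacher-logit MSE used by Phase 2).
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = w_teacher.shape
    a = rng.standard_normal((r, d_in)) / np.sqrt(d_in)  # random down-projection
    b = np.zeros((d_out, r))                            # zero init: start at w_compressed
    y_teacher = x @ w_teacher.T
    n = x.shape[0]
    losses = []
    for _ in range(steps):
        err = x @ (w_compressed + b @ a).T - y_teacher  # (n, d_out)
        losses.append(0.5 * np.sum(err ** 2) / n)
        grad_m = err.T @ x / n                          # dLoss/d(B @ A), (d_out, d_in)
        grad_b = grad_m @ a.T                           # chain rule through B @ A
        grad_a = b.T @ grad_m
        b -= lr * grad_b
        a -= lr * grad_a
    return a, b, losses


rng = np.random.default_rng(1)
d = 64
w_t = rng.standard_normal((d, d)) / np.sqrt(d)  # toy "teacher" weight
u, s, vt = np.linalg.svd(w_t)
w_c = (u[:, :16] * s[:16]) @ vt[:16]            # rank-16 stand-in for the GRC projection
x = rng.standard_normal((512, d))               # stand-in calibration activations
a, b, losses = fit_lora_residual(w_t, w_c, x)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Zero-initialising B is the standard LoRA convention, so the fit starts exactly at the compressed layer and can only move towards the teacher. In the actual Phase 2, the same kind of residual would presumably be attached to every Q/K/V projection and trained through the full network against teacher logits.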

Expected outputs

distill_out/distill_manifest.json: configuration record + stage marker (scaffold, calibration_free, or distill_requested).
Pending (Phase 2): per-layer LoRA residual weights and a final perplexity delta against the un-distilled GRC k=1024 baseline. The expected, not yet measured, outcome is that distillation closes roughly half of the 61.4% perplexity gap, which would leave perplexity about 30% above the uncompressed baseline with decode throughput unchanged.
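The expected figure is plain arithmetic on relative perplexity. The helpers below (hypothetical names, not part of scripts/grc_distill.py) show one way the Phase 2 numbers could be reported once measured. Perplexity is exp of the mean per-token negative log-likelihood. The helpers also compute the increase over the uncompressed model and the fraction of the GRC gap that distillation recovers. The baseline is normalised to 1.0, so no measured perplexity values are assumed.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(ppl: float, ppl_base: float) -> float:
    """Fractional perplexity increase over the uncompressed baseline."""
    return ppl / ppl_base - 1.0


def gap_closed(ppl_base: float, ppl_grc: float, ppl_distilled: float) -> float:
    """Fraction of the GRC-vs-baseline perplexity gap recovered by distillation."""
    return (ppl_grc - ppl_distilled) / (ppl_grc - ppl_base)


# Numbers from this guide, with the uncompressed baseline normalised to 1.0.
base, grc = 1.0, 1.614                # GRC k=1024 sits 61.4% above baseline
distilled = grc - 0.5 * (grc - base)  # close half of the gap
print(f"{relative_increase(distilled, base):.1%}")  # 30.7%
```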

Caveats

The empirical Phase 2 results in Paper E are explicitly marked Pending until the EC2 A100 runner is functional.
The WikiText-2 calibration corpus citation in the paper triggers a pandoc warning. This is a known, harmless reference issue and does not affect the rendered PDF or HTML.