
Reproduce Paper E: Light Distillation for Calibration-Permitted GRC

William Ken Ohara Stewart (NagusameCS Independent Research)

HyperTensor Project · April 2026 · Paper E (HTML) · Paper E (PDF) · scripts

Status

Paper E is a scaffold companion to Paper A. Phase 1, the reference implementation of calibration-free GRC projection in numpy, is functional and ships in scripts/grc_distill.py. Phase 2, a LoRA-residual fit on Q/K/V against a teacher-logit MSE objective, requires a torch + GPU runtime and does not yet run end to end. Phase 3, which re-quantises the result and ships a single GGUF, delegates to llama.cpp.
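To make the Phase 1 idea concrete, here is a minimal numpy sketch of a calibration-free rank-k projection of one weight matrix. The GRC construction itself is not described on this page, so the sketch substitutes a plain truncated SVD as a stand-in; the function name `project_rank_k` and the toy 64x64 matrix are assumptions for illustration and are not taken from scripts/grc_distill.py.

```python
import numpy as np


def project_rank_k(weight, k):
    """Return the best rank-k approximation of `weight` (truncated SVD).

    Stand-in only: the real GRC projection may build its basis differently.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = min(k, s.size)
    # Scale the first k left singular vectors by their singular values,
    # then map back through the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy example: a random 64x64 "weight" reduced to rank 16.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_k = project_rank_k(w, 16)

# Relative Frobenius error introduced by the projection.
rel_err = np.linalg.norm(w - w_k) / np.linalg.norm(w)
print(f"rank: {np.linalg.matrix_rank(w_k)}, relative error: {rel_err:.3f}")
```

No calibration data enters this computation: the projection depends only on the weights, which is what lets this phase run on CPU.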

Hardware target

Quick start (Phase 1, CPU-only)

# 1. From the repo root (Windows shell shown; on Linux/macOS activate with: source .venv/bin/activate)
python -m venv .venv
.venv\Scripts\activate
pip install numpy

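# 2. Run the calibration-free reference (rank k=1024, no sink-channel)
# The trailing backslashes are POSIX line continuations; in PowerShell use a backtick, in cmd.exe a caret (^).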
python scripts/grc_distill.py \
  --model models/Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --rank 1024 \
  --out distill_out/

# 3. Inspect the manifest (on Linux/macOS: cat distill_out/distill_manifest.json)
type distill_out\distill_manifest.json
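As a shell-independent way to inspect the manifest, the following sketch loads it with the Python standard library and lists its top-level keys. The manifest schema is not documented on this page, so the helper (the hypothetical `summarize_manifest`) makes no assumptions about field names and only reports what it finds.

```python
import json
from pathlib import Path


def summarize_manifest(path):
    """Return (key, value type name) pairs for the top level of a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return [(key, type(value).__name__) for key, value in data.items()]
    # Not a JSON object at the top level: report the root type only.
    return [("<root>", type(data).__name__)]


if __name__ == "__main__":
    manifest = Path("distill_out") / "distill_manifest.json"
    if manifest.exists():
        for key, kind in summarize_manifest(manifest):
            print(f"{key}: {kind}")
    else:
        print(f"{manifest} not found; run the Phase 1 command first")
```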

Phase 2 (GPU, calibration-permitted)

# Requires PyTorch + transformers; targets EC2 A100/L40S instances. Not yet end-to-end (see Status).
python scripts/grc_distill.py \
  --model models/Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --rank 1024 \
  --distill \
  --corpus calibration/wikitext_calib_2k.txt \
  --lora-rank 8 \
  --steps 500 \
  --device cuda \
  --out distill_out/
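The idea behind the --distill, --lora-rank and --steps flags can be illustrated on a single matrix. The sketch below is a toy under stated assumptions, not the Phase 2 implementation: it uses numpy instead of torch, random activations instead of a text corpus, a truncated SVD as the compressed base weight, and an MSE on one layer's outputs rather than on teacher logits. All names (`truncate_rank`, `fit_lora_residual`) are hypothetical.

```python
import numpy as np


def truncate_rank(weight, k):
    """Best rank-k approximation, used here as a stand-in compressed weight."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def fit_lora_residual(w_teacher, w_base, x, lora_rank=2, steps=500, lr=0.1, seed=0):
    """Fit B @ A so that x @ (w_base + B @ A).T matches x @ w_teacher.T in MSE.

    w_base stays frozen; only the low-rank factors A and B are updated.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = w_teacher.shape
    a = rng.standard_normal((lora_rank, d_in)) / np.sqrt(d_in)
    b = np.zeros((d_out, lora_rank))  # zero init: the student starts at w_base
    y_teacher = x @ w_teacher.T
    losses = []
    for _ in range(steps):
        err = x @ (w_base + b @ a).T - y_teacher
        losses.append(float(np.mean(err ** 2)))  # loss before this update
        grad_delta = (2.0 / err.size) * err.T @ x  # d(loss) / d(B @ A)
        grad_b = grad_delta @ a.T
        grad_a = b.T @ grad_delta
        b -= lr * grad_b
        a -= lr * grad_a
    return a, b, losses


# Toy setup: 16x16 teacher weight, rank-8 base, 64 random "calibration" rows.
rng = np.random.default_rng(0)
w_teacher = rng.standard_normal((16, 16)) / 4.0
w_base = truncate_rank(w_teacher, 8)
x = rng.standard_normal((64, 16))

a, b, losses = fit_lora_residual(w_teacher, w_base, x)
print(f"output MSE at first step: {losses[0]:.5f}, at last step: {losses[-1]:.5f}")
```

The zero initialisation of B follows the usual LoRA convention, so the fit starts from exactly the calibration-free result and can only be credited with whatever error it removes.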

Expected outputs

Caveats