Paper F / VII: Task-Level Impact Analysis of Low-Rank Compression

NagusameCS · April 2026 · Part of HyperTensor Papers I-X
8 tasks evaluated
k=256-2048 rank range
Llama-3.1-8B model

Abstract

Low-rank compression of attention weights reduces a transformer's parameter count, but its impact on downstream task performance is not uniform. Paper VII measures the per-task degradation curve for eight common benchmarks (MMLU, HellaSwag, ARC-Challenge, PIQA, WinoGrande, GSM8K, TruthfulQA, OpenBookQA) under GRC compression at k = 256, 512, 768, 1024, 1536, and 2048 on Llama-3.1-8B-Instruct. We find that reasoning-heavy tasks (GSM8K, ARC-Challenge) degrade faster than fact-retrieval tasks (MMLU, TruthfulQA) as rank decreases: between k=2048 and k=256, GSM8K falls from 42.1% to 9.7%, while MMLU falls from 65.2% to 52.1%. The degradation follows a power law in k with task-specific exponents. One task, PIQA, shows no significant degradation even at k=256 (79.2%, against 80.1% at k=2048), suggesting that some capabilities are robust to compression because they depend on shallow, low-rank features. We provide a task-level impact matrix that lets practitioners choose a rank based on their accuracy budget for each downstream task.
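
The power-law claim can be written out explicitly. The notation below is an assumption introduced for illustration, not taken from the paper: A_t(k) is the accuracy of task t at rank k, A_t^ref is a reference accuracy (the page does not say which reference the authors use), and c_t and alpha_t are the task-specific scale and exponent.

```latex
\Delta_t(k) \;=\; A_t^{\mathrm{ref}} - A_t(k) \;\approx\; c_t \, k^{-\alpha_t},
\qquad
\log \Delta_t(k) \;\approx\; \log c_t - \alpha_t \log k
```

Under this form, alpha_t is the negated slope of the degradation curve on a log-log plot, and each halving of k multiplies the accuracy loss by 2^{alpha_t}.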

Key Findings

Measured Results

Task            k=2048   k=1536   k=1024   k=768    k=512    k=256
MMLU            65.2%    64.8%    63.1%    61.4%    58.2%    52.1%
HellaSwag       78.9%    78.2%    76.5%    74.1%    70.3%    64.8%
ARC-Challenge   54.3%    53.1%    50.2%    47.5%    42.8%    36.1%
PIQA            80.1%    80.0%    79.8%    79.7%    79.5%    79.2%
WinoGrande      73.4%    72.8%    71.2%    68.9%    64.5%    58.3%
GSM8K           42.1%    38.5%    31.2%    25.8%    18.4%     9.7%
TruthfulQA      54.8%    53.9%    51.7%    49.2%    45.1%    39.6%
OpenBookQA      44.2%    43.1%    40.8%    38.0%    33.5%    27.4%
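
As a rough illustration of how these figures might be used, the sketch below fits a per-task exponent and picks a rank from an accuracy budget. It makes two assumptions that the page does not confirm: the drop is measured against the k=2048 column, because no uncompressed baseline is listed, and the exponent is estimated by ordinary least squares in log-log space. The helper names `drop`, `fit_exponent`, and `min_rank` are hypothetical and not part of any released code.

```python
import math

# Ranks and accuracies (%) copied from the results table, highest rank first.
RANKS = [2048, 1536, 1024, 768, 512, 256]
ACCURACY = {
    "MMLU":          [65.2, 64.8, 63.1, 61.4, 58.2, 52.1],
    "HellaSwag":     [78.9, 78.2, 76.5, 74.1, 70.3, 64.8],
    "ARC-Challenge": [54.3, 53.1, 50.2, 47.5, 42.8, 36.1],
    "PIQA":          [80.1, 80.0, 79.8, 79.7, 79.5, 79.2],
    "WinoGrande":    [73.4, 72.8, 71.2, 68.9, 64.5, 58.3],
    "GSM8K":         [42.1, 38.5, 31.2, 25.8, 18.4, 9.7],
    "TruthfulQA":    [54.8, 53.9, 51.7, 49.2, 45.1, 39.6],
    "OpenBookQA":    [44.2, 43.1, 40.8, 38.0, 33.5, 27.4],
}


def drop(task, k):
    """Percentage points lost at rank k relative to k=2048 (assumed reference)."""
    scores = ACCURACY[task]
    # Round to one decimal so float noise does not affect budget comparisons.
    return round(scores[0] - scores[RANKS.index(k)], 1)


def fit_exponent(task):
    """Estimate alpha in drop ~ c * k**(-alpha) by least squares on log-log data."""
    ks = RANKS[1:]  # skip k=2048, where the drop is zero by construction
    xs = [math.log(k) for k in ks]
    ys = [math.log(drop(task, k)) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


def min_rank(budgets):
    """Smallest tabulated rank whose drop stays within every task's budget (points)."""
    for k in sorted(RANKS):
        if all(drop(task, k) <= limit for task, limit in budgets.items()):
            return k
    return None  # only reachable if a budget is negative


if __name__ == "__main__":
    for task in ACCURACY:
        print(f"{task:14s} alpha ~ {fit_exponent(task):.2f}")
    print(min_rank({"MMLU": 4.0, "GSM8K": 5.0}))
```

With the tabulated values, a budget of 4 points on MMLU and 5 points on GSM8K selects k=1536, since GSM8K already loses 10.9 points at k=1024. The fitted exponents depend on the choice of reference column and should be read as indicative only.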