Paper F / VII: Task-Level Impact Analysis of Low-Rank Compression

NagusameCS · April 2026 · Part of HyperTensor Papers I-X

8 tasks evaluated

k=256-2048 rank range

Llama-3.1-8B model

Abstract

Low-rank compression of attention weights reduces the parameter count of a transformer, but the impact on downstream task performance is not uniform. Paper VII measures the per-task degradation curve for 8 common benchmarks (MMLU, HellaSwag, ARC-Challenge, PIQA, WinoGrande, GSM8K, TruthfulQA, OpenBookQA) under GRC compression at k = 256, 512, 768, 1024, 1536, and 2048 on Llama-3.1-8B-Instruct. We find that reasoning-heavy tasks (GSM8K, ARC-Challenge) degrade faster than fact-retrieval tasks (MMLU, TruthfulQA) as rank decreases, and that the degradation follows a power law in k with task-specific exponents. A single task (PIQA) shows no significant degradation even at k=256, suggesting that certain capabilities are robust to compression because they depend on shallow, low-rank features. We provide a task-level impact matrix that allows practitioners to choose a rank based on their accuracy budget per downstream task.

Key Findings

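Power-law degradation: Task accuracy follows acc(k) = acc(inf) - c * k^{-alpha}, where acc(inf) is the accuracy in the limit of large k, c is a task-specific constant, and alpha is a task-specific exponent (0.3 to 1.1).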
PIQA immunity: PIQA accuracy moves by less than 1 percentage point across all tested ranks (80.1% at k=2048, 79.2% at k=256), suggesting commonsense physical reasoning is encoded in very low-rank features.
GSM8K vulnerability: Mathematical reasoning degrades fastest (alpha ≈ 1.1), consistent with the hypothesis that multi-step reasoning requires higher-rank attention subspaces.
Task impact matrix: A lookup table mapping (task, rank) to expected accuracy, usable as a design tool for deployment trade-offs.
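The power-law form in the first finding can be fitted to a single row of the results table. The following is a minimal sketch, not the paper's fitting procedure, which this page does not describe. It assumes that for a fixed alpha the model is linear in acc(inf) and c, so it grid-searches alpha and solves the other two parameters by ordinary least squares. The function names and the alpha grid are hypothetical, and an exponent fitted this way from six tabulated points need not match the exponents reported above.

```python
# Sketch: fit acc(k) = acc_inf - c * k**(-alpha) to one row of the results table.
# Assumption: grid search over alpha plus linear least squares for (acc_inf, c).
# This is an illustrative procedure, not necessarily the one used in the paper.

RANKS = [2048, 1536, 1024, 768, 512, 256]
GSM8K = [42.1, 38.5, 31.2, 25.8, 18.4, 9.7]  # accuracy in %, GSM8K row of the table


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_power_law(ranks, accs, alphas=None):
    """Return (acc_inf, c, alpha, sse) with the lowest squared error on the grid."""
    if alphas is None:
        alphas = [i / 100 for i in range(5, 301)]  # hypothetical grid: 0.05 to 3.00
    best = None
    for alpha in alphas:
        xs = [k ** (-alpha) for k in ranks]
        intercept, slope = fit_linear(xs, accs)
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, accs))
        if best is None or sse < best[3]:
            # intercept plays the role of acc_inf; slope equals -c
            best = (intercept, -slope, alpha, sse)
    return best


def predict(k, acc_inf, c, alpha):
    """Evaluate the fitted power law at rank k."""
    return acc_inf - c * k ** (-alpha)


if __name__ == "__main__":
    acc_inf, c, alpha, sse = fit_power_law(RANKS, GSM8K)
    print(f"acc_inf={acc_inf:.1f}  c={c:.1f}  alpha={alpha:.2f}  sse={sse:.2f}")
```

If the best alpha lands on an edge of the grid, the six points do not constrain the exponent well and the grid, or the model, should be revisited before the fit is used for extrapolation.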

Measured Results

Task	k=2048	k=1536	k=1024	k=768	k=512	k=256
MMLU	65.2%	64.8%	63.1%	61.4%	58.2%	52.1%
HellaSwag	78.9%	78.2%	76.5%	74.1%	70.3%	64.8%
ARC-Challenge	54.3%	53.1%	50.2%	47.5%	42.8%	36.1%
PIQA	80.1%	80.0%	79.8%	79.7%	79.5%	79.2%
WinoGrande	73.4%	72.8%	71.2%	68.9%	64.5%	58.3%
GSM8K	42.1%	38.5%	31.2%	25.8%	18.4%	9.7%
TruthfulQA	54.8%	53.9%	51.7%	49.2%	45.1%	39.6%
OpenBookQA	44.2%	43.1%	40.8%	38.0%	33.5%	27.4%
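The table above can be used directly as the task impact matrix: given a minimum acceptable accuracy per task, pick the smallest tested rank that satisfies every constraint. The sketch below shows one way to do that lookup. The names `IMPACT`, `expected_accuracy`, and `smallest_rank` are hypothetical, and the lookup covers only the six tested ranks without interpolating between them.

```python
# Sketch: use the measured results as a (task, rank) -> accuracy lookup table.
# All names here are illustrative; values are copied from the table above (in %).

RANKS = [2048, 1536, 1024, 768, 512, 256]

IMPACT = {
    "MMLU":          [65.2, 64.8, 63.1, 61.4, 58.2, 52.1],
    "HellaSwag":     [78.9, 78.2, 76.5, 74.1, 70.3, 64.8],
    "ARC-Challenge": [54.3, 53.1, 50.2, 47.5, 42.8, 36.1],
    "PIQA":          [80.1, 80.0, 79.8, 79.7, 79.5, 79.2],
    "WinoGrande":    [73.4, 72.8, 71.2, 68.9, 64.5, 58.3],
    "GSM8K":         [42.1, 38.5, 31.2, 25.8, 18.4, 9.7],
    "TruthfulQA":    [54.8, 53.9, 51.7, 49.2, 45.1, 39.6],
    "OpenBookQA":    [44.2, 43.1, 40.8, 38.0, 33.5, 27.4],
}


def expected_accuracy(task, k):
    """Tabulated accuracy (%) for a task at one of the tested ranks."""
    return IMPACT[task][RANKS.index(k)]


def smallest_rank(min_accuracy):
    """Smallest tested rank meeting every per-task accuracy floor, or None.

    min_accuracy maps task name -> minimum acceptable accuracy in %.
    """
    for k in sorted(RANKS):  # try the most aggressive compression first
        if all(expected_accuracy(task, k) >= floor
               for task, floor in min_accuracy.items()):
            return k
    return None


if __name__ == "__main__":
    budget = {"MMLU": 60.0, "GSM8K": 30.0}
    print(smallest_rank(budget))
```

With the budget shown, MMLU first reaches 60% at k=768 and GSM8K first reaches 30% at k=1024, so the stricter constraint decides and the lookup returns 1024. A budget that no tested rank satisfies returns `None`.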
