● All 10 ablation experiments are complete. Here's the full summary:
● Full Ablation Results (mini_eval, 450 samples)
┌──────────────────────┬─────────┬───────────────────┬────────────────────┬───────────────────┬───────────────────────────┐
│ freeze_except        │ Total % │ Delta vs Baseline │ Cryptarithm Deduce │ Cryptarithm Guess │ Baseline Decay            │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ shared_experts       │ 68.0%   │ 0.0%              │ 8.0%               │ 6.0%              │ None (perfect preserve)   │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ out_proj             │ 68.0%   │ 0.0%              │ 8.0%               │ 6.0%              │ None (perfect preserve)   │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ lm_head              │ 67.6%   │ -0.4%             │ 8.0%               │ 6.0%              │ Minimal                   │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ v_proj               │ 65.1%   │ -2.9%             │ 0.0%               │ 0.0%              │ Moderate                  │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ q_proj               │ 57.6%   │ -10.4%            │ 0.0%               │ 2.0%              │ Severe (bit_manip 92→24%) │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ down_proj            │ 32.4%   │ -35.6%            │ 0.0%               │ 0.0%              │ Catastrophic              │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ up_proj              │ 32.2%   │ -35.8%            │ 0.0%               │ 0.0%              │ Catastrophic              │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ in_proj              │ 29.6%   │ -38.4%            │ 0.0%               │ 0.0%              │ Catastrophic              │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ o_proj               │ 25.1%   │ -42.9%            │ 0.0%               │ 0.0%              │ Catastrophic              │
├──────────────────────┼─────────┼───────────────────┼────────────────────┼───────────────────┼───────────────────────────┤
│ none (all trainable) │ 27.1%   │ -40.9%            │ 2.0%               │ 0.0%              │ Catastrophic              │
└──────────────────────┴─────────┴───────────────────┴────────────────────┴───────────────────┴───────────────────────────┘
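For reference, a minimal sketch of what a freeze_except run in this sweep does, assuming a PyTorch model. The helper name and the substring-matching rule are assumptions, not the actual training code:

```python
from __future__ import annotations

import torch
import torch.nn as nn

def freeze_except(model: nn.Module, patterns: list[str] | None) -> int:
    """Freeze every parameter whose name matches none of `patterns`.

    `patterns` are substrings matched against parameter names, so
    ["out_proj"] leaves only out_proj weights trainable, while None
    leaves everything trainable (the `none (all trainable)` row above).
    Returns the trainable parameter count as a sanity check.
    """
    for name, p in model.named_parameters():
        p.requires_grad = patterns is None or any(pat in name for pat in patterns)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Usage for the shared_experts run; the optimizer and LR are assumed details:
# trainable = freeze_except(model, ["shared_experts"])
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-5)
```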
Key Findings
- No target improved cryptarithm: the best runs merely hold the 8.0%/6.0% baseline, and most drop it to 0.0%.
- Safest targets: shared_experts and out_proj, with zero baseline decay.
- Moderate safety: lm_head (0.4% decay) and v_proj (2.9% decay, but it drops cryptarithm to 0%).
- Catastrophic: in_proj, up_proj, down_proj, o_proj, and none (all trainable): each collapses total accuracy by roughly 36-43 points.
- The real problem: 50 steps on 500 cryptarithm-only samples isn't enough to learn cryptarithm reasoning regardless of which layer is unfrozen. The model needs at least one of:
  - Significantly more training steps or data
  - A mix of module types (e.g., shared_experts + out_proj together; see the sketch after this list)
  - A higher learning rate or a different training schedule
  - Training on a broader dataset that includes cryptarithm alongside other domains
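For the multi-target route, the hypothetical freeze_except helper sketched above would simply take several patterns at once. The module combination and the bumped learning rate here are assumptions, not validated values:

```python
# Given a loaded `model`: combine the two zero-decay targets from the
# table, optionally with a higher LR than the single-target runs
# (3e-5 is an assumed value, not tuned).
trainable = freeze_except(model, ["shared_experts", "out_proj"])
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-5)
```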
● The ablation sweep is complete. The core finding is clear: layer choice alone doesn't fix cryptarithm; we need a fundamentally different training strategy (more data, more steps, or multi-target unfreezing). Ready for next steps when you are.