Interpretability · Northeastern University
Position-Aware Circuit Discovery for Language Models
This work addresses a blind spot in automatic circuit discovery: a model component can matter at some token positions but not at others, so position-invariant circuits miss real mechanisms.