This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
Date: 2023-08-22 03:49 pm (UTC)
Date: 2023-08-22 03:51 pm (UTC)
Date: 2023-08-22 03:52 pm (UTC)
Footnote: "Some MLP neurons have very negative cosine similarity between their input and output weights, which may indicate deleting information from the residual stream. Similarly, some attention heads have large negative eigenvalues in their W_O W_V matrix and primarily attend to the present token, potentially serving as a mechanism to delete information. It's worth noticing that while these may be generic mechanisms for 'memory management' deletion of information, they may also be mechanisms for conditionally deleting information, operating only in some cases."
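A toy sketch in plain Python of the two deletion mechanisms the footnote describes. It assumes a 2-dimensional residual stream, a single ReLU neuron, no LayerNorm or biases, and an attention head that puts all of its attention on the current token. Under that last assumption the head's output reduces to W_O W_V applied to the current residual vector. All function names and weight values here are hypothetical illustrations, not code from the paper; `w_ov` stands in for the product W_O W_V.

```python
import cmath
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine similarity between two weight vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def relu(z):
    return max(0.0, z)


def mlp_neuron_update(x, w_in, w_out):
    """Residual update from one neuron: x + relu(w_in . x) * w_out.
    If w_out points opposite w_in (cosine similarity near -1), a firing
    neuron subtracts from the direction it reads, i.e. deletes it.
    The ReLU makes this conditional: no firing, no deletion."""
    a = relu(dot(w_in, x))
    return [xi + a * wo for xi, wo in zip(x, w_out)]


def eigvals_2x2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)


def matvec(m, x):
    return [dot(row, x) for row in m]


def self_attending_head_update(x, w_ov):
    """Residual update from a head attending only to the current token:
    x + W_OV x. An eigenvalue of -1 removes the component of x along
    the matching eigenvector."""
    return [xi + yi for xi, yi in zip(x, matvec(w_ov, x))]


if __name__ == "__main__":
    x = [0.8, 0.5]                      # hypothetical residual vector
    w_in, w_out = [1.0, 0.0], [-1.0, 0.0]
    print(cosine_similarity(w_in, w_out))
    print(mlp_neuron_update(x, w_in, w_out))          # first component removed
    print(mlp_neuron_update([-0.8, 0.5], w_in, w_out))  # neuron silent, unchanged

    w_ov = [[-1.0, 0.0], [0.0, 0.0]]    # hypothetical W_O W_V
    print(eigvals_2x2(w_ov))
    print(self_attending_head_update(x, w_ov))
```

The neuron example also hints at the footnote's last point: because of the ReLU, the same weights delete information only when the neuron fires, which is one simple way deletion could be conditional rather than generic. For real models you would compute eigenvalues of the full W_O W_V matrix with a linear-algebra library instead of the 2x2 formula used here.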