This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
no subject
Date: 2023-08-22 02:58 pm (UTC)
via https://twitter.com/NeelNanda5/status/1580782930304978944
no subject
Date: 2023-08-22 03:55 pm (UTC)
29:36
"…or model functionality as a sum of paths via the residual stream notion"
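As a sketch of what "model functionality as a sum of paths" means, here is a toy one-layer, one-head, attention-only transformer in numpy. The weight names follow the framework paper; the values are random and the attention pattern is fixed to uniform purely for illustration. The point is that the logits computed the ordinary way (head output added into the residual stream, then unembedded) decompose exactly into a direct path plus a through-the-head path:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab, d_head, seq = 16, 50, 4, 5

# Toy weights (names as in the framework paper; values random).
W_E = rng.normal(size=(d_model, d_vocab))   # embed
W_U = rng.normal(size=(d_vocab, d_model))   # unembed
W_V = rng.normal(size=(d_head, d_model))
W_O = rng.normal(size=(d_model, d_head))
A = np.full((seq, seq), 1.0 / seq)          # placeholder uniform attention pattern

tokens = rng.integers(0, d_vocab, size=seq)
x = W_E[:, tokens]                          # residual stream, shape (d_model, seq)

# Ordinary forward pass: the head's output is added into the residual stream.
resid = x + W_O @ (W_V @ x) @ A.T
logits = W_U @ resid

# "Sum of paths" view: direct path (embed -> unembed) plus head path
# (embed -> W_OV, mixed across positions by A -> unembed).
direct_path = W_U @ x
head_path = (W_U @ W_O @ W_V) @ x @ A.T
assert np.allclose(logits, direct_path + head_path)
```

With more layers and heads the same algebra applies: because every component reads from and writes to the residual stream additively, the logits are a sum over all paths through subsets of components.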
no subject
Date: 2023-08-22 04:20 pm (UTC)
40:32
"[I don't] actually like the words reading and writing here, because I think they can be pretty misleading. In particular, reading and writing intuitively feel like inverses, or complementary operations, but they're actually very different, so I prefer the word 'project' for read and 'embed' for write"
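A quick numpy illustration of why read and write are not inverses (toy sizes, random matrices, not any real model's weights): reading is a projection into a lower-dimensional space, so it discards information, while writing embeds a low-dimensional vector back into the stream. Composing them cannot recover the original vector:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 16, 4   # toy sizes; any d_head < d_model shows the point

W_read = rng.normal(size=(d_head, d_model))   # "project": read from the stream
W_write = rng.normal(size=(d_model, d_head))  # "embed": write back into it

x = rng.normal(size=d_model)

h = W_read @ x            # projected to a 4-dim space: information is lost
y = W_write @ h           # embedded back into the 16-dim residual stream

# Read-then-write is far from the identity map:
# W_write @ W_read has rank at most d_head = 4, but the identity on the
# residual stream has rank d_model = 16.
print(np.allclose(x, y))  # False
```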
no subject
Date: 2023-08-22 04:26 pm (UTC)
45:34
"…MLP neurons as four times the residual stream width. I don't know why, but everyone does it, so you just memorize the number four"
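For concreteness, a minimal GPT-2-style MLP block in numpy showing the conventional 4x ratio (weights are random stand-ins, not trained values; the GELU is the tanh approximation GPT-2 uses):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

d_model = 768        # GPT-2 small's residual stream width
d_mlp = 4 * d_model  # the "memorize the number four" convention -> 3072 neurons

rng = np.random.default_rng(0)
W_in = rng.normal(size=(d_mlp, d_model)) * 0.02   # read: project into neuron space
b_in = np.zeros(d_mlp)
W_out = rng.normal(size=(d_model, d_mlp)) * 0.02  # write: embed back into the stream
b_out = np.zeros(d_model)

def mlp(x):
    return W_out @ gelu(W_in @ x + b_in) + b_out

x = rng.normal(size=d_model)
assert mlp(x).shape == (d_model,)  # the block maps the stream back to itself
```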