This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
Date: 2023-08-22 04:26 pm (UTC)
45:34
"MLP neurons as four times the residual stream width I don't know why but
45:39
everyone does it so you just memorize the number four"
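
To make the "number four" concrete, here is a rough sketch of the arithmetic, not anything taken from the video or the paper. The function names are made up for illustration, and d_model = 768 is only an example width (it happens to be GPT-2 small's residual stream width). It assumes a plain two-matrix MLP block with biases.

```python
# Hypothetical helpers illustrating the 4x convention; names are not from any library.

def mlp_hidden_width(d_model: int, expansion: int = 4) -> int:
    """Number of MLP neurons for a given residual stream width."""
    return expansion * d_model


def mlp_param_count(d_model: int, expansion: int = 4, bias: bool = True) -> int:
    """Parameters in one MLP block: an up-projection and a down-projection."""
    d_mlp = mlp_hidden_width(d_model, expansion)
    weights = d_model * d_mlp + d_mlp * d_model  # W_in and W_out
    biases = (d_mlp + d_model) if bias else 0    # b_in and b_out
    return weights + biases


d_model = 768  # assumed example width
print(mlp_hidden_width(d_model))  # 3072
print(mlp_param_count(d_model))   # 4722432
```

With the factor fixed at four, the weight count of each MLP block works out to 8 * d_model^2, which is why the convention is easy to carry around in your head.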