Let's understand Large Language Models better
This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
45:34
"MLP neurons as four times the residual stream width. I don't know why, but everyone does it, so you just memorize the number four."
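
To make the "number four" concrete, here is a rough sketch of the arithmetic. It assumes a standard two-layer MLP block with biases (one projection up to the hidden layer, one back down to the residual stream). The function names and the example width of 768 are illustrative choices, not taken from the talk.

```python
def mlp_hidden_width(d_model: int, ratio: int = 4) -> int:
    """Hidden (neuron) width of the MLP block, by the usual 4x convention."""
    return ratio * d_model


def mlp_param_count(d_model: int, ratio: int = 4) -> int:
    """Parameters in one MLP block, assuming two linear layers with biases."""
    d_mlp = mlp_hidden_width(d_model, ratio)
    w_in = d_model * d_mlp   # residual stream -> neurons
    b_in = d_mlp
    w_out = d_mlp * d_model  # neurons -> residual stream
    b_out = d_model
    return w_in + b_in + w_out + b_out


if __name__ == "__main__":
    d_model = 768  # example residual stream width (assumption)
    print(mlp_hidden_width(d_model))  # 3072
    print(mlp_param_count(d_model))   # 4722432
```

With the ratio fixed at four, the weight matrices of one MLP block hold 8 * d_model^2 parameters, so the MLP cost grows quadratically with the residual stream width.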