dmm: (Default)
Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-08-22 04:26 pm (UTC)

"um side note basically every Transformer I've come across they just hard code the number of
45:34
MLP neurons as four times the residual stream width I don't know why but
45:39
everyone does it so you just memorize the number four"

Post a comment in response:

This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting