Dataflow matrix machines (by Anhinga anhinga) (dmm) wrote
2023-08-22 04:26 pm (UTC)
no subject
"um, side note: basically every Transformer I've come across, they just hard-code the number of [45:34] MLP neurons as four times the residual stream width. I don't know why, but [45:39] everyone does it, so you just memorize the number four"
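
To make the quoted rule of thumb concrete, here is a minimal sketch of the arithmetic, assuming a standard two-layer MLP block with biases. The function names `mlp_hidden_width` and `mlp_param_count` are made up for illustration and are not from the talk; the width 768 is just an example value (it happens to be the residual stream width of GPT-2 small).

```python
def mlp_hidden_width(d_model: int, ratio: int = 4) -> int:
    """Number of MLP neurons under the 'ratio times residual width' convention."""
    return ratio * d_model


def mlp_param_count(d_model: int, ratio: int = 4) -> int:
    """Parameters in one MLP block, assuming two linear layers with biases."""
    d_mlp = mlp_hidden_width(d_model, ratio)
    weights = d_model * d_mlp + d_mlp * d_model  # up-projection + down-projection
    biases = d_mlp + d_model
    return weights + biases


if __name__ == "__main__":
    d_model = 768  # example residual stream width (assumption for illustration)
    print(mlp_hidden_width(d_model))  # 3072
    print(mlp_param_count(d_model))   # 4722432
```

With the ratio fixed at four, the MLP weights come to roughly 8 * d_model^2 parameters per layer, so the MLP cost grows quadratically with the residual stream width.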