Dataflow matrix machines (by Anhinga anhinga) (dmm) wrote
2023-10-30 05:04 am (UTC)
no subject
~1:17:30: attention does not only move information from the residual stream of one token to that of another. Information accumulated from the residual streams of many tokens can then be moved again by a later attention layer (combining aggregation and compositionality).
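A toy sketch of that two-step movement, under heavy simplifying assumptions that are mine and not from the talk: the attention patterns are set by hand instead of being computed from queries and keys via softmax (so unused rows are simply zero), the OV map is the identity, and the names `attention_layer`, `pattern1`, and `pattern2` are purely illustrative. Token embeddings are one-hot, so dimension i of a residual stream shows how much information originating at token i has reached it.

```python
import numpy as np

def attention_layer(resid, pattern, w_ov):
    """Add to each token's residual stream a pattern-weighted mix of
    the (OV-transformed) residual streams of the tokens it attends to."""
    return resid + pattern @ resid @ w_ov

n_tokens = 4
resid0 = np.eye(n_tokens)  # one-hot: dimension i marks info from token i
w_ov = np.eye(n_tokens)    # assumption: identity OV map, for traceability

# Layer 1 (aggregation): token 2 attends equally to tokens 0 and 1.
pattern1 = np.zeros((n_tokens, n_tokens))
pattern1[2, 0] = 0.5
pattern1[2, 1] = 0.5

# Layer 2 (composition): token 3 attends to token 2, which by now
# already carries information gathered from tokens 0 and 1.
pattern2 = np.zeros((n_tokens, n_tokens))
pattern2[3, 2] = 1.0

resid1 = attention_layer(resid0, pattern1, w_ov)
resid2 = attention_layer(resid1, pattern2, w_ov)

print(resid1[2])  # token 2 after layer 1
print(resid2[3])  # token 3 after layer 2
```

After layer 2, the residual stream of token 3 has nonzero entries in dimensions 0 and 1 even though token 3 never attended to those tokens directly; the information arrived via token 2.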