Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-30 05:43 am (UTC)

"1:30:18
we think you're grown up enough that you can figure out where it's useful to look and we're going to give you some
1:30:25
fraction of your premises I think it's like articles to attention something like uh
1:30:32
one-sixth of the parameters of the Transformer go to attention and we're like
1:30:37
these parameters to figure out where you should be moving information from what
1:30:42
does an intelligent worrying and an intelligent convolution look like and as we'll see later with induction
1:30:48
heads there can actually be like a pretty sophisticated and intelligent amount of computation
1:30:53
that goes into what this smart Dynamic convolution
1:30:58
looks like but yeah fundamentally attention is a generalized convolution where we allow
1:31:05
Transformers to compute how they ought to be moving information around for themselves"
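The "generalized convolution" framing is concrete enough to sketch in code. Below is a minimal NumPy illustration of my own (not from the video, and with made-up sizes, a single head, and a causal mask as simplifying assumptions): a 1-D convolution mixes nearby positions with one fixed kernel, while attention computes a per-position mixing kernel from the input itself (the softmaxed query-key scores) and uses it to decide where to move information from. (As a side note on the "one-sixth" figure: under standard counts a block has roughly 4d^2 attention parameters and 8d^2 MLP parameters, so all of attention is about a third of a block, while just the query-key weights that decide where to look are about a sixth; which count the speaker intends isn't clear from the clip.)

```python
# A minimal sketch contrasting a fixed 1-D convolution with single-head
# attention: the convolution mixes positions with a static kernel, while
# attention computes its mixing weights from the input ("dynamic convolution").
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 8, 16

x = rng.normal(size=(seq_len, d_model))          # one sequence of token vectors

# --- fixed convolution: every position is mixed by the same static kernel ---
kernel = np.array([0.25, 0.5, 0.25])             # weights chosen once, input-independent
conv_out = np.zeros_like(x)
for t in range(seq_len):
    for k, w in enumerate(kernel, start=-1):     # offsets -1, 0, +1
        s = t + k
        if 0 <= s < seq_len:
            conv_out[t] += w * x[s]

# --- attention: the "kernel" (mixing weights) is computed from the data ---
W_Q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_K = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_V = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

Q, K, V = x @ W_Q, x @ W_K, x @ W_V
scores = Q @ K.T / np.sqrt(d_model)              # (seq_len, seq_len) position-to-position affinities
scores += np.triu(np.full_like(scores, -np.inf), k=1)  # causal mask: no looking ahead
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: a per-position, data-dependent kernel
attn_out = weights @ V                           # move information according to those weights

print(conv_out.shape, attn_out.shape)            # both (8, 16): same job, fixed vs. computed weights
```

The point of the contrast is the last few lines: `weights` plays the role of the convolution kernel, but it is recomputed for every input, which is what lets the model itself decide where to move information from.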
