"1:30:18 we think you're grown up enough that you can figure out where it's useful to look and we're going to give you some 1:30:25 fraction of your premises I think it's like articles to attention something like uh 1:30:32 one-sixth of the parameters of the Transformer go to attention and we're like 1:30:37 these parameters to figure out where you should be moving information from what 1:30:42 does an intelligent worrying and an intelligent convolution look like and as we'll see later with induction 1:30:48 heads there can actually be like a pretty sophisticated and intelligent amount of computation 1:30:53 that goes into what this smart Dynamic convolution 1:30:58 looks like but yeah fundamentally attention is a generalized convolution where we allow 1:31:05 Transformers to compute how they ought to be moving information around for themselves"
no subject
We think you're grown up enough that you can figure out where it's useful to look, and we're going to give you some
1:30:25
fraction of your parameters. I think something like
1:30:32
one-sixth of the parameters of the Transformer go to attention, and we're like, use
1:30:37
these parameters to figure out where you should be moving information from.
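As a rough sanity check on that parameter bookkeeping, here is a minimal Python sketch (my own illustration, not from the talk), assuming a standard GPT-2-style block with hypothetical sizes. Under this per-block counting the attention share comes out closer to one-third, so the exact fraction clearly depends on which parameters (embeddings, unembeddings, biases) you include in the denominator:

    # Hypothetical per-block parameter count for a GPT-2-style Transformer
    # (illustrative sizes; not the speaker's exact accounting).
    d_model = 768
    d_mlp = 4 * d_model  # common width convention for the MLP

    attn_params = 4 * d_model * d_model   # W_Q, W_K, W_V, W_O
    mlp_params = 2 * d_model * d_mlp      # W_in, W_out (biases ignored)
    block_params = attn_params + mlp_params

    print(f"attention share of block params: {attn_params / block_params:.2f}")
    # -> 0.33 under these assumptions; counting embedding and unembedding
    #    parameters in the total pushes the attention share lower.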
1:30:42
What does an intelligent blurring, an intelligent convolution, look like? And as we'll see later with induction
1:30:48
heads, there can actually be a pretty sophisticated and intelligent amount of computation
1:30:53
that goes into what this smart dynamic convolution
1:30:58
looks like. But fundamentally, attention is a generalized convolution where we allow
1:31:05
Transformers to compute how they ought to be moving information around for themselves.
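To make the "generalized convolution" framing concrete, here is a minimal NumPy sketch (my own illustration, assuming a single attention head and omitting W_V and W_O for clarity). A fixed convolution mixes positions with static weights chosen once; attention computes its position-mixing weights from the data itself via queries and keys:

    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model = 8, 16
    x = rng.normal(size=(seq_len, d_model))

    # Fixed "convolution": a static causal-averaging kernel over positions,
    # identical no matter what the content of x is.
    fixed_weights = np.tril(np.ones((seq_len, seq_len)))
    fixed_weights /= fixed_weights.sum(axis=-1, keepdims=True)
    conv_out = fixed_weights @ x

    # Attention: the position-mixing weights are computed from x itself.
    W_Q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
    W_K = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
    scores = (x @ W_Q) @ (x @ W_K).T / np.sqrt(d_model)
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    scores = np.where(causal, scores, -np.inf)  # mask out future positions
    attn_weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn_weights /= attn_weights.sum(axis=-1, keepdims=True)
    attn_out = attn_weights @ x  # (W_V and W_O omitted in this sketch)

    print("fixed conv output:", conv_out.shape, "attention output:", attn_out.shape)

Both produce the same "weighted mix over positions"; the only difference is that attention chose the mixing weights dynamically from the data, which is exactly the sense in which it generalizes a convolution.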