Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-30 03:48 pm (UTC)

Lots of copying, though; it's a frequent motif.

Another frequent motif is that these things are good at fixing the weirdness of tokenizers.

2:01:00: for more complicated models, it is useful to think of attention heads as implementing a lot of skip-trigrams, with other things done on top of that.
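
To make "skip-trigram" concrete, here is a toy sketch, assuming the usual reading of the term: an earlier token A and the current token B, with anything in between skipped, jointly vote for a next token C. The table entries and the name `skip_trigram_votes` are made up for illustration and are not taken from any real model; the second entry is meant to show copying and a tokenizer fix at once, on the assumption that " Ralph" is one token while a bare "Ralph" splits into "R" + "alph".

```python
from collections import Counter

# Hypothetical toy table: (earlier_token, current_token) -> next token to boost.
# These entries are illustrative only, not read off a trained model.
SKIP_TRIGRAMS = {
    ("keep", "in"): "mind",      # "keep ... in" -> "mind"
    (" Ralph", "R"): "alph",     # copy an earlier name across a token split
}

def skip_trigram_votes(tokens, table=SKIP_TRIGRAMS):
    """Count next-token votes for the last position of `tokens`.

    Every earlier token is paired with the current (last) token; the
    tokens in between are skipped, hence "skip-trigram".
    """
    if not tokens:
        return Counter()
    current = tokens[-1]
    votes = Counter()
    for earlier in tokens[:-1]:
        predicted = table.get((earlier, current))
        if predicted is not None:
            votes[predicted] += 1
    return votes

print(skip_trigram_votes(["keep", "this", "in"]))
print(skip_trigram_votes([" Ralph", "(", "R"]))
```

A real attention head does this softly, with attention weights and logit contributions instead of a lookup table and integer votes, so this only shows the shape of the pattern.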
