And another frequent motif is that these heads are good at fixing up the weirdness of tokenizers
2:01:00 and for more complicated models, it is useful to think of attention heads as doing a lot of skip trigrams, with other things layered on top of that
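To make the skip-trigram idea concrete, here is a toy sketch (not from the talk; the rule table and function names are made up for illustration). A skip trigram is a pattern of the form [earlier token] ... [current token] -> [boosted next token]: a single attention head attends from the current token back to some earlier token and boosts the logit of a related completion, regardless of what sits in between.

```python
# Hypothetical rule table for one attention head:
# (earlier_token, current_token) -> next_token whose logit gets boosted
SKIP_TRIGRAMS = {
    ("keep", "in"): "mind",
    ("perfect", "are"): "perfect",  # copying-style rule: "perfect ... are -> perfect"
}

def skip_trigram_predictions(tokens):
    """For each position, list completions a skip-trigram head would boost."""
    boosted = []
    for i, current in enumerate(tokens):
        hits = []
        for earlier in tokens[:i]:  # the head can attend to any earlier position
            rule = SKIP_TRIGRAMS.get((earlier, current))
            if rule is not None:
                hits.append(rule)
        boosted.append(hits)
    return boosted

print(skip_trigram_predictions(["keep", "this", "idea", "in"]))
# [[], [], [], ['mind']]
```

The point of the sketch is only that each rule keys on a (source, destination) token pair and ignores the distance between them, which is what lets one head implement many such trigrams at once.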