dmm | Let's understand Large Language Models better

Entry tags:

large language models,
transformers,
understanding internals of ai

Let's understand Large Language Models better

This is a good starting point:

"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html


Lots of copying though; it's a frequent motif

Another frequent motif is that these things are good at fixing the weirdness of tokenizers

2:01:00 For more complicated models, it is useful to think of attention heads as doing a lot of skip-trigrams, with other things done on top of that

Edited 2023-10-30 16:02 (UTC)
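
To make the skip-trigram remark concrete: in the framework paper a skip-trigram has the shape "[source] ... [destination] -> [out]", i.e. some earlier token, any gap, the current token, and the token predicted next. Below is a toy counting sketch of that idea in plain Python. It is only an analogy, not a transformer and not code from the paper; the function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def skip_trigram_counts(tokens):
    """Count (earlier, current) -> next, allowing any gap between earlier and current."""
    counts = defaultdict(Counter)
    for i in range(1, len(tokens) - 1):
        current, nxt = tokens[i], tokens[i + 1]
        # Each distinct earlier token is a possible "source" for this position.
        for earlier in set(tokens[:i]):
            counts[(earlier, current)][nxt] += 1
    return counts


def predict_next(counts, context):
    """Let every (earlier, current) pair in the context vote for the next token."""
    current = context[-1]
    votes = Counter()
    for earlier in set(context[:-1]):
        votes.update(counts.get((earlier, current), {}))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy corpus: "keep ... in" is followed by "mind" regardless of the gap.
corpus = ["keep", "it", "in", "mind", "keep", "that", "in", "mind"]
table = skip_trigram_counts(corpus)
guess = predict_next(table, ["keep", "them", "in"])
```

Here the pair ("keep", "in") votes for "mind" even though "them" never appears in the corpus, which is the sense in which the gap is skipped.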

2:08:00 In addition to what they say about positive eigenvalues being a much weaker signal than, e.g., Adam Nemecek's paper hopes for, Neel Nanda says here that even this does not really generalize to larger models
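
For reference, the framework paper summarizes how "copying-like" a head's OV circuit is with the ratio of the sum of the eigenvalues to the sum of their absolute values: close to +1 means mostly positive eigenvalues. A minimal NumPy sketch of that statistic follows. The function name and the 2x2 matrices are toy stand-ins chosen for illustration, not real model weights, and, per the comment above, a high score should be read as a weak hint rather than proof of copying.

```python
import numpy as np


def eigenvalue_copying_score(matrix):
    """Return sum(lambda_i) / sum(|lambda_i|) over the eigenvalues of a square matrix."""
    eigenvalues = np.linalg.eigvals(matrix)
    # For a real matrix the eigenvalue sum equals the trace, so it is real up to
    # rounding; .real drops any tiny imaginary residue.
    return float(np.sum(eigenvalues).real / np.sum(np.abs(eigenvalues)))


# Toy stand-ins for an OV-circuit matrix (assumptions, not real weights).
identity_like = np.eye(2)                       # all eigenvalues positive
negated = -np.eye(2)                            # all eigenvalues negative
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # purely imaginary eigenvalues

scores = [eigenvalue_copying_score(m) for m in (identity_like, negated, rotation)]
```

The three toy matrices cover the extremes: all-positive eigenvalues, all-negative eigenvalues, and a rotation whose eigenvalues cancel in the sum.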

27 comments