This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
Date: 2023-10-29 03:51 pm (UTC)
~10:00 (anecdotally) an attempt by Chris Olah to interpret small vision models on MNIST did not work, but vision models became more interpretable as they got larger.
In Transformers it is the opposite: smaller models are easier to understand, but this is by no means obvious (so says Neel Nanda in that lecture, though who knows how this will change eventually; in any case, the knowledge acquired this way does seem to transfer reasonably well to larger models).
Date: 2023-10-29 05:29 pm (UTC)
This is useless at inference, but it works great for parallelizing training.
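(Presumably "this" is the fact that, with a causal mask, the model is trained to predict the next token at every position of the sequence at once. A minimal PyTorch-flavored sketch of why that parallelizes training but not generation; the function names and shapes are mine, not from the lecture:)

```python
import torch
import torch.nn.functional as F

# Training: one forward pass over the whole sequence scores every next-token
# prediction at once (the causal mask keeps position t from peeking ahead),
# so all positions contribute to the loss in parallel.
def training_step(model, tokens):                 # tokens: [batch, seq]
    logits = model(tokens[:, :-1])                # [batch, seq-1, vocab]
    targets = tokens[:, 1:]                       # the "next token" at every position
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Inference: no such trick; tokens come out one at a time, each step fed back in.
@torch.no_grad()
def generate(model, tokens, n_new):
    for _ in range(n_new):
        logits = model(tokens)                    # [batch, seq, vocab]
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens
```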
Date: 2023-10-29 05:51 pm (UTC)
(So the bulk of the computation is probably shallow, with a bit of "true depth" sprinkled on top.)
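(The path expansion in the "Mathematical Framework" paper above makes this concrete: already for a one-layer attention-only transformer, the logits decompose into the direct, zero-layer path plus one short path per attention head,

$$ T \;=\; \mathrm{Id}\otimes W_U W_E \;+\; \sum_{h\in H} A^{h} \otimes \big(W_U W_{OV}^{h} W_E\big), $$

where $A^{h}$ is the attention pattern of head $h$. Deeper models add longer paths, e.g. via composing heads, but a lot of the action can still sit in the short terms.)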
Date: 2023-10-29 05:58 pm (UTC)
(Perhaps the people who conjecture "holographic storage" within the residual stream are right, who knows; one can consider improving it in various directions: a) towards disentangling, or b) alternatively, towards better holography.)
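(One toy reading of "holographic storage", which may or may not be what those people mean: superposition, i.e. far more features than dimensions, each stored along a random, nearly orthogonal direction and read back with a dot product plus some interference. A small numpy illustration; the numbers are arbitrary:)

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 256, 2048                    # far more features than dimensions
dirs = rng.standard_normal((n_features, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # one unit direction per feature

# Write a sparse set of active features into a single d-dimensional vector.
active = rng.choice(n_features, size=3, replace=False)
x = dirs[active].sum(axis=0)

# Read every feature back with a dot product: active ones come out near 1,
# inactive ones near 0, up to interference from the non-orthogonality.
readout = dirs @ x
print("active:", sorted(active), "recovered:", sorted(np.argsort(readout)[-3:]))
print("worst interference on inactive features:",
      np.abs(np.delete(readout, active)).max().round(3))
```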
Date: 2023-10-29 06:07 pm (UTC)
(but, actually, positions are meaningful, so there is still a bit of privileged structure in the residual stream, just (perhaps) not within the embedding vectors (but perhaps even there, if we look closely, who knows))
~37:50 there is a spectrum of how privileged a basis is, rather than a binary privileged vs. non-privileged distinction
(the truth is there are traces of various kinds of privilege in the residual stream as well)
~39:30 even Adam privileges everything it interacts with, because of its weirdness, i.e. its per-coordinate, elementwise updates ("Adam sucks," says Neel Nanda, but I don't think that's necessarily so; perhaps this artificial thing is good, who knows(!)).
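(For reference, the standard Adam update, with every operation taken per-coordinate, which is exactly what singles out the basis; plain SGD, by contrast, doesn't care how you rotate the parameters:

$$ m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^{2}, \qquad \theta_t = \theta_{t-1} - \alpha\,\frac{\hat m_t}{\sqrt{\hat v_t}+\varepsilon}, $$

where $g_t^{2}$, the square root, and the division are all elementwise, and $\hat m_t = m_t/(1-\beta_1^t)$, $\hat v_t = v_t/(1-\beta_2^t)$.)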
Date: 2023-10-29 06:43 pm (UTC)
that gives some crude proxy for what's going on
Date: 2023-10-29 07:01 pm (UTC)
(but we need to see how this works with context length; it's not very transparent in the code, which is inconvenient; in the MLP it's even less transparent than in the attention layer, where they have to write it out explicitly in connection with splitting into attention heads)
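(What I mean, concretely: in the MLP the context-length dimension is just carried along, every position is processed independently, so it never appears in the code; in attention, positions interact, so the seq dimension has to show up explicitly in the head split, the [seq, seq] attention pattern, and the causal mask. A rough sketch of the shapes; this is my own toy code, not anything from an actual repo:)

```python
import torch

batch, seq, d_model, n_heads = 2, 10, 64, 4
d_head = d_model // n_heads
x = torch.randn(batch, seq, d_model)

# MLP: pointwise over positions; seq is never mentioned, the same
# [d_model -> 4*d_model -> d_model] map is applied at every position.
W_in, W_out = torch.randn(d_model, 4 * d_model), torch.randn(4 * d_model, d_model)
mlp_out = torch.relu(x @ W_in) @ W_out                              # [batch, seq, d_model]

# Attention: positions interact, so seq appears explicitly
# in the head split, the [seq, seq] pattern, and the causal mask.
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
q = (x @ W_q).view(batch, seq, n_heads, d_head).transpose(1, 2)     # [batch, heads, seq, d_head]
k = (x @ W_k).view(batch, seq, n_heads, d_head).transpose(1, 2)
v = (x @ W_v).view(batch, seq, n_heads, d_head).transpose(1, 2)
scores = q @ k.transpose(-1, -2) / d_head ** 0.5                    # [batch, heads, seq, seq]
mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
pattern = scores.masked_fill(mask, float("-inf")).softmax(dim=-1)
attn_out = (pattern @ v).transpose(1, 2).reshape(batch, seq, d_model)
```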