Grokking is (more or less) solved
Oct. 16th, 2022 10:38 am
The most interesting conceptual AI advances lately seem to come from "prosaic alignment" start-ups. These are companies which believe that the current trend of improving Transformer models is likely to lead straight to AGI, and that a better understanding of the nature and properties of these models is key to AI safety (and, of course, it's also key to better AI capabilities).
And it is often the case that the key elements of this work are done by people "on the edge", "in the penumbra" of those alignment start-ups.
In the previous post I mentioned the key new understanding of large Transformer models as simulators. That work was done "while at Conjecture", but is not listed as coming directly from Conjecture (one of those "prosaic alignment" start-ups). I think the key people involved are still at Conjecture, but they seem to be trying to keep some distance between Conjecture and this work. I am continuing to take notes on those materials and commit them to GitHub (see links in the comments to the previous post).
Here is another one of those stories. Grokking is a phenomenon where small Transformers train on a part of a mathematical structure for quite a while, and then rather suddenly transition to understanding the whole of that mathematical structure, including the part they never see in training. It was discovered in 2021 and has been the subject of a number of follow-up attempts to understand it.
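To make the setup concrete: a typical grokking experiment trains a tiny Transformer on a fraction of the modular-addition table (a + b mod p) with strong weight decay, and watches train accuracy saturate long before test accuracy suddenly jumps. Here is a minimal illustrative sketch of that setup (my own rough code, not Nanda's or the original paper's; the architecture details and hyperparameters are just placeholders):

```python
# A minimal illustrative sketch of a grokking-style experiment (assumed setup,
# not the original authors' code): a tiny transformer trained on a fraction of
# the modular-addition table, with strong weight decay.
import torch
import torch.nn as nn

p = 113            # prime modulus, a common choice in grokking experiments
train_frac = 0.3   # the model only ever sees this fraction of the full table

# Full (a, b) -> (a + b) mod p table, split into train and held-out parts.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(train_frac * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class TinyTransformer(nn.Module):
    def __init__(self, p, d_model=128, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(p, d_model)
        self.pos = nn.Parameter(torch.zeros(2, d_model))  # learned positions for the 2 tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                           dropout=0.0, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, p)

    def forward(self, x):                 # x: (batch, 2) integer token ids
        h = self.embed(x) + self.pos      # (batch, 2, d_model)
        h = self.encoder(h)
        return self.head(h.mean(dim=1))   # logits over the p possible residues

model = TinyTransformer(p)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(dim=-1) == labels[idx]).float().mean().item()

# The grokking signature: train accuracy saturates early, test accuracy sits
# near chance for a long time, then rather suddenly jumps towards 100%.
for epoch in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 1000 == 0:
        print(epoch, accuracy(train_idx), accuracy(test_idx))
```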
The recent breakthrough was made in mid-August by Neel Nanda, who left Anthropic (perhaps the most famous of the "prosaic alignment" start-ups) a few months ago. And it looks like he has more or less solved the mysteries behind this phenomenon. I am going to continue studying his writings. The links are in the comments.
no subject
Date: 2022-10-16 02:56 pm (UTC)
https://twitter.com/NeelNanda5/status/1559060507524403200
https://www.neelnanda.io/blog/interlude-a-mechanistic-interpretability-analysis-of-grokking
no subject
Date: 2022-10-16 02:56 pm (UTC)
https://github.com/anhinga/2022-notes/tree/main/Grokking
no subject
Date: 2022-10-16 03:00 pm (UTC)
His take on the famous Anthropic paper he was involved in, "A Mathematical Framework for Transformer Circuits":
https://twitter.com/NeelNanda5/status/1580782930304978944
no subject
Date: 2022-10-16 06:39 pm (UTC)
OMG, is it for real?! Amazing.
no subject
Date: 2022-10-17 02:03 pm (UTC)
https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking
Very, very interesting...
*******
But, on the whole, people in this field are still quite blind and deaf; take this remarkable and, it would seem, very well-known Anthropic paper from last year:
https://transformer-circuits.pub/2021/framework/index.html
it is cited very little:
https://scholar.google.com/citations?user=GLnX3MkAAAAJ&hl=en&oi=ao
Well, perhaps it will sink in over time... (But for now, those who get it before the crowd does will have a big advantage.)