Grokking is (more or less) solved
Oct. 16th, 2022 10:38 am
The most interesting conceptual AI advances lately seem to come from "prosaic alignment" start-ups. These are companies which believe that the current trend of improving Transformer models is likely to lead straight to AGI, and that a better understanding of the nature and properties of these models is key to AI safety (and, of course, it's also key to better AI capabilities).
And it is often the case that the key elements of this work are done by people "on the edge", "in the penumbra" of those alignment start-ups.
In the previous post I mentioned the key new understanding of large Transformer models as simulators. That work was done "while at Conjecture", but is not listed as coming directly from Conjecture (one of those "prosaic alignment" start-ups). I think the key people involved are still at Conjecture, but they seem to be trying to keep some distance between Conjecture and this work. I am continuing to take notes on those materials and commit them to GitHub (see links in the comments to the previous post).
Here is another one of those stories. Grokking is a phenomenon where small Transformers train on a part of a mathematical structure for quite a while, and then rather suddenly transition to understanding the whole of that mathematical structure, including the part they never see in training. It was discovered in 2021 and has been the subject of a number of follow-up attempts to understand it.
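To make the setup concrete: a typical grokking experiment trains a tiny Transformer on a fraction of the modular-addition table (a + b mod p) with strong weight decay, and watches train accuracy saturate long before test accuracy suddenly jumps. Here is a minimal illustrative sketch of that setup (my own rough code, not Nanda's or the original paper's; the architecture details and hyperparameters are just placeholders):

```python
# A minimal illustrative sketch of a grokking-style experiment (assumed setup,
# not the original authors' code): a tiny transformer trained on a fraction of
# the modular-addition table, with strong weight decay.
import torch
import torch.nn as nn

p = 113            # prime modulus, a common choice in grokking experiments
train_frac = 0.3   # the model only ever sees this fraction of the full table

# Full (a, b) -> (a + b) mod p table, split into train and held-out parts.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(train_frac * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class TinyTransformer(nn.Module):
    def __init__(self, p, d_model=128, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(p, d_model)
        self.pos = nn.Parameter(torch.zeros(2, d_model))  # learned positions for the 2 tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                           dropout=0.0, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, p)

    def forward(self, x):                 # x: (batch, 2) integer token ids
        h = self.embed(x) + self.pos      # (batch, 2, d_model)
        h = self.encoder(h)
        return self.head(h.mean(dim=1))   # logits over the p possible residues

model = TinyTransformer(p)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(dim=-1) == labels[idx]).float().mean().item()

# The grokking signature: train accuracy saturates early, test accuracy sits
# near chance for a long time, then rather suddenly jumps towards 100%.
for epoch in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 1000 == 0:
        print(epoch, accuracy(train_idx), accuracy(test_idx))
```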
The recent breakthrough was made in mid-August by Neel Nanda, who left Anthropic (perhaps the most famous of the "prosaic alignment" start-ups) a few months ago. And it looks like he has more or less solved the mysteries behind this phenomenon. I am going to continue studying his writings. The links are in the comments.
no subject
Date: 2022-10-16 02:56 pm (UTC)
https://twitter.com/NeelNanda5/status/1559060507524403200
https://www.neelnanda.io/blog/interlude-a-mechanistic-interpretability-analysis-of-grokking
no subject
Date: 2022-10-16 02:56 pm (UTC)
https://github.com/anhinga/2022-notes/tree/main/Grokking
no subject
Date: 2022-10-16 03:00 pm (UTC)
His take on the famous Anthropic paper he was involved in, "A Mathematical Framework for Transformer Circuits":
https://twitter.com/NeelNanda5/status/1580782930304978944
no subject
Date: 2022-10-16 06:39 pm (UTC)
OMG, is it for real?! Amazing.
no subject
Date: 2022-10-17 02:03 pm (UTC)
https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking
Very, very interesting...
*******
But, on the whole, people in this field are still quite blind and deaf; take this remarkable and, it would seem, very well-known Anthropic paper from last year:
https://transformer-circuits.pub/2021/framework/index.html
it is cited very little:
https://scholar.google.com/citations?user=GLnX3MkAAAAJ&hl=en&oi=ao
Well, perhaps it will sink in over time... (But for now, those who get it before the crowd does will have a big advantage.)