6 months since GPT-4 release
Sep. 14th, 2023 11:30 pm
A good way to mark this occasion is to try to read a new paper which seems to be a major breakthrough in understanding and harnessing the magic of Transformers:
"Uncovering mesa-optimization algorithms in Transformers"
"we demonstrate that minimizing a generic autoregressive loss gives rise to a subsidiary gradient-based optimization algorithm running inside the forward pass of a Transformer. This phenomenon has been recently termed mesa-optimization"
"Moreover, we find that the resulting mesa-optimization algorithms exhibit in-context few-shot learning capabilities, independently of model scale. Our results therefore complement previous reports characterizing the emergence of few-shot learning in large-scale LLMs"
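The core claim in the quote — that a trained forward pass can contain a gradient-based optimizer — has a concrete toy version, based on the linear self-attention construction from von Oswald et al. Here is a small numpy sketch of my own (not the paper's code): one gradient-descent step on an in-context least-squares problem reduces to a sum of value-key outer products, which is exactly the kind of update a linear attention head can compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# In-context regression data: context tokens (x_i, y_i) with y_i = W_true @ x_i + noise.
d_in, d_out, n = 3, 2, 32
W_true = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n, d_in))
Y = X @ W_true.T + 0.01 * rng.normal(size=(n, d_out))

# One gradient-descent step on L(W) = 0.5 * sum_i ||y_i - W x_i||^2,
# starting from W = 0.  The gradient is -sum_i (y_i - W x_i) x_i^T.
eta = 0.01
W = np.zeros((d_out, d_in))
W_after = W + eta * (Y - X @ W.T).T @ X  # sum of outer products (y_i - W x_i) x_i^T

# A linear self-attention head accumulates sum_i v_i k_i^T over the context.
# With values v_i = eta * y_i and keys k_i = x_i (and W = 0), that sum is
# exactly the weight update above.
delta_attn = (eta * Y).T @ X
assert np.allclose(W_after, delta_attn)
print("attention outer-product sum matches one GD step:", np.allclose(W_after, delta_attn))
```

This only illustrates the simplest case (a single step from zero initialization); the paper's point is that training on an autoregressive loss makes Transformers discover and chain such update steps on their own.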
no subject
Date: 2023-10-21 01:22 am (UTC)
It was presented a couple of weeks ago at ML Collective, and it has exactly one citing paper, namely "Uncovering mesa-optimization algorithms in Transformers".
Moreover, its "artificial intention" is a least-squares solver (I don't know if it's the same as von Oswald's mesa-layer, but I'll try to figure that out).
One remark they made during the talk was that a randomly initialized untrained model with this kind of layer interpolated some data (e.g. sine curves) nicely...
***
Anyway, I'd like to resume studying this material, starting with this "artificial intention" paper, and making some notes here...
no subject
Date: 2023-10-21 01:24 am (UTC)
no subject
Date: 2023-10-21 04:27 am (UTC)
> One remark they made during the talk was that a randomly initialized untrained model with this kind of layer interpolated some data (e.g. sine curves) nicely...
Yes, this is Section 5.1, page 7
no subject
Date: 2023-10-21 04:31 am (UTC)
"We would like to thank Irene, Teresa and Manuel for their eternal patience, Pebbles for its unshakeable enthusiasm, the Foxes for their distant support, Ketjow for nothing, Sorin for making us look good and Matt for all the dancing."