Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-09-14 11:30 pm

6 months since GPT-4 release

A good way to mark this occasion is to try to read a new paper which seems to be a major breakthrough in understanding and harnessing the magic of Transformers:

"Uncovering mesa-optimization algorithms in Transformers"

"we demonstrate that minimizing a generic autoregressive loss gives rise to a subsidiary gradient-based optimization algorithm running inside the forward pass of a Transformer. This phenomenon has been recently termed mesa-optimization"
 
"Moreover, we find that the resulting mesa-optimization algorithms exhibit in-context few-shot learning capabilities,
independently of model scale. Our results therefore complement previous reports characterizing the
emergence of few-shot learning in large-scale LLMs"

 

[personal profile] timelets 2023-09-15 04:32 am (UTC)(link)
Thanks

[personal profile] juan_gandhi 2023-09-15 09:06 am (UTC)(link)

Who would have guessed before that gradient descent would be strangely useful in doing "AI".


[personal profile] juan_gandhi 2023-09-15 01:58 pm (UTC)(link)

This is amazing.