"WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment", arxiv.org/abs/2402.12275
Not a widely known paper (the authors don't promote it), but pretty spectacular (a friend of mine said, "Is it AGI already?").
I think I mostly understand how this works and I made some notes yesterday.
A meta-note here: GPT-4-level models mostly understand what they are doing, but are unreliable; so the question is whether one can organize a process on top of them that reliably produces the needed results. There are plenty of papers trying to push in this direction, but this one is very elegant, and the results are quite good.
******
www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction - very elegant and simple
******
May 9, 2024 update: Since this is access-list-only at the moment (although this post is likely to become public eventually), it's a good place for my notes on switching to the Twitter "X Premium" experience (in comments).
May 13: let's move this post to being public.
no subject
Date: 2024-04-27 01:53 pm (UTC)
no subject
Date: 2024-05-09 03:27 am (UTC)
no subject
Date: 2024-05-10 12:38 am (UTC)
The "verification" is still in progress.
Grok is useful (can talk to an AI in the Twitter context).
Another very useful thing is the ability to see "related posts" for a given tweet. This goes back in time and shows really interesting things.
no subject
Date: 2024-05-10 12:41 am (UTC)
no subject
Date: 2024-05-10 12:49 am (UTC)
(One could try to see manually whether this would work without paying for at least "X Basic" premium tier.)
EDIT: tested, this does not work from free accounts at the moment ("early access to new features").
no subject
Date: 2024-05-16 03:40 am (UTC)
no subject
Date: 2024-05-13 12:03 pm (UTC)
11am Boston time, Kolmogorov-Arnold networks talk: https://twitter.com/HannesStaerk/status/1789293551211426133
1pm Boston time, "Spring updates" from OpenAI (presumably a YouTube livestream on their channel)