dmm | "Narrow AGI" this year?

"Narrow AGI" is mostly an AGI-level artificial software engineer, an AGI-level artificial mathematician, an AGI-level artificial AI researcher (and probably a single entity combining these three application areas, because a strong AI researcher has to be a decent software engineer and a decent mathematician).

It seems that at least OpenAI (and, perhaps, other entities) should have this by the middle of 2025, if not earlier, at least for their internal use (assuming no major disasters, that is, assuming that the San Francisco Bay Area is intact and AI companies continue functioning normally).

What do we know about the technical aspects? We see o1's performance (and can experience it directly), and we see the claimed (and partially confirmed) numbers for the demo versions of o3 and o3-mini, in math and in software engineering. We know that the jump from o1 to o3 took about 3 months. Two more jumps like that, roughly another six months at the same pace, would probably be sufficient (and one can add "scaffolding" on top of that).

Another thing we know is that Sam Altman has sounded much more confident recently. I came to these conclusions a number of days ago, but now it turns out that Sam's mood has also shifted in a similar fashion. I'll put some links in the comments.

Jan 19 update: Sam Altman will allegedly give a closed-door government briefing on Jan 30 (this is apparently not a very big secret, and it has been leaked; the main topic is presumably as follows: many people in the leading AI labs have approximately the same degree of techno-optimism as I do, and so their timelines are tentatively quite short). www.axios.com/2025/01/19/ai-superagent-openai-meta


From: dmm

Now the numbers, from https://www.lesswrong.com/posts/QHtd2ZQqnPAcknDiQ/o3-oh-my or from the linked Dec 20 video, and the related discussions.

https://x.com/__nmca__/status/1870170098989674833

SWE-bench Verified: 71.7%. Very nice, a big jump in the state of the art (but it is also very clear that this is not AGI level yet; AGI level would be close to 100% on this one).

Codeforces Elo 2727: that's overwhelmingly good (within the top 200 competitive coders in the world).
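To get a feel for what a rating like 2727 means, here is a minimal sketch using the standard Elo expected-score formula. That Codeforces ratings behave exactly like this formula is an assumption on my part, and the opponent ratings below are hypothetical values picked only for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B.

    Assumption: the rating scale uses the usual base-10, 400-point
    logistic curve; this is a sketch, not Codeforces' exact procedure.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


O3_RATING = 2727  # the figure quoted above

# Hypothetical opponent ratings, chosen only for illustration.
for opponent in (1500, 2000, 2327, 2727):
    p = elo_expected_score(O3_RATING, opponent)
    print(f"vs {opponent}: expected score {p:.3f}")
```

Under this formula, a 400-point rating gap corresponds to an expected score of about 0.91 for the stronger side, and equal ratings give 0.5.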

https://x.com/__nmca__/status/1870170112290107540

25% on the famous new FrontierMath test, which was completely unapproachable to AI models until now.

(Very nice scores on the famous ARC-AGI benchmark too: human-level results have been achieved, which was also not possible until now.)

"Solid bars show pass@1 accuracy and the shaded region shows the performance of majority vote (consensus) with 64 samples."
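The difference between the two numbers in that caption can be illustrated with a small sketch. This is only my reading of the usual definitions, not OpenAI's evaluation code: pass@1 is estimated here as the average per-sample accuracy, and the consensus score takes the most common answer among the samples for each problem. The toy data uses 4 samples per problem instead of 64, and all names are hypothetical.

```python
from collections import Counter


def pass_at_1(samples, truth):
    """Average fraction of individual samples that are correct."""
    per_problem = [
        sum(answer == target for answer in answers) / len(answers)
        for answers, target in zip(samples, truth)
    ]
    return sum(per_problem) / len(per_problem)


def majority_vote_accuracy(samples, truth):
    """Fraction of problems where the most common sampled answer is correct."""
    hits = 0
    for answers, target in zip(samples, truth):
        winner, _count = Counter(answers).most_common(1)[0]
        hits += winner == target
    return hits / len(truth)


# Toy data (made up): three problems, four sampled answers each.
samples = [
    ["42", "42", "41", "42"],  # 3 of 4 correct, majority correct
    ["7", "8", "7", "9"],      # 2 of 4 correct, majority correct
    ["1", "2", "2", "3"],      # 1 of 4 correct, majority wrong
]
truth = ["42", "7", "1"]

print(pass_at_1(samples, truth))
print(majority_vote_accuracy(samples, truth))
```

On this toy data the consensus score is higher than pass@1, because voting helps whenever the correct answer is the most frequent one even if many individual samples are wrong.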

https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai

(This might be a better technical link, actually.)

Edited Date: 2025-01-07 03:04 am (UTC)

Dataflow matrix machines (by Anhinga anhinga)