On the essay on AI-generating algorithms
I am going to discuss the essay arxiv.org/abs/1905.10985 "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence" by Jeff Clune.
It is a rather involved text, although it does not have any formulas (22 pages plus 12 pages of references).
Jeff Clune got his PhD in 2010, founded Evolving AI Lab at the University of Wyoming and co-founded a start-up named Geometric Intelligence, which eventually became Uber AI Lab.
Together with Ken Stanley and other members of Uber AI Lab, he jump-started "deep neuroevolution" at the end of 2017: eng.uber.com/deep-neuroevolution/ (see also the January 2019 review paper "Designing neural networks through neuroevolution" in Nature Machine Intelligence: www.nature.com/articles/s42256-018-0006-z ).
In January 2020, he joined OpenAI to lead a large-scale effort into research of AI-generating algorithms.
***
I am going to post various quotes from the essay and discuss parts of it in the comments to this post, now and in the coming days.
March 26 update: I wrote a follow-up essay, "Synergy between AI-generating algorithms and dataflow matrix machines", covering possible interplay between AI-GAs and DMMs: github.com/anhinga/2020-notes/tree/master/research-notes
***
AI-GAs may prove to be the fastest path to general AI. However, even if they are not, they are worth pursuing anyway. It is intrinsically scientifically worthwhile to attempt to answer the question of how to create a set of simple conditions and a simple algorithm that can bootstrap itself from simplicity to produce general intelligence. Just such an event happened on Earth, where the extremely simple algorithm of Darwinian evolution ultimately produced human intelligence. Thus, one reason that creating AI-GAs is beneficial is that doing so would shed light on the origins of our own intelligence.
AI-GAs would also teach us about the origins of intelligence generally, including elsewhere in the universe. They could, for example, teach us which conditions are necessary, sufficient, and catalyzing for intelligence to arise. They could inform us as to how likely intelligence is to emerge when the sufficient conditions are present, and how often general intelligence emerges after narrow intelligence does.
Presumably different instantiations of AI-GAs (either different runs of the same AI-GA or different types of AI-GAs) would lead to different kinds of intelligence, including different, alien cultures. AI-GAs would likely produce a much wider diversity of intelligent beings than the manual path to creating AI, because the manual path would be limited by our imagination, scientific understanding, and creativity. We could even create AI-GAs that specifically attempt to create different types of general AI than have already been produced. AI-GAs would thus better allow us to study and understand the space of possible intelligences, shedding light on the different ways intelligent life might think about the world.
The creation of AI-GAs would enable us to perform the ultimate form of cultural travel, allowing us to interact with, and learn from, wildly different intelligent civilizations. Having a diversity of minds could also catalyze even more scientific discovery than the production of a single general AI would. We would also want to reverse engineer each of these minds for the same reasons we study neuroscience in animals, including humans: because we are curious and want to understand intelligence. We and others have been conducting such ‘AI Neuroscience’ for years now, but the work is just beginning. John Maynard Smith wrote the following about evolutionary systems: “So far, we have been able to study only one evolving system, and we cannot wait for interstellar flight to provide us with a second. If we want to discover generalizations about evolving systems, we will have to look at artificial ones.” The same is true regarding intelligence.
Additionally, as explained in Section 2.3, the manner in which AI-GAs are likely to produce intelligence will involve the creation of a wide diversity of learning environments, which could include novel virtual worlds, artifacts, riddles, puzzles, and other challenges that themselves will likely be intrinsically interesting and valuable. Many of those benefits will begin to accrue in the short-term with AI-GA research, so AI-GAs will provide short-term value even if the long-term goals remain elusive. AI-GAs are also worthwhile to pursue because they inform our attempts to create open-ended algorithms (those that endlessly innovate). As described in Section 1.2, we should also invest in AI-GA research because it might prove to be the fastest way to produce general AI.
For all these reasons, I argue that the creation of AI-GAs should be considered its own independent scientific grand challenge.
Thanks for mentioning "Niche construction" - he mentions niches quite a bit on pages 11-16, but never references "Niche construction". I am going to include the Wikipedia reference here for completeness:
https://en.wikipedia.org/wiki/Niche_construction
Of course, the non-verbal stage is particularly tricky... But it should all be doable eventually.
***
Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the "manual AI approach".
This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path.
Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations.
Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth, where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.
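To make the overall shape of such a system concrete, here is a toy sketch of my own (all names and internals are my hypothetical simplifications, not from the essay: Pillar 1 is collapsed to a single weight, Pillar 2 to a search over the learning rate, and Pillar 3 to a trivial environment generator):

```python
import random

def generate_environment(difficulty):
    """Pillar 3 stand-in: an 'environment' is just a sampling range here."""
    return lambda: random.uniform(-difficulty, difficulty)

def inner_train(lr, steps, sample):
    """A learner trained inside one generated environment.
    Toy task: find w such that w*x matches 2*x (true answer: w = 2)."""
    w = 0.0
    for _ in range(steps):
        x = sample()
        w -= lr * 2.0 * (w * x - 2.0 * x) * x  # gradient of (w*x - 2*x)**2
    return -abs(w - 2.0)                       # fitness: closeness to the truth

def outer_loop(generations=200, seed=0):
    """Pillar 2 stand-in: the outer loop searches over a parameter of the
    learning algorithm itself (the learning rate)."""
    random.seed(seed)
    best = (float("-inf"), None)
    for _ in range(generations):
        lr = 10.0 ** random.uniform(-4, 0)     # candidate learning algorithm
        env = generate_environment(difficulty=1.0)
        fitness = inner_train(lr, steps=50, sample=env)
        best = max(best, (fitness, lr))
    return best
```

Real AI-GA research would replace each stand-in with something far richer, but the nesting, an outer search over learners that are themselves trained in generated environments, is the structural point.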
***
"If the manual path wins, we would still be interested in creating AI-GAs for all the reasons just provided in Section 1.4. If the AI-GA path wins, we would still be interested in creating general AI via the manual path. That is because it is likely that an AI-GA would produce a complex machine whose inner workings we do not understand. As Richard Feynman said, “What I cannot create, I do not understand.” We understand by building, and one great way to understand intelligence is to be able to take it apart, rewire components, and learn how to put it together piece by piece. AI-GAs could actually catalyze the manual path because it would produce a large diversity of general AIs, perhaps making it easy for us to recognize what is necessary, variable, and incidental in the creation of intelligent machines. Additionally, each path would catalyze the other because the general AI produced by either could be used to accelerate scientific progress in the other path."
***
Speaking of the remark that we understand by building, I think we should do more to push towards building understandable machines on this path.
Pushing towards building more compact, understandable machines might also serve as a guiding light for saving computational resources (see my further comment on Section 1.6).
***
"There are a few main reasons why I think the AI-GA path is more likely. Initially, it scales well with the exponential compute we can expect in the years and decades to come. While AI-GAs will require extraordinary amounts of computation by today’s standards, they have the nice property that they will be able to consume that compute easily once it is available. As Richard Sutton has recently pointed out, history has shown that the algorithms that tend to win over the long haul are simple ones that can take advantage of massive amounts of computing [178]. Additionally, as noted above, history teaches us that learned pipelines eventually surpass ones that are entirely or partially hand-created. Further, the AI-GA path enables small teams to begin working on algorithms that might produce general AI right now. [...] Overall, it is likely that in the near to middle term, the manual path will produce higher-performing AI systems that are of more use to society, just as HOG and SIFT produced better computer vision systems in the pre-2012 era. However, just as with computer vision, it is likely that fully learned systems will eventually surpass and then rapidly far exceed the capabilities produced by the manual path."
***
Actually, one place where I don't quite agree with Jeff Clune is his certainty that "AI-GAs will require extraordinary amounts of computation by today’s standards". That might be the case or not, depending on how successful the progress in the Three Pillars is. The name of the game is to reduce the amount of compute from what it took the "computer" named Earth and its biosphere to produce something like us, down to an amount of compute feasible for us at some point.
But whether we can only hope to cut the required compute time and memory down to "extraordinary amounts of computation by today’s standards", or whether we can cut it more drastically, perhaps even to something already computationally feasible today, is, I think, quite open and difficult to predict.
Section 2.2 "The Second Pillar: Meta-learning the learning algorithms" - a very important two-page overview.
Section 2.3 "The Third Pillar: Generating effective learning environments and training data" - the technical core of the paper (pages 9-16). I am not planning to overview this core at the moment: I think Jeff Clune is actively working on it with his group at OpenAI, and I am pondering trying to do more with the first two Pillars in the near future (with some luck).
***
Page 16:
"the AI-GA approach is not free of building blocks. It does not start from scratch, trying to create intelligence from nothing but the laws of physics and a soup of atoms. It might, for example, start with the assumption that we want neural networks with regular and neuromodulatory neurons, niches and explicit goal-switching, and populations of agents within each environmental niche. Research on AI-GAs will involve identifying the minimum number of sufficient and catalyzing building blocks to create an AI-GA. But the set of building blocks AI-GA researchers will look for will be very different from those for the manual engineering approach."
"Rather than discovering those building blocks manually, AI-GA researchers will try to figure out what are the building blocks that enable us to abstract open-ended complexity explosions in a computationally tractable way. I hypothesize that fewer building blocks need to be identified to produce an AI-GA than to produce intelligence via the manual AI approach. For that reason, I further hypothesize that identifying them and how to combine them will be easier, and thus that AI-GA is more likely to produce general AI faster. That said, an open question is how much compute is required to make an AI-GA."
"It may turn out that the manual path can succeed without having to identify hundreds of building blocks. It might turn out instead that only a few are required. At various points in the history of physics many pieces of theoretical machinery were thought to be required, only later to be unified into smaller, simpler, more elegant frameworks. Such unification could also occur within the manual AI path. But at some point if everything is being learned from a few basic principles, it seems more like an AI-GA approach instead of a manual identify-then-combine engineering approach. In other words, the AI-GA approach is the quest for a set of simple building blocks that can be combined to produce general AI, so true unification would validate it."
"One interesting way to catalyze the Third Pillar is to harness data from the natural world when creating effective learning environments. This idea could accelerate progress in both the target-task and/or open-ended versions of the Third Pillar. For example, the environment generator could be given access to the Internet.[...]"
"There is a third path to general AI. I call it the “mimic path.” It involves neuroscientists studying animal brains, especially the human brain, in an attempt to reverse engineer how intelligence works in animals and rebuild it computationally.[...] Overall, I do not view the mimic path as one of the major paths to producing general AI because the goal of those committed to it is not solely to produce general AI. I thus only mention it this late in this essay, and still consider the manual and AI-GA paths as the two main paths."
"For the most part, this article has assumed that neural networks have a large part to play in the creation of AI-GAs. That reflects my optimism regarding deep neural networks, as well as my experience and passions. That said, it is of course possible that other techniques may produce general AI. One could substitute other techniques into the pillars of the AI-GA framework."
***
"I also want to emphasize that AI-GAs are not an “evolutionary approach,” despite people gravitating towards calling it that despite my saying otherwise. There are a few reasons to avoid that terminology. A first reason is that many methods could serve as the outer-loop optimizer. For example, one could use gradient-based meta-learning via meta-gradients or policy gradients. Additionally, one could potentially use Bayesian Optimization as the outer-loop search algorithm, although innovations would be needed to help them search in high-dimensional search landscapes. Of course, evolutionary algorithms are also an option. There are pros and cons to using evolutionary methods, and benefits to hybrid approaches that combine evolutionary concepts (searching in the parameter space) with policy-gradient concepts (searching in the action space). Because they are just one of many choices, calling AI-GAs an evolutionary approach is inaccurate. A second reason to avoid calling this an evolutionary approach is that many people in the machine learning community seem to have concluded that evolutionary methods are not worthwhile and not worth considering. There is thus a risk that if the AI-GA idea is associated with evolution it will not be evaluated with a clear, objective, open mind, but instead will be written off for reasons that do not relate to its merits."
"Of course, in practice the three different paths will not exist isolated from each other. The manual, mimic, and AI-GA paths will all inform each other as discoveries in one catalyze and inspire work in the other two. Additionally, people will pursue hybrid approaches (e.g. the vision outlined in Botvinick et al. [16]) and it will be difficult in many cases to tell which path a particular group or project belongs to. That said, it is instructive to give these different approaches their own names and talk about them separately despite the inevitable cross pollination and hybridization that will occur."
"Many researchers will enjoy research into creating AI-GAs. There are many advantages to AI-GA research versus the manual path. Initially, there are currently tens of thousands of researchers pursuing the manual path. There are very few scientists currently pursuing AI-GAs. For many, that will make AI-GA research more exciting. It also decreases the risk of getting scooped. Additionally, to pursue the manual path, one might feel (as I have historically) the need to stay abreast of developments on every building block one considers potentially important to creating a large, complicated general AI machine.[...] That is of course impossible given the number of such papers published each year. Because AI-GAs have fewer researchers and, if my hypothesis is correct, fewer building blocks required to make them, staying abreast of the AI-GA literature should prove more manageable. Additionally, if the large-scale trend in machine learning continues, wherein hand-coded components are replaced by learning, working on AI-GAs is a way to future-proof one’s research. To put it brashly, knowing what you know now, if you could go back 15 years ago, would you want to dedicate your career to improving HOG and SIFT? Or would you rather invest in the learning-based approaches that ultimately would render HOG and SIFT unnecessary, and prove to be far more general and powerful?"
"As AI-GA research advances, it will likely generate many innovations that can be ported to the manual path. It will also create techniques and narrow AIs of economic importance that will benefit society long before the ambitious goal of producing general AI is accomplished. For example, creating algorithms that automatically search for architectures, learning algorithms, and training environments that solve tasks will be greatly useful, even for tasks far less ambitious than creating general AI. Thus, even if creating an AI-GA ultimately proves impossible, research into it will be valuable."
"a commitment to learning is not a commitment to sample inefficient learners that only perform well with extreme levels of data and that generalize poorly. The AI-GA philosophy is that via a compute-intensive, sample-inefficient outer loop optimization process we can produce learning agents that are extremely sample efficient and that generalize well. Just as evolution (a slow, compute-intensive, sample-inefficient process) produced human intelligence, as AI-GAs advance they will produce individual learners that increasingly approach human learners in sample efficiency and generalization abilities. One might argue that means that the system itself is not sample efficient, because it requires so much data. That is true in some sense, but not true in other important ways. One important meaning of sample efficiency is how many samples are needed on a new problem. For example, a new disease might appear on Earth and we may want doctors or AI to be able to identify it and make predictions about it from very few labeled examples."
My comment: So, the question here is: how huge must the outer loop computation be? The answer is not obvious, and very much depends on us being inventive in all Three Pillars.
"The idea behind AI-GAs is that as compute becomes cheaper and our ability to generate sufficiently complex learning environments grows, we can afford to be sample inefficient in the outer loop in the service of producing a learner that is at or beyond human intelligence in being sample efficient when deployed on problems we care about. Putting aside its computational cost, AI-GAs thus in some sense represent the best of both ends of the spectrum: it can learn from data and not be constrained by human priors, but can produce something that, like humans themselves, contain powerful priors and ways to efficiently update them to rapidly solve problems."
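As a toy illustration of this outer/inner split, here is a Reptile-style sketch of my own (not code from the essay; a one-parameter "learner" and a family of trivial linear tasks, all names hypothetical):

```python
import random

def adapt(w, a, lr=0.5, steps=3):
    """Inner loop: a few gradient steps on the task 'predict a*x', from init w."""
    for _ in range(steps):
        x = random.uniform(-1.0, 1.0)
        w -= lr * 2.0 * (w * x - a * x) * x   # gradient of (w*x - a*x)**2
    return w

def meta_train(meta_steps=500, meta_lr=0.1, seed=1):
    """Outer loop: sample-hungry, but it distills a prior (an initialization)
    that makes each new task in the family cheap to learn."""
    random.seed(seed)
    w0 = 0.0
    for _ in range(meta_steps):
        a = random.uniform(1.0, 3.0)          # a family of related tasks
        w0 += meta_lr * (adapt(w0, a) - w0)   # Reptile-style meta-update
    return w0
```

The outer loop burns through 500 tasks with 3 samples each, which is sample-inefficient, but the initialization it produces sits near the center of the task family, so each new task then needs only a handful of samples: exactly the trade described in the quote.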
***
"The question of whether and why we should create general AI is a complicated one and is the focus of many articles [143, 41, 4, 18, 15]. I will not delve into that issue here as it is better served when it is the sole issue of focus. However, the AI-GA path introduces its own unique set of ethical issues that I do want to mention here."
[...]
"It is fair to ask why should I write this paper if I think AI-GA research is more dangerous, as I am attempting to inform people about it potentially being a faster path to general AI and advocating that more people work on this path. One reason is I believe that, on balance, technological advances produce more benefit than harm. That said, this technology is very different and could prove an exception to the rule. A second reason is because I think society is better off knowing about this path and its potential, including its risks and downsides. We might therefore be better prepared to maximize the positive consequences of the technology while working hard to minimize the risks and negative outcomes. Additionally, I find it hard to imagine that, if this is the fastest path to AI, then society will not pursue it. I struggle to think of powerful technologies humanity has not invented soon after it had the capability to do so. Thus, if it is inevitable, then we should be aware of the risks and begin organizing ourselves in a way to minimize those risks. Very intelligent people disagree with my conclusion to make knowledge of this technology public. I respect their opinions and have discussed this issue with them at length. It was not an easy decision for me to make. But ultimately I feel that it is a service to society to make these issues public rather than keep them the secret knowledge of a few experts."
"There is another ethical concern, although many will find it incredible and dismiss it as the realm of fantasy or science fiction. We do not know how physical matter such as atoms can produce feelings and sensations like pain, pleasure, or the taste of chocolate, which philosophers call qualia. While some disagree, I think we have no good reason to believe that qualia will not emerge at some point in artificially intelligent agents once they are complex enough. A simple thought experiment makes the point: imagine if the mimic path enabled us to simulate an entire human brain and body, down to each subatomic particle. It seems likely to me that such a simulation would feel the same sensations as its real-world counterpart."
"Recognizing if and when artificial agents are feeling pain, pleasure, and other qualia that are worthy of our ethical considerations is an important subject that we will have to come to terms with in the future. However, that issue is not specific to the method in which AI is produced, and therefore is not unique to the AI-GA path. There is an AI-GA-specific consideration on this front, however. On Earth, there has been untold amounts of suffering produced in animals en route to the production of general AI. Is it ethical to create algorithms in which such suffering occurs if it is essential, or helpful, to produce AI? Should we ban research into algorithms that create such suffering in order to focus energy on creating AI-GAs that do not involve suffering? How do we balance the benefits to humans and the planet of having general AI vs. the suffering of virtual agents? These are all questions we will have to deal with as research progresses on AI-GAs. They are related to the general question of ethics for artificial agents, but have unique dimensions worthy of specific consideration."
"Some of these ideas will seem fantastical to many researchers. In fact, it is risky for my career to raise them. However, I feel obligated to let society and our community know that I consider some of these seemingly fantastical outcomes possible enough to merit consideration. For example, even if there is a small chance that we create dangerous AI or untold suffering, the costs are so great that we should discuss that possibility. As an analogy, if there were a 1% chance that a civilization-ending asteroid could hit Earth in a decade or ten, we would be foolish not to begin discussing how to track it and prevent that catastrophe."
"We should keep in mind the grandeur of the task we are discussing, which is nothing short than the creation of an artificial intelligence smarter than humans. If we succeed, we arguably have also created life itself, by some definitions. We do not know if that intelligence will feel. We do not know what its values might be. We do not know what its intentions towards us may be. We might have an educated guess, but any student of history would recognize that it would be the height of hubris to assume we know with certainty exactly what general AI will be like. Thus, it is important to encourage, instead of silence, a discussion of the risks and ethical implications of creating general artificial intelligence."
***
[...]
"I also described an alternative path to AI: creating general AI-generating algorithms, or AI-GAs. This path involves Three Pillars: meta-learning architectures, meta-learning algorithms, and automatically generating effective learning environments. As with the other paths, there are advantages and disadvantages to this approach. A major con is that AI-GAs will require a lot of computation, and therefore may not be practical in time to be the first path to produce general AI. However, AI-GA’s ability to benefit more readily from exponential improvements in the availability of compute may mean that it surpasses the manual path before the manual path succeeds. A reason to believe that the AI-GA path may be the fastest to produce general AI is in line with the longstanding trend in machine learning that hand-coded solutions are ultimately surpassed by learning-based solutions as the availability of computation and data increase over time. Additionally, the AI-GA path may win because it does not require the Herculean Phase 2 of the manual path and all of its scientific, engineering, and sociological challenges. Additional benefits of AI-GA research are that fewer people are working on it, making it an exciting, unexplored research frontier."
"All three paths are worthwhile scientific grand challenges. That said, society should increase its investment in the AI-GA path. There are entire fields and billions of dollars devoted to the mimic path. Similarly, most of the machine learning community is pursuing the manual path, including billions of dollars in government and industry funding. Relative to these levels of investment, there is little research and investment in the AI-GA path. While still small relative to the manual path, there has been a recent surge of interest in Pillar 1 (meta-learning architectures) and Pillar 2 (meta-learning algorithms). However, there is little work on Pillar 3, and no work to date on attempting to combine the Three Pillars. Since the AI-GA path might be the fastest path to producing general AI, then society should substantially increase its investment in AI-GA research. Even if one believes the AI-GA path has a 1%-5% chance of being the first to produce general AI, then we should allocate corresponding resources into the field to catalyze its progress. That, of course, assumes we conclude that the benefits of potentially producing general AI faster outweigh the risks of producing it via AI-GAs, which I ultimately do. At a minimum, I hope this paper motivates a discussion on these questions. While there is great uncertainty about which path will ultimately produce general AI first, I think there is little uncertainty that we are underinvesting in a promising area of machine learning research."
"Finally, this essay has discussed many of the interesting consequences of building general AI that are unique to producing general AI via AI-GAs. One benefit is being able to produce a large diversity of different types of intelligent beings, and thus accelerating our ability to understand intelligence in general and all its potential manifestations. Doing so may also better help us understand our own single instance of intelligence, much as traveling the world is necessary to truly understand one’s hometown. Each different intelligence produced by an AI-GA could also create entire alien histories and cultures from which we can learn. Downsides unique to AI-GAs were also discussed, including that it might make the sudden, unanticipated production of AI more likely, that it might make producing dangerous forms of AI more likely, and that it may create untold suffering in virtual agents. While I offered my own views on these issues and how I weigh the positives and negatives of this technology for the purpose of deciding whether we should pursue it, a main goal of mine is to motivate others to discuss these important issues."
The last paragraph says:
"My overarching goal in this essay is not to argue that one path to general AI is likely to be better or faster. Instead, it is to highlight that there is an entirely different path to producing general AI that is rarely discussed. Because research in that path is less well known, I briefly summarized some of the research we and others have done to take steps towards creating AI-GAs. I also want to encourage reflection on (1) which path or paths each of us is committed to and why, (2) the assumptions that underlie each path (3) the reasons why each path might prove faster or slower in the production of general AI, (4) whether society and our community should rebalance our investment in the different paths, and (5) the unique benefits and detriments of each approach, including AI safety and ethics considerations. It is my hope that this essay will improve our collective understanding of the space of possible paths to producing general AI, which is worthwhile for everyone regardless of which path we choose to work on. I also hope this essay highlights that there is a relatively unexplored path that may turn out to be the fastest path in the greatest scientific quest in human history. I find that extremely exciting, and hope to inspire others in the community to join the ranks of those working on it."
sculpting cool things out of DMMs
***
It seems that in the context of AI-GAs we would like to have more versatile neural machines. It might be useful:
- to be able to easily express algorithms precisely within neural machines, rather than only to learn them approximately;
- to be able to use readable compact neural networks as well as overparameterized ones, and to control the degree of overparameterization;
- to be able to precisely express complicated hierarchical structures and graphs within neural networks, rather than only to model them;
- to be able to have flexible self-modification capabilities, where one can take linear combinations and compositions of various self-modification operators, and where one is not constrained by the fact that a neural net tends to have more weights than outputs.
It turns out that this can be achieved by a rather mild upgrade: instead of basing neural machines on streams of numbers, one could base them on arbitrary streams supporting the notion of combining several streams with coefficients ("linear combination"). Then one can support the key idea of neural computations, namely that linear and non-linear transformations should be interleaved, and at the same time achieve the wish list above.
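Here is a minimal sketch of my own of this upgrade (not code from the DMM papers; "frames" are dictionaries, i.e. sparse vectors, and a stream is a list of frames). One DMM-style neuron takes a linear combination of its input streams and then applies a nonlinear transform frame by frame:

```python
def add_scaled(acc, frame, c):
    """Accumulate c * frame into acc (frames are dicts acting as sparse vectors)."""
    for k, v in frame.items():
        acc[k] = acc.get(k, 0.0) + c * v
    return acc

def linear_combination(weighted_streams):
    """Pointwise linear combination of streams of dict-valued frames."""
    length = min(len(s) for _, s in weighted_streams)
    out = []
    for t in range(length):
        acc = {}
        for c, s in weighted_streams:
            add_scaled(acc, s[t], c)
        out.append(acc)
    return out

def relu_frame(frame):
    """A nonlinear transform applied to each frame."""
    return {k: max(0.0, v) for k, v in frame.items()}

# One neuron: interleave a linear combination with a nonlinearity.
s1 = [{"x": 1.0}, {"x": -2.0, "y": 1.0}]
s2 = [{"y": 3.0}, {"x": 1.0}]
mixed = linear_combination([(2.0, s1), (0.5, s2)])
out = [relu_frame(f) for f in mixed]
# out == [{"x": 2.0, "y": 1.5}, {"x": 0.0, "y": 2.0}]
```

The only structural requirement on the streams is that their frames support linear combination; replacing dicts with nested structures, graphs, or even streams of self-modification operators keeps the same two-phase (linear, then nonlinear) machinery intact.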