dmm: (Default)
I have been looking at a recent rather remarkable paper which includes the DeepDream creator among its authors, and I've decided to check whether I missed any of his works; it turns out there is this paper I really should be aware of. It really resonates with some of the things I have been exploring this year.


arxiv.org/abs/2007.00970

"We present a novel method for learning the weights of an artificial neural network - a Message Passing Learning Protocol (MPLP). In MPLP, we abstract every operations occurring in ANNs as independent agents. Each agent is responsible for ingesting incoming multidimensional messages from other agents, updating its internal state, and generating multidimensional messages to be passed on to neighbouring agents. We demonstrate the viability of MPLP as opposed to traditional gradient-based approaches on simple feed-forward neural networks, and present a framework capable of generalizing to non-traditional neural network architectures. MPLP is meta learned using end-to-end gradient-based meta-optimisation. We further discuss the observed properties of MPLP and hypothesize its applicability on various fields of deep learning."

dmm: (Default)
When one tries to use category theory for applied work, a number of questions arise: Is it just too difficult for me to use at all, given my level of technical skills? Is it fruitful enough, and is the fruitfulness/effort ratio high enough for all this to make sense?

I recently discovered Bruno Gavranović, a graduate student in Glasgow, whose work is promising in this sense. He is really trying hard to keep things simple, and also trying to make sure that there are non-trivial applications. Here is one of his essays, together with the associated paper (March 2021, so it's not the most recent one, but probably the most central):

www.brunogavranovic.com/posts/2021-03-03-Towards-Categorical-Foundations-Of-Neural-Networks.html

(I am posting this here because some people who read this blog are interested in applied category theory and like it, not because I am trying to convince those who have formed a negative opinion of this subject. I am non-committal myself: I have not decided whether applied category theory has a strong enough fruitfulness/effort ratio, but this particular line of work seems to be one of the best shots in this sense, so I am going to try to go deeper into it.)

Update: his collection of papers at the intersection of Category Theory and Machine Learning: github.com/bgavran/Category_Theory_Machine_Learning
dmm: (Default)
This week, Nov 17-18, Thu-Fri, 8am-11:45am Boston time, "Quantum physics and the first-person perspective": www.essentiafoundation.org/quantum-physics-and-the-first-person-perspective/seeing/

JuliaCon 2023, juliacon.org/2023/ - the call for proposals is posted, deadline Dec 18: pretalx.com/juliacon2023/cfp


I've spent more quality time focusing on the two breakthroughs in understanding the nature and the behavior of machine learning models which came from the "penumbra" of "prosaic alignment" start-ups and which I wrote about in my previous two posts.

"Grokking is (more or less) solved." I took brief notes between Oct 21 and Oct 23: github.com/anhinga/2022-notes/tree/main/Grokking-is-solved

"Generative autoregressive models are similators." I took extensive notes between Oct 5 and Oct 23: github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators

I am continuing to develop thoughts related to these topics, and I am going to gradually write more about them in the comments.

dmm: (Default)
The most interesting conceptual AI advances lately seem to come from "prosaic alignment" start-ups. These are companies which believe that the current trend of improving Transformer models is likely to lead straight to AGI, and that better understanding of the nature and properties of these models is key to AI safety (and, of course, it's also key to better AI capabilities).

And it is often the case that the key elements of work are done by people "on the edge", "in the penumbra" of those alignment start-ups.

In the previous post I mentioned the key new understanding of large Transformer models as simulators. That work has been done "while at Conjecture", but is not listed as directly coming from Conjecture (one of those "prosaic alignment" start-ups). I think the key people involved are still at Conjecture, but they seem to be trying to keep some distance between Conjecture and this work. I am continuing to take notes on those materials and commit them to GitHub (see the links in the comments to the previous post).

Here is another one of those stories. Grokking is a phenomenon where small Transformers train on a part of a mathematical structure for quite a while, and then rather suddenly transition to understanding the whole of that mathematical structure, including the part they never see in training. It was discovered in 2021 and has been the subject of a number of follow-up attempts to understand it.
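
To recall the standard toy setting (the modulus and the split fraction below are just illustrative choices on my part, not the exact values from any particular paper): one builds a full modular-arithmetic table, shows the model only a fraction of it during training, and evaluates on the held-out rest.

import random

P = 97                 # prime modulus
TRAIN_FRACTION = 0.3   # fraction of the table shown during training

pairs = [(a, b) for a in range(P) for b in range(P)]
random.Random(0).shuffle(pairs)

split = int(TRAIN_FRACTION * len(pairs))
train = [((a, b), (a + b) % P) for a, b in pairs[:split]]
test = [((a, b), (a + b) % P) for a, b in pairs[split:]]

# A small Transformer trained on `train` typically memorizes it quickly,
# while accuracy on `test` stays near chance for a long time and then
# jumps much later - that delayed jump is the grokking phenomenon.
print(len(train), "training pairs;", len(test), "held-out pairs")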

The recent breakthrough came in mid-August from Neel Nanda, who left Anthropic (perhaps the most famous of the "prosaic alignment" start-ups) a few months ago. It looks like he has more or less solved the mysteries behind this phenomenon. I am going to continue studying his writings. The links are in the comments.

dmm: (Default)
Here, at last, is what seems to be the right approach to understanding the nature of models like GPT-3 and the various magic associated with them:

www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators

It says that one should stop thinking about these models in terms of older AI systems.

dmm: (Default)
Another important paper from one of François Fleuret's collaborations: arxiv.org/abs/2209.00588

Previous important papers include "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", arxiv.org/abs/2006.16236 and "Flatten the Curve: Efficiently Training Low-Curvature Neural Networks", arxiv.org/abs/2206.07144
dmm: (Default)
github.com/salesforce/CodeGen

One can also run one of these models via HuggingFace; they are based on the paper "A Conversational Paradigm for Program Synthesis", arxiv.org/abs/2203.13474
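
For example, loading one of the smaller checkpoints with the transformers library looks roughly like this (a sketch; the checkpoint name and the generation settings are my assumptions, so check the repository's README for the currently recommended ones):

# Rough sketch of running a CodeGen checkpoint via HuggingFace transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # assumed small Python-only model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))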

Someone has even created a fake GitHub Copilot based on that (useful for those who prefer VSCode): github.com/moyix/fauxpilot

dmm: (Default)
github.com/google/learned_optimization - "Meta-learning optimizers and more with JAX"

This is used by various interesting papers, including the famous "persistent evolution strategies" paper (which I don't understand) and the tempting "Gradients are Not All You Need" paper, arxiv.org/abs/2111.05803

Moreover, it is used by the super-interesting, must-read paper "Practical tradeoffs between memory, compute, and performance in learned optimizers", arxiv.org/abs/2203.11860, which is being published at the Conference on Lifelong Learning Agents (CoLLAs 2022, Aug 18-24): lifelong-ml.cc/
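
As a toy illustration of what gradient-based meta-optimization of an optimizer means (my own minimal sketch in JAX, not the learned_optimization API; there the update rule is a neural network rather than a single log step size):

import jax
import jax.numpy as jnp

def inner_loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)    # simple regression task

def learned_update(w, g, theta):
    return w - jnp.exp(theta) * g        # SGD with a meta-learned step size

def meta_loss(theta, w0, x, y, steps=10):
    w = w0
    for _ in range(steps):               # unrolled inner optimization
        w = learned_update(w, jax.grad(inner_loss)(w, x, y), theta)
    return inner_loss(w, x, y)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
y = x @ jnp.array([1.0, -2.0, 0.5, 3.0])
w0 = jnp.zeros(4)

theta = jnp.array(-3.0)                  # log step size, the meta-parameter
for _ in range(50):
    g = jax.grad(meta_loss)(theta, w0, x, y)
    theta = theta - 0.01 * jnp.clip(g, -10.0, 10.0)  # clipped meta-gradient step

print("meta-learned inner step size:", float(jnp.exp(theta)))

The clipping of the meta-gradient is a small nod to the "Gradients are Not All You Need" point: differentiating through long unrolled inner loops can produce badly behaved gradients.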
dmm: (Default)
1st International Conference on Automated Machine Learning: automl.cc/

Follows ICML 2022 🇺🇦 icml.cc/ (one can attend virtually as well)

Neural Architecture Search is prominent and includes a competition: sites.google.com/view/zero-cost-nas-competition/home

The most notable keynote is by Jeff Clune, "AI-generating algorithms: the fastest path to AGI?"

dmm: (Default)
kidger.site/thoughts/jax-vs-julia/

The various correspondences between JAX and Julia constructions that he lists there are quite useful for people practicing either JAX or Julia.

(I am having a good time with both JAX and Julia this year.)

dmm: (Default)
For those of us (like myself) who'd like to experiment with modifying the Transformer architecture on a home personal computer.

Links are in the comments.
dmm: (Default)
Starts in 5 minutes:

msml21.github.io/

No registration is needed - they are just handling it in a relaxed fashion
