Entry tags:
"Towards Categorical Foundations of Learning"
When one tries to use category theory for applied work, a number of questions arise: Is it just too difficult for me to use at all, given my level of technical skill? Is it fruitful enough, and is the fruitfulness-to-effort ratio high enough for all this to make sense?
I recently discovered Bruno Gavranović, a graduate student in Glasgow, whose work is promising in this sense. He is really trying hard to keep things simple while also making sure that there are non-trivial applications. Here is one of his essays and papers (March 2021, so not the most recent one, but probably the most central):
www.brunogavranovic.com/posts/2021-03-03-Towards-Categorical-Foundations-Of-Neural-Networks.html
(I am posting this here because some readers of this blog are interested in applied category theory and like it, not because I am trying to convince those who have formed a negative opinion of the subject. I am non-committal myself: I have not decided whether applied category theory has a high enough fruitfulness-to-effort ratio, but this particular entry seems to be one of the best shots in that sense, so I am going to try to go deeper into his work.)
Update: his collection of papers at the intersection of category theory and machine learning: github.com/bgavran/Category_Theory_Machine_Learning
no subject
Discovered Bruno Gavranović: https://twitter.com/bgavran3
https://twitter.com/bgavran3/status/1599185403579609088 https://arxiv.org/abs/2212.00542
https://twitter.com/bgavran3/status/1478901780994007044 https://arxiv.org/abs/2103.01931
https://scholar.google.com/citations?user=ofP7CgYAAAAJ
https://www.brunogavranovic.com/
Via https://twitter.com/bgavran3/status/1599474366148149248 replying to https://twitter.com/michael_nielsen/status/1599472810271059968
no subject
Looked at the new https://www.brunogavranovic.com/posts/2022-12-05-graph_neural_networks_as_parametric_cokleisli_morphisms.html
Yes, the whole framework is actually not difficult(!) and makes sense. The question is: is it useful?
"This paper makes a step forward in substantiating our existing framework described in Categorical Foundations of Gradient-Based Learning. If you’re not familiar with this existing work - it’s a general framework for modeling neural networks in the language of category theory. Given some base category with enough structure, it describes how to construct another category where morphisms are parametric, and bidirectional.
Even more specifically - it allows us to describe the setting where the information being sent backwards is the derivative of some chosen loss function.
This is powerful enough to encompass a variety of neural network architectures - recurrent, convolutional, autoregressive, and so on. What the framework doesn’t do is describe the structural essence of all these architectures at the level of category theory.
Our new paper does that, for one specific architecture: Graph Convolutional Neural Networks (GCNNs). We show that they arise as morphisms for a particular choice of the base category - the CoKleisli category of the product comonad."
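To unpack "parametric and bidirectional" for myself: here is a minimal Python sketch of the idea as I understand it (the names and the toy example are mine, not the paper's API). A morphism carries a forward map (params, input) -> output and a backward map that takes the gradient arriving from the output side and returns gradients for both the parameters and the input, so composition chains backward passes the way backpropagation does.

```python
# A minimal sketch (not the paper's API) of a "parametric, bidirectional" morphism:
# a forward map (params, x) -> y plus a backward map that receives the gradient
# coming from the output side and returns gradients for the parameters and for
# the input, so that composition chains backward passes like backpropagation.

from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ParaLens:
    forward: Callable[[tuple, float], float]                        # (params, x) -> y
    backward: Callable[[tuple, float, float], Tuple[tuple, float]]  # (params, x, dy) -> (dparams, dx)


def compose(f: ParaLens, g: ParaLens) -> ParaLens:
    """Sequential composition: run f then g forward; chain their backward passes."""
    def forward(params, x):
        pf, pg = params
        return g.forward(pg, f.forward(pf, x))

    def backward(params, x, dy):
        pf, pg = params
        y_mid = f.forward(pf, x)          # recomputed here; a real system would cache it
        dpg, dmid = g.backward(pg, y_mid, dy)
        dpf, dx = f.backward(pf, x, dmid)
        return (dpf, dpg), dx

    return ParaLens(forward, backward)


# A one-dimensional affine layer y = w*x + b ...
linear = ParaLens(
    forward=lambda p, x: p[0] * x + p[1],
    backward=lambda p, x, dy: ((dy * x, dy), dy * p[0]),
)

# ... composed with itself: a two-layer "network" is again a single ParaLens.
net = compose(linear, linear)

params = ((0.5, 0.0), (2.0, 1.0))
x, target = 3.0, 10.0
y = net.forward(params, x)
dy = 2 * (y - target)                     # gradient of the squared-error loss
dparams, dx = net.backward(params, x, dy)
print(y, dparams, dx)
```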
Yes, actually, this looks really good: https://www.brunogavranovic.com/posts/2021-03-03-Towards-Categorical-Foundations-Of-Neural-Networks.html
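And the "CoKleisli category of the product comonad" part is less scary than it sounds. As far as I can tell, a morphism X -> Y there is just an ordinary function (a, x) -> y that is also allowed to read a fixed global context a (in the GCNN paper that context is, roughly, the graph data), and composition feeds the same context to both maps. A toy sketch, with illustrative names of my own:

```python
# Toy sketch of the coKleisli category of the product comonad A x (-):
# a morphism X -> Y is an ordinary function (a, x) -> y that may read a fixed
# global context a of type A, and composition threads that same context into
# both maps.

from typing import Callable, TypeVar

A = TypeVar("A")  # the fixed "global context" type of the product comonad
X = TypeVar("X")
Y = TypeVar("Y")
Z = TypeVar("Z")


def cokleisli_compose(f: Callable[[A, X], Y], g: Callable[[A, Y], Z]) -> Callable[[A, X], Z]:
    """Compose f: X -> Y and g: Y -> Z in the coKleisli category of A x (-)."""
    return lambda a, x: g(a, f(a, x))


def identity(a: A, x: X) -> X:
    """The coKleisli identity just discards the context (the comonad's counit)."""
    return x


# Illustrative example: the shared context is a single weighting factor standing
# in for graph structure; both "layers" read it without passing it along explicitly.
layer1 = lambda a, x: a * x + 1.0
layer2 = lambda a, y: a * (y ** 2)

net = cokleisli_compose(layer1, layer2)
print(net(2.0, 3.0))   # context a = 2.0 threaded through both maps: 2 * (2*3 + 1)^2 = 98.0
```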
no subject
'Category theory feels subtractive, as opposed to additive. These days, there's more papers than ever and it seems like our main issue isn't producing new research, but understanding how new results interact with the immense body of already existing ones.
In my work, I feel like my primary goal isn't to add more complexity, but to reduce it by saying "hey I took half of the stuff out of your paper and it still works". Or "hey folks, see these three completely distinct papers? Actually, there's just one idea behind them all".'
no subject
'Since our work is based on category theory, you might wonder what the aforementioned concepts are, what category theory even is, or even why you would want to abstract away some details in neural networks. This is a question that deserves a proper answer. For now I’ll just say that our paper really answers the following question in a very precise way: “What is the minimal structure, in some suitable sense, that you need to have to perform learning?”. This is certainly valuable. Why? If you try answering that question you might discover, just as we did, that this structure ends up encapsulating some strange types of learning, with hints to even meta-learning. For instance, after defining our framework on neural networks on Euclidean spaces we realized that it includes learning not just in Euclidean spaces, but also on Boolean circuits. This is pretty strange: how can you “differentiate” a Boolean circuit? It turns out you can, and this falls under the same framework of Reverse derivative categories.
Another thing we discovered is that all the optimizers (standard gradient descent, momentum, Nesterov momentum, Adagrad, Adam etc.) are the same kind of structure neural networks themselves are - giving us hints that optimizers are in some sense “hardwired meta-learners”, just as Learning to Learn by Gradient Descent by Gradient Descent describes.
Of course, I still didn’t tell you what this framework is, nor did I tell you how we defined neural networks. I’ll do that briefly now.'
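The Boolean-circuit remark can be made concrete with a toy example. Under one common reading (the one I take from the reverse derivative categories line of work), a circuit is a polynomial map over the two-element field Z2, where XOR is addition and AND is multiplication, and its reverse derivative at a point is the transposed Jacobian of formal partial derivatives, with all arithmetic mod 2. The function names below are mine:

```python
# Toy illustration of "differentiating" Boolean circuits: view gates as
# polynomials over the field Z2 = {0, 1} (XOR is addition, AND is multiplication)
# and take formal partial derivatives mod 2.  The reverse derivative of
# f: Z2^n -> Z2^m at a point x, applied to a vector v in Z2^m, is the
# transposed Jacobian of f at x times v, again mod 2.

def and_gate(x, y):
    return (x * y) % 2          # as a polynomial over Z2: x*y

def xor_gate(x, y):
    return (x + y) % 2          # as a polynomial over Z2: x + y

# Formal partials:  d(x*y)/dx = y,  d(x*y)/dy = x;  d(x+y)/dx = d(x+y)/dy = 1.

def reverse_derivative_and(x, y, v):
    """R[AND]((x, y), v) = (y*v, x*v) mod 2 (Jacobian transpose applied to v)."""
    return ((y * v) % 2, (x * v) % 2)

def reverse_derivative_xor(x, y, v):
    """R[XOR]((x, y), v) = (v, v) mod 2."""
    return (v % 2, v % 2)

# Example: at the input (1, 0), a unit "cotangent" on the output of AND pulls
# back to (0, 1): flipping y (the 0 input) flips the output, flipping x does not.
print(reverse_derivative_and(1, 0, 1))   # (0, 1)
print(reverse_derivative_xor(1, 0, 1))   # (1, 1)
```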
no subject
I looked at it earlier this year, because I had to rewrite one of them (the Adam optimizer) in order for it to work on tree-like structures.
Here is my rather crude and non-idiomatic rewrite:
https://github.com/anhinga/julia-flux-drafts/blob/main/arxiv-1606-09470-section3/May-August-2022/v0-1/TreeADAM.jl
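(Not the Julia code above, but for the general idea: a rough Python sketch of what "an Adam-style optimizer on tree-like structures" can mean, namely applying the usual Adam update leafwise over a nested parameter tree instead of over flat arrays. The recursion and the names here are my own illustration, not a translation of TreeADAM.jl.)

```python
# Rough sketch: an Adam-style update applied recursively over tree-like nested
# parameters (dicts/lists of numbers) instead of over flat arrays.
# Hyperparameter names follow the usual Adam conventions.

import math

def tree_map(fn, *trees):
    """Apply fn leafwise over identically-shaped nested dicts/lists of numbers."""
    t0 = trees[0]
    if isinstance(t0, dict):
        return {k: tree_map(fn, *(t[k] for t in trees)) for k in t0}
    if isinstance(t0, list):
        return [tree_map(fn, *parts) for parts in zip(*trees)]
    return fn(*trees)  # leaf: a plain number

def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step over a whole parameter tree; returns (params, m, v)."""
    m = tree_map(lambda g, m_: b1 * m_ + (1 - b1) * g, grads, m)
    v = tree_map(lambda g, v_: b2 * v_ + (1 - b2) * g * g, grads, v)

    def update(p, m_, v_):
        m_hat = m_ / (1 - b1 ** t)           # bias-corrected first moment
        v_hat = v_ / (1 - b2 ** t)           # bias-corrected second moment
        return p - lr * m_hat / (math.sqrt(v_hat) + eps)

    return tree_map(update, params, m, v), m, v

# Example: a small tree-like parameter structure.
params = {"layer1": {"w": [0.5, -0.3], "b": 0.0}, "layer2": {"w": [1.0], "b": 0.1}}
grads  = {"layer1": {"w": [0.1,  0.2], "b": 0.05}, "layer2": {"w": [-0.4], "b": 0.0}}
zeros  = tree_map(lambda p: 0.0, params)
params, m, v = adam_step(params, grads, zeros, zeros, t=1)
print(params)
```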
no subject
And this topos is described here: "Learners' Languages" by David Spivak, https://arxiv.org/abs/2103.01189
no subject
"Example 3.6 (Gradient descent). The gradient descent, backpropagation algorithm used by each “neu-
ron” in a deep learning architecture can be phrased as a logical proposition about learners. The whole
learning architecture is then put together as in [9], or as we’ve explained things above, using the operad
Sys from Definition 2.19"
...
"The logical propositions that come from Proposition 3.5 are very special. More generally, one could
have a logical proposition like “whenever I receive two red tokens within three seconds, I will wait five
seconds and then send either three blue tokens or two blues and six reds.” As long as this behavior has
the “whenever” flavor—more precisely as long as it satisfies the condition in Definition 3.4—it will be a
logical proposition in the topos."
no subject
https://github.com/bgavran/Category_Theory_Machine_Learning