no subject
Date: 2021-03-02 07:54 am (UTC)

There is nothing "universal" about them per se. Rather, the parts of the system of differential equations that are not well known are replaced by modest feed-forward neural nets (not unlike the connectors in Transformers). The word "universal" comes from the fact that feed-forward neural nets are universal approximators of large classes of functions. So if you don't know a function in a particular part of the right-hand side well, you can replace it with a feed-forward net and let the system figure it out, while also fitting the other parameters of the "neural differential equation" in question.
This way one gets nice compact models with strong inductive biases coming from the structure of the system of differential equations.
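The scheme above can be sketched in JAX (all names and the toy dynamics here are illustrative assumptions, not anything from the original comment): the known part of the right-hand side (a decay term -k*u) is written down explicitly, the unknown part is replaced by a small feed-forward net, and gradients flow through the solver so that k and the net's weights are fitted jointly.

```python
# Sketch of a "universal differential equation": known physics plus a
# small feed-forward net standing in for the unknown part of the RHS.
# Names, dynamics, and sizes are illustrative, not from the original text.
import jax
import jax.numpy as jnp

def mlp(params, u):
    # Tiny feed-forward net: one hidden tanh layer.
    w1, b1, w2, b2 = params
    h = jnp.tanh(w1 @ u + b1)
    return w2 @ h + b2

def rhs(theta, u):
    # Known term (-k * u) plus the learned correction term.
    return -theta["k"] * u + mlp(theta["net"], u)

def integrate(theta, u0, dt=0.01, steps=100):
    # Simple explicit Euler; a real system would use a proper ODE solver.
    def step(u, _):
        u_next = u + dt * rhs(theta, u)
        return u_next, u_next
    u_final, _ = jax.lax.scan(step, u0, None, length=steps)
    return u_final

def loss(theta, u0, target):
    return jnp.sum((integrate(theta, u0) - target) ** 2)

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
theta = {
    "k": jnp.array(0.5),  # unknown physical parameter, fitted jointly
    "net": (0.1 * jax.random.normal(key1, (8, 2)), jnp.zeros(8),
            0.1 * jax.random.normal(key2, (2, 8)), jnp.zeros(2)),
}
u0 = jnp.array([1.0, 0.0])
target = jnp.array([0.3, 0.1])

# Gradients w.r.t. both k and the net's weights, through the solver.
grads = jax.grad(loss)(theta, u0, target)
```

From here, ordinary gradient descent on `theta` would fit both the physical parameter and the neural stand-in for the unknown function at once.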
Now, if one replaces "differential equation" with "arbitrary differentiable program" (note that occasional discontinuity and non-smoothness are allowed, as in e.g. ReLU or `if` branches), and the feed-forward net with whatever model or model component one feels like including in one's differentiable program, then one obtains the setup of "differentiable programming".
So neural nets, other machine learning models, differential equations, matrix multiplications, and various pieces of DMMs (dataflow matrix machines) can all be included in "differentiable programs".
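A minimal sketch of such a mixed "differentiable program" (everything here is an illustrative assumption): a matrix multiplication, a data-dependent branch (`jnp.where`, playing the role of an `if`), a ReLU, and one Euler step of a toy ODE are composed freely, and `jax.grad` differentiates through the whole composite.

```python
# A toy differentiable program mixing the kinds of components the text
# lists; all names and the particular pipeline are illustrative.
import jax
import jax.numpy as jnp

def program(w, x):
    y = w @ x                             # matrix multiplication
    y = jnp.where(y.sum() > 0, y, -y)     # branch: non-smooth, still fine
    y = jax.nn.relu(y)                    # piecewise-linear activation
    y = y + 0.1 * (-y)                    # one Euler step of du/dt = -u
    return jnp.sum(y)

w = jnp.ones((3, 3))
x = jnp.array([1.0, -2.0, 0.5])
g = jax.grad(program)(w, x)  # gradient w.r.t. the weight matrix
```

The point is that the host system does not care which ingredients are "neural" and which are "classical"; anything differentiable almost everywhere composes.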
If one uses DMMs as a "pure formalism" for differentiable programming, the main advantage one gets is better metalearning. In differentiable programming, full-scale metalearning includes program synthesis, and program synthesis is difficult; this is exactly where using DMMs as the underlying formalism should help.
But one does not have to do all of this at once: one can start with an established differentiable programming system such as Julia Flux or JAX/Python, incorporate motifs from DMMs into it piecemeal, and eventually consider a gradual switch to a purer DMM-based formalism with better metalearning properties.