Entry tags:
MSML21: Mathematical and Scientific Machine Learning
Starts in 5 minutes:
msml21.github.io/
No registration is needed - they are just handling it in a relaxed fashion
no subject
spends all the time on old history, which is inapplicable in practice, and crams recent developments into a very short stretch at the end... still might be good for links and pointers...
(and the field really does not know how to analyze the cases of interest, namely N^2 weights; in my case, I like using fewer weights, so who knows, perhaps this might be applicable to my favorite approaches)
no subject
I am not sure whether it is all that fruitful to follow everything...
no subject
A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms, Chao Ma (Princeton University), Lei Wu (Princeton University), Weinan E (Princeton University)
Paper Highlight, by Pankaj Mehta
The paper connects the continuous-time limits of the adaptive gradient methods RMSProp and Adam to the sign gradient descent (signGD) algorithm and explores three types of typical phenomena in these adaptive algorithms' training processes. By analyzing the signGD flow, the paper explains the fast initial convergence of these adaptive gradient algorithms in the limit of the learning rate going to 0 with fixed momentum parameters. The connection, the convergence analysis, and the experiments verifying the three qualitative patterns are original and technically sound.
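To see the signGD connection concretely, here is a toy numerical check (my own sketch, not the paper's): with fixed beta1 and beta2 and a steady gradient g, the Adam direction m/sqrt(v) approaches sign(g) elementwise.

import jax.numpy as jnp

# Toy check: for a constant gradient g, m -> g and v -> g^2,
# so the Adam direction m/sqrt(v) -> sign(g).
def adam_direction(g, beta1=0.9, beta2=0.999, steps=1000, eps=1e-12):
    m = jnp.zeros_like(g)
    v = jnp.zeros_like(g)
    for _ in range(steps):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** steps)          # bias correction
    v_hat = v / (1 - beta2 ** steps)
    return m_hat / (jnp.sqrt(v_hat) + eps)

g = jnp.array([3.0, -0.01, 0.5])
print(adam_direction(g))    # approximately [1, -1, 1]
print(jnp.sign(g))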
no subject
Prevalence of Neural Collapse during the terminal phase of deep learning training:
https://arxiv.org/abs/2008.08186
(Towards better understanding of "interpolation mode")
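A crude way to watch this collapse happening (my own sketch; the paper's actual NC1 statistic is a trace of within-class covariance against the pseudo-inverse of between-class covariance, this is just a scalar proxy):

import jax.numpy as jnp

# Proxy for "NC1" (within-class variability collapse): ratio of
# within-class scatter to between-class scatter of last-layer features;
# it shrinks toward 0 in the terminal phase of training.
def nc1_ratio(features, labels, num_classes):
    mu_g = features.mean(axis=0)                       # global feature mean
    within, between = 0.0, 0.0
    for c in range(num_classes):
        fc = features[labels == c]                     # features of class c
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += fc.shape[0] * ((mu_c - mu_g) ** 2).sum()
    return within / between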
no subject
I am attending this one.
Computational Physics Workshop tomorrow, which I will skip...
no subject
This is a very good talk, with deep material on "neural networks over function spaces"; good to know this...
no subject
more difficult situations: PINNs - physics-informed neural networks (a minimal sketch of the idea follows below)
The talk is so-so
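For reference, the basic PINN recipe in a toy setting (my own sketch, all names mine): penalize the PDE residual of a small network at collocation points, getting the derivatives from autodiff.

import jax
import jax.numpy as jnp

# Toy PINN: fit u_theta to u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0
# by penalizing the residual at collocation points; u'' comes from autodiff.
def net(params, x):
    w1, b1, w2, b2 = params
    return jnp.dot(w2, jnp.tanh(w1 * x + b1)) + b2     # 1-hidden-layer MLP

def pinn_loss(params, xs, f):
    u = lambda x: net(params, x)
    u_xx = jax.vmap(jax.grad(jax.grad(u)))(xs)         # u'' via nested autodiff
    residual = jnp.mean((u_xx - jax.vmap(f)(xs)) ** 2)
    boundary = u(0.0) ** 2 + u(1.0) ** 2               # soft boundary conditions
    return residual + boundary

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = [jax.random.normal(k1, (32,)), jnp.zeros(32),
          jax.random.normal(k2, (32,)) / 32, 0.0]
xs = jnp.linspace(0.0, 1.0, 64)
f = lambda x: -(jnp.pi ** 2) * jnp.sin(jnp.pi * x)     # exact solution: sin(pi x)
grads = jax.grad(pinn_loss)(params, xs, f)             # feed to any optimizer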
no subject
"The Modern Mathematics of Deep Learning": https://arxiv.org/abs/2105.04026
no subject
Replacing the rules for coarse-grained evolution so that it is precise (that is, equal to the coarse-graining of the high-res simulation)
Trained from high-res simulations of small pieces
"Put in as little machine learning as possible and see how far we can get"
He likes doing things in JAX: https://github.com/google/jax-cfd
Multi-step loss is essential (a sketch of what that means follows after this comment)
"Some of the problems mentioned at the beginning are still intractable, but every 1000x acceleration helps ;-) " - I love this way to formulate it
no subject
Keynote. He speaks well, but is very slow to start talking about anything interesting at all...
"On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis": https://arxiv.org/abs/2009.07799
OnsagerNet
no subject
Interpretable and Learnable Super-Resolution Time-Frequency Representation, Randall Balestriero (Rice University), Herve Glotin; Richard Baraniuk (Rice University)
Paper Highlight, by Dennis Elbrachter
The paper introduces a method for obtaining super-resolved quadratic time-frequency representations via Gaussian filtering of the Wigner-Ville transform. It is both interpretable and computationally feasible, achieving state-of-the-art results on various datasets. I particularly enjoyed the clean presentation of the formal results, augmented by helpful explanations of the intuitions behind them.
https://en.wikipedia.org/wiki/Chirplet_transform
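My rough reading of the mechanics (not the authors' code; names are mine, and I assume an even-length analytic signal): compute a discrete Wigner-Ville distribution, then smooth the time-frequency plane with a 2D Gaussian; in the paper the Gaussian filter is the learnable part.

import jax.numpy as jnp
from jax.scipy.signal import convolve2d

# Discrete Wigner-Ville distribution: FFT over the lag axis of the
# instantaneous autocorrelation x[n+m] * conj(x[n-m]).
def wigner_ville(x):
    N = x.shape[0]
    n = jnp.arange(N)[:, None]
    m = jnp.arange(-(N // 2), N // 2)[None, :]
    xp = jnp.pad(x, (N, N))                        # zeros outside the support
    kernel = xp[N + n + m] * jnp.conj(xp[N + n - m])
    # ifftshift puts lag 0 first before the FFT over the lag axis
    return jnp.fft.fft(jnp.fft.ifftshift(kernel, axes=1), axis=1).real

def gaussian_smooth(W, sigma_t=2.0, sigma_f=2.0, radius=8):
    t = jnp.arange(-radius, radius + 1)
    g = jnp.exp(-t[:, None] ** 2 / (2 * sigma_t ** 2)
                - t[None, :] ** 2 / (2 * sigma_f ** 2))
    return convolve2d(W, g / g.sum(), mode="same")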
Phase Retrieval with Holography and Untrained Priors: Tackling the Challenges of Low-Photon Nanoscale Imaging, Hannah Lawrence (Flatiron Institute); David Barmherzig; Henry Li (Yale); Michael Eickenberg (UC Berkeley); Marylou Gabrié (NYU / Flatiron Institute)
Paper Highlight, by Reinhard Heckel
The paper introduces a novel dataset-free deep learning framework for holographic phase retrieval. It shows, in realistic simulation setups, that untrained neural networks can regularize holographic phase retrieval. It thus shows that non-linear inverse problems can be regularized with neural networks without any training, making an important contribution at the intersection of machine learning and inverse problems.
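A toy version of the untrained-prior idea (my sketch; far simpler than the paper's setup, and dropping the holographic reference): fit the weights of a small untrained network so that the forward model applied to its output matches magnitude-only Fourier measurements.

import jax
import jax.numpy as jnp

# Recover a signal from y = |F x| by fitting the weights of an untrained
# two-layer net g_theta(z) with a fixed random input z; the network
# architecture itself acts as the regularizer.
def g(theta, z):
    w1, w2 = theta
    return jnp.tanh(z @ w1) @ w2

def loss(theta, z, y):
    x_hat = g(theta, z)
    return jnp.mean((jnp.abs(jnp.fft.fft(x_hat)) - y) ** 2)

kz, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 3)
z = jax.random.normal(kz, (64,))
theta = [jax.random.normal(k1, (64, 128)) * 0.1,
         jax.random.normal(k2, (128, 64)) * 0.1]
x_true = jnp.sin(jnp.linspace(0, 4 * jnp.pi, 64))
y = jnp.abs(jnp.fft.fft(x_true))                 # magnitude-only measurements

step = jax.jit(lambda th: jax.tree_util.tree_map(
    lambda p, gr: p - 1e-2 * gr, th, jax.grad(loss)(th, z, y)))
for _ in range(500):
    theta = step(theta)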
no subject
PDEs and ODEs: https://msml21.github.io/session7/
last two talks on "committor functions"
A semigroup method for high dimensional committor functions based on neural network, Haoya Li (Stanford University), Yuehaw Khoo (U Chicago); Yinuo Ren (Peking University); Lexing Ying (Stanford University)
Paper highlight, by Jiequn Han
This paper proposes a new method based on neural networks to compute high-dimensional committor functions. Understanding transition dynamics via the committor function is a fundamental problem in statistical mechanics with decades of work behind it. Traditional numerical methods have an intrinsic limitation in solving general high-dimensional committor functions. Algorithms based on neural networks have received much interest in the community, all based on the variational form of the Fokker-Planck equation. This paper's main innovation lies in proposing a new variational formulation (loss function) based on the semigroup of the differential operator. The new formulation does not contain any differential operator, and the authors explicitly derive the loss's gradients used for training. These gradients only involve the first-order derivatives of the neural networks, in contrast to the second-order derivatives required by previous methods. This feature is conceptually beneficial for the efficient training of neural networks. Numerical results on standard test examples and the Ginzburg-Landau model demonstrate the superiority of the proposed method. Besides, the authors also show that in the lazy training regime, the corresponding gradient flow converges at a geometric rate to a local minimum under certain assumptions.
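Standard background for readers like me (this is the classical setup, not the paper's new semigroup loss): for overdamped Langevin dynamics, the committor q(x) is the probability of reaching the product set B before the reactant set A when starting from x. It solves the backward Kolmogorov equation

\[
  \beta^{-1}\,\Delta q - \nabla V \cdot \nabla q = 0 \quad \text{in } \Omega\setminus(A\cup B),
  \qquad q|_{\partial A}=0,\quad q|_{\partial B}=1,
\]

and is the minimizer of the classical variational functional

\[
  I[q] \;=\; \int_{\Omega} \lvert\nabla q(x)\rvert^{2}\, e^{-\beta V(x)}\,\mathrm{d}x ,
\]

which is the formulation the earlier neural-network methods optimize; the paper's semigroup loss removes the differential operator from the objective entirely.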
no subject
"Holomorphic networks" - with complex numbers
"Implicit Form Neural Networks (IFNN)' - unlike PINN they don't require autodiff (good!), but are less generally applicable