I am reading this paper: arxiv.org/abs/2104.04657
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
no subject
Date: 2021-04-26 07:45 pm (UTC)

"Kirsch & Schmidhuber (2020) propose a generalized learning algorithm based on a sparsely connected set of RNNs that, similar to our framework, does not use any gradients or explicit loss function, yet is able to approximate forward pass and backpropagation solely from forward activations of RNNs. Our system, in contrast, does not use RNN activations and explicitly leaves (meta-parametrized) bidirectional update rules in place."
An important piece of homework is to compare in detail the similarities and differences between this paper and Kirsch & Schmidhuber, "Meta Learning Backpropagation And Improving It", https://arxiv.org/abs/2012.14905 (especially given that the Kirsch & Schmidhuber approach is somewhat related to the metalearning approach we are proposing for DMMs: page 2, section 2.2, and page 3, section A.3 of https://www.cs.brandeis.edu/~bukatin/towards-practical-dmms.pdf).
no subject
Date: 2021-04-26 08:01 pm (UTC)

No trace of the source code. I'd like to understand their "unroll" better: it plays quite a significant role, but it is not explained well, and since no code is shared, I am not 100% sure how it works.