I am reading this paper: arxiv.org/abs/2104.04657
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
no subject
Date: 2021-04-26 05:41 pm (UTC)

"We define a space of possible transformations that specify the interaction between neurons’ feed-forward and feedback signals. The matrices controlling these interactions are meta-parameters that are shared across both layers and tasks. We term these meta-parameters a “genome”. This reframing opens up a new, more generalized space of neural networks, allowing the introduction of arbitrary numbers of states and channels into neurons and synapses, which have their analogues in biological systems, such as the multiple types of neurotransmitters, or chemical vs. electrical synapse transmission.

Our framework, which we call BLUR (Bidirectional Learned Update Rules), describes a general set of multi-state update rules that are capable of training networks to learn new tasks without ever having access to explicit gradients. We demonstrate that through meta-learning BLUR can learn effective genomes with just a few training tasks. Such genomes can be learned using off-the-shelf optimizers or evolutionary strategies. We show that such genomes can train networks on unseen tasks faster than comparably sized gradient networks. The learned genomes can also generalize to architectures unseen during the meta-training."
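As a rough sanity check on how such a genome could be meta-learned without meta-gradients, here is a toy sketch of the outer loop (my own construction; the task, the scalar genome, and the simple (1+lambda) evolution strategy are all stand-ins for whatever the paper actually uses): the fitness of a candidate genome is the loss reached after unrolling a few inner training steps of its update rule on a small regression task, and the evolution strategy searches over genome vectors.

```python
import numpy as np

def unrolled_loss(genome_vec, n_steps=20, seed=0):
    # fitness of a genome = loss after unrolling n_steps of its update rule
    # on a fixed toy regression task (one tanh layer, squared error)
    fwd, bwd, lr = genome_vec
    task_rng = np.random.default_rng(seed)
    X = task_rng.normal(size=(8, 4))
    T = task_rng.normal(size=(8, 3))
    W = np.zeros((4, 3))
    for _ in range(n_steps):
        for x, t in zip(X, T):
            a = np.tanh(fwd * (x @ W))
            b = bwd * (a - t) * (1.0 - a ** 2)
            W = W - lr * np.outer(x, b)
    A = np.tanh(fwd * (X @ W))
    return float(np.mean((A - T) ** 2))

# Very small (1 + lambda) evolution strategy standing in for CMA-ES.
es_rng = np.random.default_rng(1)
genome = np.array([1.0, 1.0, 0.01])
best = unrolled_loss(genome)
for _ in range(100):
    candidates = genome + 0.05 * es_rng.normal(size=(8, 3))
    losses = [unrolled_loss(c) for c in candidates]
    i = int(np.argmin(losses))
    if losses[i] < best:
        genome, best = candidates[i], losses[i]
print(genome, best)
```

In the paper the genome is a set of small matrices shared across layers and meta-trained over several tasks and architectures; the only point of this toy is to show where the unroll of inner training steps sits inside the meta-objective.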
no subject
Date: 2021-04-26 07:45 pm (UTC)

"Kirsch & Schmidhuber (2020) propose a generalized learning algorithm based on a sparsely connected set of RNNs that, similar to our framework, does not use any gradients or explicit loss function, yet is able to approximate forward pass and backpropagation solely from forward activations of RNNs. Our system, in contrast, does not use RNN activations and explicitly leaves (meta-parametrized) bidirectional update rules in place."
It's an important piece of homework to compare in detail the similarities and differences between this paper and Kirsch & Schmidhuber, "Meta Learning Backpropagation And Improving It", https://arxiv.org/abs/2012.14905 (especially given that the Kirsch & Schmidhuber approach is somewhat related to the meta-learning approach we are proposing for DMMs: page 2, section 2.2 and page 3, section A.3 of https://www.cs.brandeis.edu/~bukatin/towards-practical-dmms.pdf).
no subject
Date: 2021-04-26 08:01 pm (UTC)

No traces of the source code. I'd like to understand their "unroll" better: it plays quite a significant role, but it is not explained well, and no code is shared.
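My best guess at their "unroll" (not confirmed by the paper, just the standard meta-learning usage): it is the number of inner update steps the network is run for inside the meta-objective before the meta-loss is read out, so the outer optimizer, whether a gradient-based optimizer differentiating through the trace or CMA-ES treating it as a black box, only ever sees the genome through that unrolled computation. Reusing the toy unrolled_loss and the meta-trained genome from the sketch in the earlier comment:

```python
# The "unroll" length is just n_steps here: how many inner update steps run
# inside the meta-objective before the meta-loss is read out. Longer unrolls
# select genomes that train well over more steps, at the cost of a more
# expensive (and, if differentiated through, deeper) meta-computation.
for unroll_length in (1, 5, 20, 50):
    print(unroll_length, unrolled_loss(genome, n_steps=unroll_length))
```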