I am reading this paper: arxiv.org/abs/2104.04657
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
"In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have neither explicit notion of nor ever receive gradients. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. Resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks."
Date: 2021-04-26 05:46 pm (UTC)

"To learn a new type of neural network we need to formally define the space of possible configurations. Our proposed space is a generalization of classical artificial neural networks, with inspiration drawn from biology. For the purpose of clarity, in this section we modify the notation by abstracting from the standard layer structure of a neural network, and instead assume our network is essentially a bag-of-neurons N of n neurons with a connectivity structure defined by two functions: “upstream” neurons I(i) \in N that send their outputs to i, and the set of “downstream” neurons J(i) \in N that receive the output of i as one of their inputs. Thus the synapse weight matrix w_ij can encode separate weights for forward and backward connections."
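To keep the indexing straight for myself while reading, here is a small toy version of that "bag-of-neurons" connectivity. The convention that w[i, j] means the directed connection i -> j is my assumption, not necessarily the paper's; the function names are mine as well.

    import numpy as np

    n = 5
    rng = np.random.default_rng(1)
    mask = rng.random((n, n)) < 0.3          # random sparse connectivity
    np.fill_diagonal(mask, False)            # no self-connections
    w = rng.normal(size=(n, n)) * mask       # asymmetric: w[i, j] != w[j, i] in general

    def upstream(i, w):
        # I(i): neurons j that send their output to i, i.e. w[j, i] != 0
        return {j for j in range(w.shape[0]) if w[j, i] != 0}

    def downstream(i, w):
        # J(i): neurons j that receive the output of i, i.e. w[i, j] != 0
        return {j for j in range(w.shape[1]) if w[i, j] != 0}

    print(upstream(0, w), downstream(0, w))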
I was also thinking this way about "superneurons" in DMMs, but I was not thinking about bidirectional weights (although, of course, both w_ij and w_ji could be non-zero and different). Of course, with "superneurons", if one really wants a layer, one can put it inside a single "superneuron".
(If one looks at page 4, their formalism is a bit of a mess if one needs both w_ij and w_ji to be non-zero as feed-forward connections. In that case, one would need to duplicate both slots, keeping separate forward and backward weights for each ordered pair. As written, their formalism only works in the absence of simultaneously non-zero feed-forward w_ij and w_ji; see the sketch below.)
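Concretely, the way I would do the "duplicate both slots" fix (my sketch, not anything from the paper) is to keep two separately indexed slots per ordered pair, so that the backward weight of the synapse i -> j never has to share storage with the forward weight of the synapse j -> i:

    import numpy as np

    n = 5
    w_fwd = np.zeros((n, n))   # w_fwd[i, j]: forward weight of the synapse i -> j
    w_bwd = np.zeros((n, n))   # w_bwd[i, j]: backward weight of that same synapse

    # with separate matrices, both directed synapses between neurons 1 and 2
    # can carry forward signal at the same time:
    w_fwd[1, 2] = 0.7
    w_fwd[2, 1] = -0.3
    w_bwd[1, 2] = 0.2
    w_bwd[2, 1] = -0.1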