Learned optimizers and related topics
Jul. 23rd, 2022 04:43 pm
github.com/google/learned_optimization - "Meta-learning optimizers and more with JAX"
This is used by various interesting papers, including the famous "persistent evolution strategies" paper, which I don't understand, and the tempting "Gradients are Not All You Need" paper, arxiv.org/abs/2111.05803.
Moreover, it is used by the super-interesting, must-read paper "Practical tradeoffs between memory, compute, and performance in learned optimizers", arxiv.org/abs/2203.11860, which is being published at lifelong-ml.cc/ (Conference on Lifelong Learning Agents - CoLLAs 2022, Aug 18-24).
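To make the phrase "learned optimizer" concrete, here is a minimal sketch in plain JAX rather than the learned_optimization API (the toy task, the function names, and the choice of a single learnable log step size are my own): the update rule itself carries meta-parameters, and those meta-parameters are trained by differentiating through a short unrolled run of the inner optimization.

    import jax
    import jax.numpy as jnp

    def inner_loss(w):
        # Toy inner task: a simple quadratic.
        return jnp.sum((w - 3.0) ** 2)

    def learned_update(grad, meta_params):
        # The "learned optimizer": here just a learnable step size.
        return -jnp.exp(meta_params["log_lr"]) * grad

    def unrolled_meta_loss(meta_params, w0, num_steps=10):
        # Run the inner optimization with the learned update rule;
        # gradients flow back through the whole unroll.
        def step(w, _):
            g = jax.grad(inner_loss)(w)
            return w + learned_update(g, meta_params), None
        w_final, _ = jax.lax.scan(step, w0, None, length=num_steps)
        return inner_loss(w_final)

    meta_params = {"log_lr": jnp.array(-3.0)}
    w0 = jnp.zeros(5)

    # Meta-training: plain gradient descent on the meta-parameters.
    meta_grad = jax.jit(jax.grad(unrolled_meta_loss))
    for _ in range(200):
        g = meta_grad(meta_params, w0)
        meta_params = jax.tree_util.tree_map(lambda p, dp: p - 0.05 * dp, meta_params, g)

    print(meta_params)  # the meta-learned log step size

The papers above explore, among other things, when this naive differentiate-through-the-unroll recipe becomes impractical and what to use instead (e.g. evolution-strategies-style estimators, as in the persistent evolution strategies paper).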
no subject
Date: 2022-07-23 08:53 pm (UTC)
But I need to understand the details of these papers...
no subject
Date: 2022-07-23 10:23 pm (UTC)
Gradients? I thought those belonged to the distant past, when spaces were linear (or manifolds in a linear space).
no subject
Date: 2022-07-24 12:24 am (UTC)
"Don't Unroll Adjoint: Differentiating SSA-Form Programs", https://arxiv.org/abs/1810.07951, by the author of Zygote.jl
JAX is somewhat different, but the key principles seem to be the same.
What's interesting is that "functional programming motifs" are pretty strong in both cases (in particular, there are somewhat mysterious but seemingly strong reasons why immutable computations are particularly suitable for modern differentiable programming engines such as JAX and Zygote.jl).
The level of mathematical generality they all handle is "piecewise-differentiable": for example, they can take the derivative of ReLU(x) = max(x, 0), so functions don't need to be completely smooth for these engines to work.
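A quick way to see the "piecewise-differentiable" point on the JAX side (a tiny check of my own, not from the papers above):

    import jax
    import jax.numpy as jnp

    relu = lambda x: jnp.maximum(x, 0.0)   # ReLU(x) = max(x, 0)
    drelu = jax.grad(relu)

    # 1.0 on the positive side, 0.0 on the negative side; at the kink x = 0
    # the engine returns some fixed subgradient by convention.
    print(drelu(2.0), drelu(-2.0), drelu(0.0))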
no subject
Date: 2022-07-24 12:33 am (UTC)
Here is how categorical it is: https://gist.github.com/Keno/4a6507b75288b1fe671e9d1cc683014f (no, I don't understand this text)
Apparently, all this is necessary if one wants to handle higher derivatives really well...
The author is https://twitter.com/KenoFischer
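For comparison with the JAX side (this says nothing about how Diffractor.jl does it): there, higher-order derivatives are obtained by simply nesting the derivative transformation, for example:

    import jax
    import jax.numpy as jnp

    f = lambda x: jnp.sin(x) * x**3

    df = jax.grad(f)      # first derivative
    d2f = jax.grad(df)    # second derivative
    d3f = jax.grad(d2f)   # third derivative

    print(df(1.0), d2f(1.0), d3f(1.0))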
no subject
Date: 2022-07-24 01:24 am (UTC)
Let's see whether I manage to understand this category-theoretic write-up about Diffractor.jl.
no subject
Date: 2022-07-29 04:22 pm (UTC)
https://gist.github.com/Keno/4a6507b75288b1fe671e9d1cc683014f
It's only about 70 lines of quite simple code all told, but I can't wrap my head around it...
no subject
Date: 2022-07-29 05:20 pm (UTC)
Very beautiful. Worth studying.
no subject
Date: 2022-08-09 11:47 pm (UTC)
https://github.com/JuliaObjects/Accessors.jl
https://juliaobjects.github.io/Accessors.jl/stable/getting_started/
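Not Accessors.jl, but the same "update immutable data without mutating it" problem shows up on the JAX side too; a minimal illustration of my own, unrelated to the Accessors.jl API:

    import jax.numpy as jnp

    x = jnp.arange(5.0)
    y = x.at[2].set(10.0)   # functional update: returns a new array
    print(x)                # x itself is unchanged
    print(y)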
*****
It turns out that if you don't understand a piece of code, you should type it in and start debugging it; the material sinks in better that way than by just reading the code...
no subject
Date: 2022-08-10 01:28 am (UTC)
That's certainly true.
You already have optics! Great.