Learned optimizers and related topics
Jul. 23rd, 2022 04:43 pm
github.com/google/learned_optimization - "Meta-learning optimizers and more with JAX"
This is used by various interesting papers, including the famous "persistent evolution strategies" paper, which I don't understand, and the tempting "Gradients are Not All You Need" paper, arxiv.org/abs/2111.05803.
Moreover, it is used by the super-interesting, must-read paper "Practical tradeoffs between memory, compute, and performance in learned optimizers", arxiv.org/abs/2203.11860, which is being published at lifelong-ml.cc/ (Conference on Lifelong Learning Agents - CoLLAs 2022, Aug 18-24).
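To make the phrase "learned optimizer" concrete, here is a minimal sketch in plain JAX rather than the learned_optimization API (the toy task, the function names, and the choice of a single learnable log step size are my own): the update rule itself carries meta-parameters, and those meta-parameters are trained by differentiating through a short unrolled run of the inner optimization.

    import jax
    import jax.numpy as jnp

    def inner_loss(w):
        # Toy inner task: a simple quadratic.
        return jnp.sum((w - 3.0) ** 2)

    def learned_update(grad, meta_params):
        # The "learned optimizer": here just a learnable step size.
        return -jnp.exp(meta_params["log_lr"]) * grad

    def unrolled_meta_loss(meta_params, w0, num_steps=10):
        # Run the inner optimization with the learned update rule;
        # gradients flow back through the whole unroll.
        def step(w, _):
            g = jax.grad(inner_loss)(w)
            return w + learned_update(g, meta_params), None
        w_final, _ = jax.lax.scan(step, w0, None, length=num_steps)
        return inner_loss(w_final)

    meta_params = {"log_lr": jnp.array(-3.0)}
    w0 = jnp.zeros(5)

    # Meta-training: plain gradient descent on the meta-parameters.
    meta_grad = jax.jit(jax.grad(unrolled_meta_loss))
    for _ in range(200):
        g = meta_grad(meta_params, w0)
        meta_params = jax.tree_util.tree_map(lambda p, dp: p - 0.05 * dp, meta_params, g)

    print(meta_params)  # the meta-learned log step size

The papers above explore, among other things, when this naive differentiate-through-the-unroll recipe becomes impractical and what to use instead (e.g. evolution-strategies-style estimators, as in the persistent evolution strategies paper).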
no subject
Date: 2022-07-23 08:53 pm (UTC)
But I need to understand the details of these papers...
no subject
Date: 2022-07-23 10:23 pm (UTC)
Gradients? I thought those belonged to the distant past, when spaces were linear (or manifolds in a linear space).
no subject
Date: 2022-07-24 12:24 am (UTC)
"Don't Unroll Adjoint: Differentiating SSA-Form Programs", https://arxiv.org/abs/1810.07951, by the author of Zygote.jl
JAX is somewhat different, but the key principles seem to be the same.
What's interesting is that "functional programming motifs" are pretty strong in both cases (in particular, there are somewhat mysterious but seemingly strong reasons why immutable computations are particularly suitable for modern differentiable programming engines such as JAX and Zygote.jl).
The level of mathematical generality they all handle is "piecewise-differentiable": for example, they can take the derivative of ReLU(x) = max(x, 0), so functions don't need to be completely smooth for these engines to work.
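A quick way to see the "piecewise-differentiable" point on the JAX side (a tiny check of my own, not from the papers above):

    import jax
    import jax.numpy as jnp

    relu = lambda x: jnp.maximum(x, 0.0)   # ReLU(x) = max(x, 0)
    drelu = jax.grad(relu)

    # 1.0 on the positive side, 0.0 on the negative side; at the kink x = 0
    # the engine returns some fixed subgradient by convention.
    print(drelu(2.0), drelu(-2.0), drelu(0.0))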
no subject
Date: 2022-07-24 12:33 am (UTC)
Here is how categorical it is: https://gist.github.com/Keno/4a6507b75288b1fe671e9d1cc683014f (no, I don't understand this text)
Apparently, all this is necessary if one wants to handle higher derivatives really well...
The author is https://twitter.com/KenoFischer
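For comparison with the JAX side (this says nothing about how Diffractor.jl does it): there, higher-order derivatives are obtained by simply nesting the derivative transformation, for example:

    import jax
    import jax.numpy as jnp

    f = lambda x: jnp.sin(x) * x**3

    df = jax.grad(f)      # first derivative
    d2f = jax.grad(df)    # second derivative
    d3f = jax.grad(d2f)   # third derivative

    print(df(1.0), d2f(1.0), d3f(1.0))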
no subject
Date: 2022-07-24 01:24 am (UTC)
Let's see whether I manage to understand this category-theoretic write-up about Diffractor.jl.
no subject
Date: 2022-07-29 04:22 pm (UTC)
https://gist.github.com/Keno/4a6507b75288b1fe671e9d1cc683014f
It's only about 70 lines of quite simple code all told, but I can't wrap my head around it...
no subject
Date: 2022-07-29 05:20 pm (UTC)
Very beautiful. Worth studying.
no subject
Date: 2022-08-09 11:47 pm (UTC)
https://github.com/JuliaObjects/Accessors.jl
https://juliaobjects.github.io/Accessors.jl/stable/getting_started/
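Not Accessors.jl, but the same "update immutable data without mutating it" problem shows up on the JAX side too; a minimal illustration of my own, unrelated to the Accessors.jl API:

    import jax.numpy as jnp

    x = jnp.arange(5.0)
    y = x.at[2].set(10.0)   # functional update: returns a new array
    print(x)                # x itself is unchanged
    print(y)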
*****
It turns out that if you don't understand a piece of code, you should type it in and start debugging it; the material sinks in better that way than by just reading the code...
no subject
Date: 2022-08-10 01:28 am (UTC)
That's certainly true.
You already have optics! Great.