Learned optimizers and related topics
Jul. 23rd, 2022 04:43 pm
github.com/google/learned_optimization - "Meta-learning optimizers and more with JAX"
This library is used by various interesting papers, including the famous "Persistent Evolution Strategies" paper (which I don't understand yet) and the tempting "Gradients are Not All You Need" paper, arxiv.org/abs/2111.05803.
Moreover, it is used by the super-interesting, must-read "Practical tradeoffs between memory, compute, and performance in learned optimizers" paper, arxiv.org/abs/2203.11860, which is being published at lifelong-ml.cc/ (Conference on Lifelong Learning Agents - CoLLAs 2022, Aug 18-24).
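To make "meta-learning an optimizer" a bit more concrete, here is a minimal hand-rolled sketch (this is not the learned_optimization API; the toy quadratic task, the single meta-learned step size, and all the names are just illustrative): one backpropagates through a short unrolled inner training loop to tune a meta-parameter of the update rule.

```python
# Minimal sketch: meta-learn a scalar step size by differentiating through
# a short unrolled inner loop on a toy quadratic task. Illustrative only;
# not the learned_optimization library's API.
import jax
import jax.numpy as jnp

def inner_loss(w):
    # Toy inner task: a quadratic with its minimum at 3.0.
    return jnp.sum((w - 3.0) ** 2)

def unrolled_inner_training(log_lr, w0, steps=10):
    # The "learned optimizer" here is just SGD with a meta-learned step size,
    # parameterized in log-space so it stays positive.
    lr = jnp.exp(log_lr)
    def step(w, _):
        g = jax.grad(inner_loss)(w)
        w = w - lr * g          # inner update, differentiable w.r.t. log_lr
        return w, inner_loss(w)
    _, losses = jax.lax.scan(step, w0, None, length=steps)
    return losses[-1]           # meta-objective: final inner loss

meta_grad = jax.jit(jax.grad(unrolled_inner_training))

log_lr = jnp.log(0.01)
w0 = jnp.zeros(5)
for _ in range(100):
    log_lr = log_lr - 0.1 * meta_grad(log_lr, w0)   # outer (meta) update
print("meta-learned step size:", jnp.exp(log_lr))
```

The papers above are largely about what goes wrong when this naive unrolled-backprop scheme is scaled up (memory, compute, and exploding/biased meta-gradients), and about alternatives such as evolution strategies.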
no subject
Date: 2022-07-24 12:24 am (UTC)
"Don't Unroll Adjoint: Differentiating SSA-Form Programs", https://arxiv.org/abs/1810.07951, by the author of Zygote.jl
JAX is somewhat different, but the key principles seem to be the same.
What's interesting is that "functional programming motifs" are pretty strong in both cases; in particular, there seem to be strong (if somewhat mysterious) reasons why immutable, pure computations are especially well suited to modern differentiable programming engines such as JAX and Zygote.jl.
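A small illustration of this functional constraint in JAX (just a toy loss; the names are made up): transformations like grad and jit expect pure functions, and arrays are immutable, so in-place writes become functional updates via `.at[...].set(...)`.

```python
# JAX's functional style: pure functions compose with grad/jit,
# and arrays are updated functionally rather than mutated in place.
import jax
import jax.numpy as jnp

def loss(w, x):
    # Pure function: no mutation, no side effects; output depends only on inputs.
    h = jnp.tanh(x @ w)
    return jnp.mean(h ** 2)

w = jnp.ones((3, 2))
x = jnp.ones((4, 3))

g = jax.jit(jax.grad(loss))(w, x)   # gradient w.r.t. w, compiled

# Arrays are immutable: "w[0, 0] = 5.0" would raise; instead we get a new array.
w2 = w.at[0, 0].set(5.0)
```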
Mathematically, the generality they all handle is "piecewise-differentiable" functions; e.g. they can take the derivative of ReLU(x) = max(x, 0), so a function doesn't need to be completely smooth for these engines to work.
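For example, in JAX:

```python
# Piecewise-differentiable example: jax.grad handles ReLU, returning
# 0 on the flat (negative) branch and 1 on the identity (positive) branch.
import jax
import jax.numpy as jnp

relu = lambda x: jnp.maximum(x, 0.0)
drelu = jax.grad(relu)

print(drelu(-1.0))  # 0.0
print(drelu(2.0))   # 1.0
```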