It turns out that this is open-access: dl.acm.org/doi/10.1145/3448250
The nuances in that lecture are very interesting and shed light on the disagreement between Hinton et al. and Schmidhuber et al. (this one is written from the Hinton et al. side, obviously; their emphasis is that technical aspects are equally important and not subservient to "pioneering theory". E.g., a number of relatively recent pre-2012 developments, such as the practical understanding of the role of ReLU, are what made the AlexNet breakthrough possible; moreover, things like "the very efficient use of multiple GPUs by Alex Krizhevsky" are also key, not just the neural architecture ideas).
There is a whole section on Transformers; I am going to include it in the comments verbatim.
The journal publication is dated July 2021, and the paper cites references newer than 2018; I don't know how heavily the text itself has been edited since the 2018 lecture.