Entry tags: Let's understand Large Language Models better
This is a good starting point:
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
"A Mathematical Framework for Transformer Circuits", Dec 2021
transformer-circuits.pub/2021/framework/index.html
no subject
(But we still need to see how this interacts with context length; that is not very transparent in the code, which is inconvenient. In the MLP it is even less transparent than in the attention layer, where the sequence dimension has to be written out explicitly because of the split into attention heads.)
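To make the contrast concrete, here is a minimal PyTorch sketch of a GPT-style block (hypothetical dimensions, not code from the paper): in the attention layer the context length T is forced into view by the reshape into heads and the T x T causal mask, while the MLP is purely position-wise and T never appears in its code at all.

import torch
import torch.nn as nn

B, T, d_model, n_heads = 2, 16, 64, 4   # hypothetical sizes
d_head = d_model // n_heads
x = torch.randn(B, T, d_model)

# Attention: splitting into heads makes T explicit in the reshapes,
# and the causal mask is a T x T object.
W_qkv = nn.Linear(d_model, 3 * d_model, bias=False)
q, k, v = W_qkv(x).split(d_model, dim=-1)
q = q.view(B, T, n_heads, d_head).transpose(1, 2)    # (B, heads, T, d_head)
k = k.view(B, T, n_heads, d_head).transpose(1, 2)
v = v.view(B, T, n_heads, d_head).transpose(1, 2)
scores = q @ k.transpose(-2, -1) / d_head ** 0.5     # (B, heads, T, T)
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float("-inf"))
attn_out = (scores.softmax(-1) @ v).transpose(1, 2).reshape(B, T, d_model)

# MLP: the same weights are broadcast over every position, so the
# context length never shows up here, which is why its handling is
# less transparent than in the attention layer.
mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                    nn.Linear(4 * d_model, d_model))
mlp_out = mlp(attn_out)                              # (B, T, d_model)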