dmm: (Default)
Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-29 06:49 pm (UTC)

~45:00 there is this weird ambiguity when people only use embedding dimension when they talk about dimensionality of residual stream and omit the context length dimension; I can imagine this might bite us in many ways (not only in confusion, but in leading to wrong conclusions)

Post a comment in response:

This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting