Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-29 06:43 pm (UTC)

~43:00 virtual weights - they are not really between layers, they are between attention heads (one just looks at how (if at all) the output of a particular attention head is taken into account by the input of another particular attention head)

that gives some crude proxy for what's going on
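A minimal sketch of one way to make "virtual weights between attention heads" concrete, in the spirit of the transformer-circuits convention (W_OV = W_O W_V, W_QK = W_Q^T W_K). The helper names, the random example matrices, and the norm-ratio score below are illustrative assumptions, not necessarily the exact quantity discussed in the talk:

import numpy as np

def w_ov(W_O, W_V):
    # Combined output-value map of a head: residual stream -> residual stream.
    return W_O @ W_V

def w_qk(W_Q, W_K):
    # Combined query-key bilinear form of a head on the residual stream.
    return W_Q.T @ W_K

def composition_strength(A, B):
    # Crude proxy for how strongly the "virtual weight" A @ B couples two heads,
    # normalized by the sizes of the individual matrices.
    return np.linalg.norm(A @ B) / (np.linalg.norm(A) * np.linalg.norm(B))

# Example (random weights): does head h1's output feed head h2's key input
# (K-composition)?  d_model and d_head are arbitrary illustrative sizes.
d_model, d_head = 64, 16
rng = np.random.default_rng(0)
W_V1, W_O1 = rng.normal(size=(d_head, d_model)), rng.normal(size=(d_model, d_head))
W_Q2, W_K2 = rng.normal(size=(d_head, d_model)), rng.normal(size=(d_head, d_model))

k_comp = composition_strength(w_qk(W_Q2, W_K2), w_ov(W_O1, W_V1))
print(f"K-composition strength (head h1 -> head h2): {k_comp:.3f}")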
