Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-29 05:51 pm (UTC)

At ~29:00: it seems that most computations only go through a couple of layers (the residual stream gives them the freedom to do this).

(So the bulk of computations are probably shallow, with a bit of "true deepness" sprinkled on top of it.)
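A toy way to see why a residual stream could favor shallow computation (my own back-of-the-envelope sketch, not something taken from the talk): if each block adds a contribution of relative size `eps` to the stream, a stack of `num_layers` blocks expands like (1 + eps)^num_layers, and the paths passing through exactly k blocks carry weight C(num_layers, k) * eps^k. The function name, `num_layers=24`, and `eps=0.05` are all assumptions chosen only for illustration.

```python
from math import comb


def path_weight_by_depth(num_layers: int, eps: float) -> list[float]:
    """Share of total path weight carried by paths through exactly k blocks.

    Toy assumption: every residual block contributes a term of relative
    size eps, so the whole stack expands as (1 + eps) ** num_layers and
    the paths through exactly k blocks contribute comb(num_layers, k) * eps**k.
    """
    total = (1.0 + eps) ** num_layers
    return [comb(num_layers, k) * eps**k / total for k in range(num_layers + 1)]


# Hypothetical numbers: 24 blocks, each adding a 5% contribution.
shares = path_weight_by_depth(num_layers=24, eps=0.05)
shallow = sum(shares[:3])  # paths that touch at most 2 blocks
print(f"share of weight on paths through <= 2 blocks: {shallow:.2f}")
```

Under these assumptions the short paths dominate, which matches the "mostly shallow, with a bit of true deepness on top" picture; a real transformer is nonlinear, so this is only a rough intuition.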
