Date: 2023-08-22 03:51 pm (UTC)
dmm: (Default)
From: [personal profile] dmm
"An especially useful consequence of the residual stream being linear is that one can think of implicit "virtual weights" directly connecting any pair of layers (even those separated by many other layers), by multiplying out their interactions through the residual stream. These virtual weights are the product of the output weights of one layer with the input weights of another (ie. W_{I}^2W_{O}^1), and describe the extent to which a later layer reads in the information written by a previous layer."
This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

Profile

dmm: (Default)
Dataflow matrix machines (by Anhinga anhinga)

May 2025

S M T W T F S
    123
456 78910
11 121314151617
18192021222324
25262728293031

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jul. 16th, 2025 06:02 am
Powered by Dreamwidth Studios