Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote 2023-10-29 05:29 pm (UTC)

~24:50 Ah, this is why we predict not just the next token after the full context, but the token that follows each of the preceding partial contexts (every prefix).

This is useless at inference time, but it works great for parallelizing training.
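A toy sketch of why this parallelizes training, assuming a causally masked self-attention layer: one pass over the whole sequence yields an output for every position, and the output at position i depends only on the prefix up to i, so it can be trained to predict token i+1. Everything here is a hypothetical illustration (no learned projections, a made-up `EMBED` table, a hand-rolled `causal_attention`), not the model from the talk.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def causal_attention(xs: List[Vector]) -> List[Vector]:
    """Hypothetical single-head self-attention with a causal mask.

    Queries, keys and values are the inputs themselves (no learned
    projections). Position i sees only positions 0..i, so output i
    depends only on the prefix xs[:i + 1].
    """
    scale = 1.0 / math.sqrt(len(xs[0]))
    outputs = []
    for i, q in enumerate(xs):
        # Causal mask: only keys at positions j <= i are scored.
        scores = [dot(q, xs[j]) * scale for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out = [0.0] * len(q)
        for w, v in zip(weights, xs[: i + 1]):
            for d in range(len(q)):
                out[d] += (w / total) * v[d]
        outputs.append(out)
    return outputs

# Hypothetical toy embedding table: token id -> vector.
EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5], 3: [-1.0, 0.5]}

tokens = [0, 2, 1, 3]
xs = [EMBED[t] for t in tokens]

# Training: one pass over the full sequence gives one output per position.
full = causal_attention(xs)

# Each output i is paired with the target tokens[i + 1] (its prefix's next token).
pairs = [(tokens[: i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

# Sanity check: running each prefix on its own reproduces the same output,
# so the single full-sequence pass really covers every partial context.
for i in range(len(tokens)):
    prefix_out = causal_attention(xs[: i + 1])[-1]
    assert all(abs(a - b) < 1e-12 for a, b in zip(full[i], prefix_out))
```

At generation time only the output at the last position is needed to choose the next token, which is why the extra per-prefix predictions buy nothing there; during training they turn one sequence into many (prefix, next token) examples for the cost of a single pass.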
