Skip to Main Content

no subject

~24:50 Ah, this is why we predict not only the next token, but all tokens after all preceding partial contexts.

This is useless at inference, but this works great to parallelize training.

Post a comment in response:

From:

Anonymous This account has disabled anonymous posting.

OpenID

Dreamwidth account

If you don't have an account you can create one now.

Subject

HTML doesn't work in the subject.

Formatting type

Message