Date: 2023-10-29 03:51 pm (UTC)
From: [personal profile] dmm
Revisiting the Neel Nanda lecture (and the paper itself).

~10:00: (anecdotally) an attempt by Chris Olah to interpret small visual models on MNIST did not work, but visual models became more interpretable as they got larger.

In Transformers, smaller models are easier to understand, though this is by no means obvious (says Neel Nanda in that lecture). Who knows how this will change eventually; in any case, the knowledge thus acquired does seem to transfer OK to larger models.

Dataflow matrix machines (by Anhinga anhinga)
