
From: dmm

Moving towards an understanding of attention heads...

1) Interestingly, they seem to say that splitting attention into heads is not just an efficiency device but is semantically meaningful. It would be worth experimenting with very small head dimensions, perhaps even as small as 1 (and also 2, etc.).

2) Neel Nanda, notably, thinks that using the tensor-product formalism is a methodological mistake. It certainly makes the material more difficult to understand, but perhaps it enables more powerful ways of thinking; in any case, this use of tensor products is, at least, optional.
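Point 1 suggests trying very small head dimensions. A minimal sketch of multi-head self-attention in plain Python, with head_dim as a free parameter, makes that experiment easy to set up. Assumptions (not from the source): random Gaussian weights, no causal mask, no layer norm, and hypothetical helper names. It illustrates standard multi-head attention only; it is not a reproduction of anyone's experiments.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    # Hypothetical initialization: i.i.d. standard normal entries.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, n_heads, head_dim, seed=0):
    """Self-attention over x (seq_len rows of width d_model), no causal mask.

    Each head has its own W_Q, W_K, W_V (d_model x head_dim) and W_O
    (head_dim x d_model). Summing the heads' outputs is the same as
    concatenating them and applying one stacked output projection.
    Returns the summed output and the list of per-head attention patterns.
    """
    rng = random.Random(seed)
    seq_len, d_model = len(x), len(x[0])
    out = [[0.0] * d_model for _ in range(seq_len)]
    patterns = []
    scale = 1.0 / math.sqrt(head_dim)
    for _ in range(n_heads):
        w_q, w_k, w_v = (random_matrix(d_model, head_dim, rng) for _ in range(3))
        w_o = random_matrix(head_dim, d_model, rng)
        q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        # With head_dim == 1, each score is a product of two scalars q_i * k_j.
        scores = [
            [scale * sum(q[i][c] * k[j][c] for c in range(head_dim)) for j in range(seq_len)]
            for i in range(seq_len)
        ]
        pattern = [softmax(row) for row in scores]
        patterns.append(pattern)
        head_out = matmul(matmul(pattern, v), w_o)
        for i in range(seq_len):
            for j in range(d_model):
                out[i][j] += head_out[i][j]
    return out, patterns


# Same d_model = 8, split into heads of dimension 1, 2 and 8.
x = [[math.sin(i + 0.5 * j) for j in range(8)] for i in range(4)]
for n_heads, head_dim in [(8, 1), (4, 2), (1, 8)]:
    out, patterns = multi_head_attention(x, n_heads, head_dim)
    print(n_heads, head_dim, [round(p, 3) for p in patterns[0][-1]])
```

With head_dim = 1 each score is q_i * k_j, so for a given query a head can only rank keys along a single direction (by k_j if q_i > 0, in reverse if q_i < 0); that restriction is one concrete thing such an experiment would probe.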
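On point 2, one way to see why the tensor products are optional: the tensor-product form of a head is just a repackaging of two ordinary matrix products. This is a small sketch, assuming the formalism in question writes one head's contribution as (A ⊗ W_OV)·x, with A the seq_len × seq_len attention pattern, W_OV the combined output-value matrix, and x the seq_len × d_model residual stream flattened row by row. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def kron(a, b):
    """Kronecker product: kron(a, b)[i*p + r][j*q + c] == a[i][j] * b[r][c]."""
    p, q = len(b), len(b[0])
    return [
        [a[i][j] * b[r][c] for j in range(len(a[0])) for c in range(q)]
        for i in range(len(a))
        for r in range(p)
    ]


def head_via_tensor_product(pattern, w_ov, x):
    """kron(A, W_OV) applied to x flattened row-major, then reshaped back."""
    d = len(x[0])
    vec_x = [v for row in x for v in row]
    flat = [sum(m * v for m, v in zip(row, vec_x)) for row in kron(pattern, w_ov)]
    return [flat[i * d:(i + 1) * d] for i in range(len(x))]


def head_via_matmuls(pattern, w_ov, x):
    """Same map without tensor products: mix positions with A, then apply W_OV to each row."""
    return matmul(matmul(pattern, x), transpose(w_ov))


pattern = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]  # rows sum to 1
w_ov = [[1.0, 2.0], [0.0, -1.0]]
x = [[1.0, 0.0], [2.0, 3.0], [-1.0, 4.0]]
print(head_via_tensor_product(pattern, w_ov, x))
print(head_via_matmuls(pattern, w_ov, x))
```

The two printouts agree up to floating-point rounding. The tensor-product version materializes a (seq_len·d_model) × (seq_len·d_model) matrix, so in code the factored form is the practical one; the tensor notation mainly makes explicit that A acts on positions while W_OV acts on the residual dimension.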


Dataflow matrix machines (by Anhinga anhinga)

Neuromorphic Computations with Linear Streams
