no subject
Date: 2023-10-29 07:01 pm (UTC)

(But we still need to see how this plays with context length; it's not very transparent in the code, which is inconvenient. In the MLP it is even less transparent than in the attention layer, where the sequence length has to be written out explicitly because of the split into attention heads.)
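A minimal PyTorch sketch of what I mean (my own illustration, not the code under discussion; all names are hypothetical): the attention layer has to read the sequence length off the input shape to split into heads, while the MLP acts position-wise on the last dimension and never mentions it.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape  # seq_len appears explicitly here
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Splitting into heads forces explicit reshapes over seq_len.
        q = q.view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(batch, seq_len, self.n_heads, self.d_head).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / self.d_head**0.5
        out = attn.softmax(dim=-1) @ v  # (batch, heads, seq_len, d_head)
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.proj(out)

class MLP(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Applied position-wise: the linear layers act only on the last
        # dimension, so the context length never appears in the code.
        return self.fc2(torch.relu(self.fc1(x)))
```

Both modules accept x = torch.randn(2, 128, 512) unchanged, but only Attention ever reads the 128 off the shape; in the MLP the context length is hidden in the broadcasting.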