Dataflow matrix machines (by Anhinga anhinga)
Compact Transformers
Aug. 20th, 2021 08:57 am
dmm
For those of us (myself included) who'd like to experiment with changes to the Transformer architecture on a home computer.
Links are in the comments.
Date: 2021-09-06 03:34 pm (UTC), From: dmm
The results are not deterministic, that's for sure... So a single run might not mean all that much...
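One common way to reduce this kind of run-to-run variation is to fix the random seeds before training. The sketch below assumes the training script is PyTorch-based, which the post does not state; `seed_everything` is a hypothetical helper name, not part of any library. Even with all seeds fixed, some GPU operations can remain nondeterministic, so this narrows the jitter rather than guaranteeing identical runs.

```python
import random


def seed_everything(seed):
    """Seed the RNGs a typical training script draws from (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed there
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; only the RNGs above were seeded
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Ask cuDNN for deterministic kernels and disable its autotuner,
    # which may otherwise pick different algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
```

Deterministic cuDNN settings can make training slower, so it may be worth enabling them only when comparing configurations.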
Date: 2021-09-06 04:58 pm (UTC), From: dmm
Now this is way better:
[Epoch 200] Top-1 88.67 Time: 58.07
Script finished in 58.07 minutes, best top-1: 88.67, final top-1: 88.67
But comparing in the presence of this much jitter is a nightmare, unless one configuration is overwhelmingly better.
One might need to do tons of reruns (in parallel, perhaps) to get statistics...
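A minimal sketch of what "getting statistics" from reruns could look like: pull the final top-1 accuracy out of each run's summary line (in the format quoted above), then compare configurations by mean, sample standard deviation, and Welch's t statistic. The function names are hypothetical, and Welch's t is just one reasonable choice of comparison, not something the training script provides.

```python
import re
import statistics

# Matches the summary line printed at the end of a run, as quoted above.
SUMMARY_RE = re.compile(
    r"best top-1: (?P<best>\d+(?:\.\d+)?), final top-1: (?P<final>\d+(?:\.\d+)?)"
)


def final_top1(log_text):
    """Return the final top-1 accuracy from a run's log, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    return float(match.group("final")) if match else None


def summarize(accuracies):
    """Mean and sample standard deviation over reruns (needs at least 2 runs)."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def welch_t(a, b):
    """Welch's t statistic for two sets of reruns with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (
        (var_a / len(a) + var_b / len(b)) ** 0.5
    )


line = "Script finished in 58.07 minutes, best top-1: 88.67, final top-1: 88.67"
print(final_top1(line))  # 88.67
```

With several reruns per configuration, a large absolute value of `welch_t` relative to the number of runs suggests the difference is more than jitter; with only one run per configuration, neither `summarize` nor `welch_t` can be computed at all.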