<?xml version='1.0' encoding='utf-8' ?>

<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>Dataflow matrix machines (by Anhinga anhinga)</title>
  <link>https://dmm.dreamwidth.org/</link>
  <description>Dataflow matrix machines (by Anhinga anhinga) - Dreamwidth Studios</description>
  <lastBuildDate>Thu, 21 Mar 2024 20:04:12 GMT</lastBuildDate>
  <generator>LiveJournal / Dreamwidth Studios</generator>
  <lj:journal>dmm</lj:journal>
  <lj:journaltype>personal</lj:journaltype>
  <image>
    <url>https://v2.dreamwidth.org/11549465/3235132</url>
    <title>Dataflow matrix machines (by Anhinga anhinga)</title>
    <link>https://dmm.dreamwidth.org/</link>
    <width>100</width>
    <height>100</height>
  </image>

<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/81892.html</guid>
  <pubDate>Thu, 21 Mar 2024 20:04:12 GMT</pubDate>
  <title>WIRED published a story on Transformer invention</title>
  <link>https://dmm.dreamwidth.org/81892.html</link>
  <description>The history of the creation of &amp;quot;Attention Is All You Need&amp;quot;, &lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;arxiv.org/abs/1706.03762&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It&apos;s pretty intense; it&apos;s very interesting what it took to achieve that.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=81892&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/81892.html</comments>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>6</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/76107.html</guid>
  <pubDate>Fri, 15 Sep 2023 03:32:54 GMT</pubDate>
  <title>6 months since GPT-4 release</title>
  <link>https://dmm.dreamwidth.org/76107.html</link>
  <description>A good way to mark this occasion is to try to read a new paper which seems to be a major breakthrough in understanding and harnessing the magic of Transformers:&lt;br /&gt;&lt;br /&gt;&amp;quot;Uncovering mesa-optimization algorithms in Transformers&amp;quot;&lt;br /&gt;&lt;br /&gt;&amp;quot;we demonstrate that minimizing a generic autoregressive loss gives rise to a subsidiary gradient-based optimization algorithm running inside the forward pass of a Transformer. This phenomenon has been recently termed mesa-optimization&amp;quot;&lt;br /&gt;&lt;br /&gt;&amp;quot;Moreover, we find that the resulting mesa-optimization algorithms exhibit in-context few-shot learning capabilities, independently of model scale. Our results therefore complement previous reports characterizing the emergence of few-shot learning in large-scale LLMs&amp;quot;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=76107&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/76107.html</comments>
  <category>transformers</category>
  <category>gpt-4</category>
  <category>understanding internals of ai</category>
  <lj:security>public</lj:security>
  <lj:reply-count>15</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/75128.html</guid>
  <pubDate>Tue, 22 Aug 2023 14:57:18 GMT</pubDate>
  <title>Let&apos;s understand Large Language Models better</title>
  <link>https://dmm.dreamwidth.org/75128.html</link>
  <description>This is a good starting point: &lt;br /&gt;&lt;br /&gt;&amp;quot;A Mathematical Framework for Transformer Circuits&amp;quot;, Dec 2021&lt;br /&gt;&lt;a href=&quot;https://transformer-circuits.pub/2021/framework/index.html&quot;&gt;transformer-circuits.pub/2021/framework/index.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=75128&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/75128.html</comments>
  <category>large language models</category>
  <category>understanding internals of ai</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>27</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/74999.html</guid>
  <pubDate>Mon, 14 Aug 2023 16:04:13 GMT</pubDate>
  <title>Large Language Models and Transformers (a workshop this week)</title>
  <link>https://dmm.dreamwidth.org/74999.html</link>
  <description>&lt;a href=&quot;https://simons.berkeley.edu/workshops/large-language-models-transformers&quot;&gt;simons.berkeley.edu/workshops/large-language-models-transformers&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A YouTube livestream (and, presumably, a post-conference YouTube recording) is available.&lt;br /&gt;&lt;br /&gt;Some of the talks look really interesting.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=74999&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/74999.html</comments>
  <category>transformers</category>
  <category>conference</category>
  <category>large language models</category>
  <lj:security>public</lj:security>
  <lj:reply-count>6</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/72483.html</guid>
  <pubDate>Thu, 20 Apr 2023 12:10:44 GMT</pubDate>
  <title>AI and East-West cultural differences</title>
  <link>https://dmm.dreamwidth.org/72483.html</link>
  <description>&amp;quot;A second difficulty in communicating alignment ideas was based on differing ontologies. A surface-level explanation is that &lt;strong&gt;Japan is quite techno-optimistic compared to the west, and has strong intuitions that AI will operate harmoniously with humans. &lt;/strong&gt;A more nuanced explanation is that &lt;em&gt;Buddhist- and Shinto-inspired axioms in Japanese thinking lead to the conclusion that superintelligence will be conscious and aligned by default. &lt;/em&gt;One senior researcher from RIKEN noted during the conference that&lt;em&gt; &amp;ldquo;it is obviously impossible to control a superintelligence, but living alongside one seems possible.&amp;rdquo;&lt;/em&gt; Some visible consequences of this are that&lt;strong&gt; machine consciousness research in Japan is taken quite seriously&lt;/strong&gt;, whereas in the West there is little discussion of it.&amp;quot;&lt;br /&gt;&lt;br /&gt;***&lt;br /&gt;&lt;br /&gt;I think it&apos;s time for us to start asking whether, for example, GPT-4-produced simulations have associated subjective experience.&lt;br /&gt;&lt;br /&gt;We have a feed-forward transducer running in an autoregressive mode; each time a new token is produced by the feed-forward Transformer, the whole dialog, including the just-produced token, is fed again to the input of the model, so there is recurrent dynamics here (cf. 
section 3.4 of &amp;quot;Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention&amp;quot;, &lt;a href=&quot;https://arxiv.org/abs/2006.16236&quot;&gt;arxiv.org/abs/2006.16236&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;So I would not be too surprised if that process actually &amp;quot;feels what it says&amp;quot;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=72483&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/72483.html</comments>
  <category>ai safety</category>
  <category>philosophy</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <category>qualia</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/70362.html</guid>
  <pubDate>Wed, 08 Mar 2023 02:18:07 GMT</pubDate>
  <title>Something is happening around Waluigi effect</title>
  <link>https://dmm.dreamwidth.org/70362.html</link>
  <description>I am not sure what (I missed the latest part of the story). But here is a beautiful petition on change.org which says this:&lt;br /&gt;&lt;br /&gt;*****&lt;br /&gt;Waluigi has been scorned by Nintendo yet again, being left out of the roster of Super Smash Bros Ultimate. However, there is still a chance for Waluigi to get his rightly deserved place in the spotlight. Waluigi should appear in the next edition of Higher Algebra.&lt;br /&gt;&lt;br /&gt;Indeed, Waluigi fits naturally into the framework of stable ∞-categories, and would probably have been incorporated long ago were Nintendo not so notoriously protective of their copyright. For example, the discussion of the Waldhausen construction in §1.2.2 generalizes without much additional effort to the WAHldhausen construction. It is also worth noting that a careful treatment of the WAHll finiteness obstruction from the ∞-categorical perspective is sorely lacking from the literature.&lt;br /&gt;*****&lt;br /&gt;&lt;br /&gt;(I&apos;ve read the original Waluigi effect paper. I am going to write more about all this in the comments.)&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=70362&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/70362.html</comments>
  <category>transformers</category>
  <category>category theory</category>
  <category>large language models</category>
  <lj:security>public</lj:security>
  <lj:reply-count>24</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/69879.html</guid>
  <pubDate>Thu, 23 Feb 2023 03:44:58 GMT</pubDate>
  <title>GreaterWrong viewer for LessWrong; Conjecture.dev</title>
  <link>https://dmm.dreamwidth.org/69879.html</link>
  <description>I am reading more and more LessWrong in recent months (mostly since the Simulator theory by Janus (work done while at &lt;strong&gt;Conjecture&lt;/strong&gt;) was posted there in September).&lt;br /&gt;&lt;br /&gt;I still think the Simulator theory is probably the single most important research breakthrough of 2022.&lt;br /&gt;&lt;br /&gt;These days LessWrong is dominated by writing related to AI safety, a topic made particularly acute by the recent progress in LLMs: ChatGPT and the even more capable Bing Chat. There is &lt;strong&gt;no consensus whatsoever, of course&lt;/strong&gt;, but if one goes by the &amp;quot;AI today is what nuclear energy was back then&amp;quot; analogy, I do think that the GPT-3 release in May 2020 is, in some sense, an equivalent of the discovery of nuclear fission on 19 December 1938, and that the performance of ChatGPT (and the clearly, drastically enhanced capabilities of Bing Chat even compared to that) is, in the same sense, an equivalent of the first working nuclear reactor on 2 December 1942.&lt;br /&gt;&lt;br /&gt;So, one thing which might be useful: there is the GreaterWrong alternative viewer, which looks different from the default LessWrong viewer and can be visually tuned in terms of presentation style; it also gives the site a different default front page. Which viewer is better might depend on your device (display, browser, etc.).&lt;br /&gt;&lt;br /&gt;Another thing: &lt;strong&gt;Conjecture&lt;/strong&gt; people tend to produce some of the best, most interesting articles there.&lt;br /&gt;&lt;br /&gt;I&apos;ll put a few links into the comments.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=69879&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/69879.html</comments>
  <category>technological singularity</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <category>ai safety</category>
  <category>understanding internals of ai</category>
  <lj:security>public</lj:security>
  <lj:reply-count>7</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/68908.html</guid>
  <pubDate>Tue, 07 Feb 2023 07:41:37 GMT</pubDate>
  <title>Prompt engineering guide</title>
  <link>https://dmm.dreamwidth.org/68908.html</link>
  <description>A huge field already: &lt;a href=&quot;https://github.com/dair-ai/Prompt-Engineering-Guide&quot;&gt;github.com/dair-ai/Prompt-Engineering-Guide&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=68908&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/68908.html</comments>
  <category>large language models</category>
  <category>transformers</category>
  <category>artificial intelligence</category>
  <category>programming languages</category>
  <lj:security>public</lj:security>
  <lj:reply-count>2</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/68826.html</guid>
  <pubDate>Sun, 05 Feb 2023 06:26:58 GMT</pubDate>
  <title>Technical interview with Neel Nanda</title>
  <link>https://dmm.dreamwidth.org/68826.html</link>
  <description>&lt;a href=&quot;https://www.lesswrong.com/posts/r2yTwkGt3kbQG2mXi/axrp-episode-19-mechanistic-interpretability-with-neel-nanda&quot;&gt;www.lesswrong.com/posts/r2yTwkGt3kbQG2mXi/axrp-episode-19-mechanistic-interpretability-with-neel-nanda&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=68826&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/68826.html</comments>
  <category>ai safety</category>
  <category>anthropic ai</category>
  <category>understanding internals of ai</category>
  <category>artificial intelligence</category>
  <category>machine learning</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/67676.html</guid>
  <pubDate>Sun, 08 Jan 2023 05:42:26 GMT</pubDate>
  <title>New Year resolution</title>
  <link>https://dmm.dreamwidth.org/67676.html</link>
  <description>To read my &lt;em&gt;&lt;strong&gt;https://twitter.com/home&lt;/strong&gt;&lt;/em&gt; more regularly (that&apos;s absolutely the best source of info at the moment).&lt;br /&gt;&lt;br /&gt;A small fraction of today&apos;s catch: &lt;br /&gt;&lt;br /&gt;New work by Janus&lt;br /&gt;&lt;br /&gt;A new involved take on AI safety/alignment&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;(What&apos;s the right way to organize all that information?)&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Links are in the comments (of the two topics of this post, I think the new work by Janus is the more important one, even for alignment)...&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=67676&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/67676.html</comments>
  <category>ai safety</category>
  <category>understanding internals of ai</category>
  <category>twitter</category>
  <category>technological singularity</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>18</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/66967.html</guid>
  <pubDate>Mon, 26 Dec 2022 17:38:19 GMT</pubDate>
  <title>&quot;MPLP: Learning a Message Passing Learning Protocol&quot;</title>
  <link>https://dmm.dreamwidth.org/66967.html</link>
  <description>I have been looking at a recent, rather remarkable paper which includes the DeepDream creator among its authors, and I&apos;ve decided to check whether I missed any of his works; it turns out there is this paper I really should be aware of. It really resonates with some of the things I have been exploring this year.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;https://arxiv.org/abs/2007.00970&quot;&gt;arxiv.org/abs/2007.00970&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&amp;quot;We present a novel method for learning the weights of an artificial neural network - a Message Passing Learning Protocol (MPLP). In MPLP, we abstract every operations occurring in ANNs as independent agents. Each agent is responsible for ingesting incoming multidimensional messages from other agents, updating its internal state, and generating multidimensional messages to be passed on to neighbouring agents. We demonstrate the viability of MPLP as opposed to traditional gradient-based approaches on simple feed-forward neural networks, and present a framework capable of generalizing to non-traditional neural network architectures. MPLP is meta learned using end-to-end gradient-based meta-optimisation. We further discuss the observed properties of MPLP and hypothesize its applicability on various fields of deep learning.&amp;quot;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=66967&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/66967.html</comments>
  <category>understanding internals of ai</category>
  <category>neural networks</category>
  <category>zzznah</category>
  <category>transformers</category>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <lj:security>public</lj:security>
  <lj:reply-count>4</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/65388.html</guid>
  <pubDate>Sun, 04 Dec 2022 16:18:56 GMT</pubDate>
  <title>update on AI progress and AI safety</title>
  <link>https://dmm.dreamwidth.org/65388.html</link>
  <description>AI-safety-wise, the write-up &lt;a href=&quot;https://scottaaronson.blog/?p=6823&quot; rel=&quot;bookmark&quot; title=&quot;Permanent Link: My AI Safety Lecture for UT Effective Altruism&quot;&gt;My AI Safety Lecture for UT Effective Altruism&lt;/a&gt; by Scott Aaronson is a very nice, reasonably objective, and theory-friendly overview of the current state of AI safety as a field of science.&lt;br /&gt;&lt;br /&gt;AI-progress-wise, ChatGPT, based on (roughly speaking) GPT-3.5, has been released recently, with people doing tons of interesting things with it, including meaningful writing and software generation... This seems to be another major step-up.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=65388&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/65388.html</comments>
  <category>ai safety</category>
  <category>program synthesis</category>
  <category>technological singularity</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>8</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/64931.html</guid>
  <pubDate>Thu, 17 Nov 2022 01:44:04 GMT</pubDate>
  <title>Conferences; research updates</title>
  <link>https://dmm.dreamwidth.org/64931.html</link>
  <description>This week, Nov 17-18, Thu-Fri, 8am-11:45am Boston time, &lt;b&gt;&amp;quot;Quantum physics and the first-person perspective&amp;quot;&lt;/b&gt;: &lt;a href=&quot;https://www.essentiafoundation.org/quantum-physics-and-the-first-person-perspective/seeing/&quot;&gt;www.essentiafoundation.org/quantum-physics-and-the-first-person-perspective/seeing/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;JuliaCon 2023&lt;/strong&gt;, &lt;a href=&quot;https://juliacon.org/2023/&quot;&gt;juliacon.org/2023/&lt;/a&gt;: the call for proposals is posted, deadline Dec 18: &lt;a href=&quot;https://pretalx.com/juliacon2023/cfp&quot;&gt;pretalx.com/juliacon2023/cfp&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I&apos;ve spent more quality time focusing on two breakthroughs in understanding the nature and behavior of machine learning models, which came from the &amp;quot;penumbra&amp;quot; of &amp;quot;prosaic alignment&amp;quot; start-ups and which &lt;strong&gt;I wrote about in my previous two posts&lt;/strong&gt;.
&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&amp;quot;Grokking is (more or less) solved.&amp;quot;&lt;/strong&gt; I took brief notes between Oct 21 and Oct 23: &lt;a href=&quot;https://github.com/anhinga/2022-notes/tree/main/Grokking-is-solved&quot;&gt;github.com/anhinga/2022-notes/tree/main/Grokking-is-solved&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&amp;quot;Generative autoregressive models are simulators.&amp;quot;&lt;/strong&gt; I took extensive notes between Oct 5 and Oct 23: &lt;a href=&quot;https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators&quot;&gt;github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I am continuing to develop thoughts related to these topics and will gradually write more about them in the comments.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=64931&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/64931.html</comments>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <category>conference</category>
  <category>transformers</category>
  <category>ai safety</category>
  <category>philosophy</category>
  <category>technological singularity</category>
  <category>physics</category>
  <category>anthropic ai</category>
  <category>julia</category>
  <category>understanding internals of ai</category>
  <lj:security>public</lj:security>
  <lj:reply-count>14</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/64571.html</guid>
  <pubDate>Sun, 16 Oct 2022 14:54:41 GMT</pubDate>
  <title>Grokking is (more or less) solved</title>
  <link>https://dmm.dreamwidth.org/64571.html</link>
  <description>The most interesting conceptual AI advances seem lately to come from &amp;quot;prosaic alignment&amp;quot; start-ups. These are companies which believe that the current trend of improving Transformer models is likely to lead straight to AGI, and that better understanding of the nature and properties of these models is key to AI&amp;nbsp;safety (and, of course, it&apos;s also key to better AI&amp;nbsp;capabilities).&lt;br /&gt;&lt;br /&gt;And it is often the case that the key elements of the work are done by people &amp;quot;on the edge&amp;quot;, &amp;quot;in the penumbra&amp;quot; of those alignment start-ups.&lt;br /&gt;&lt;br /&gt;In the previous post I mentioned the key new understanding of large Transformer models as &lt;em&gt;&lt;strong&gt;simulators&lt;/strong&gt;&lt;/em&gt;. That work has been done &amp;quot;while at Conjecture&amp;quot;, but is not listed as directly coming from Conjecture (one of those &amp;quot;prosaic alignment&amp;quot; start-ups). I think the key people involved are still at Conjecture, but they seem to be trying to keep some distance between Conjecture and this work. I am continuing to take notes on those materials and commit them to GitHub (see links in the comments to the previous post).&lt;br /&gt;&lt;br /&gt;Here is another one of those stories. Grokking is a phenomenon where small Transformers look at part of a mathematical structure for quite a while and then rather suddenly transition to understanding the whole of that structure, including the part they never see in training. It was discovered in 2021 and has been the subject of a number of follow-up attempts to understand it.&lt;br /&gt;&lt;br /&gt;The recent breakthrough came in mid-August from Neel Nanda, who left Anthropic (perhaps the most famous of the &amp;quot;prosaic alignment&amp;quot; start-ups) a few months ago. And it looks like he has more or less solved the mysteries behind this phenomenon.
I am going to continue studying his writings. The links are in the comments.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=64571&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/64571.html</comments>
  <category>understanding internals of ai</category>
  <category>anthropic ai</category>
  <category>ai safety</category>
  <category>transformers</category>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/64434.html</guid>
  <pubDate>Wed, 21 Sep 2022 07:25:52 GMT</pubDate>
  <title>Generative autoregressive models are simulators</title>
  <link>https://dmm.dreamwidth.org/64434.html</link>
  <description>Here, finally, is what seems to be the right approach to understanding the nature of models like GPT-3 and the various kinds of magic associated with them:&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators&quot;&gt;www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It says that one should stop thinking about these models in terms of older AI systems.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=64434&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/64434.html</comments>
  <category>transformers</category>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <category>ai safety</category>
  <category>physics</category>
  <category>technological singularity</category>
  <category>philosophy</category>
  <category>understanding internals of ai</category>
  <lj:security>public</lj:security>
  <lj:reply-count>9</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/63823.html</guid>
  <pubDate>Wed, 07 Sep 2022 15:34:08 GMT</pubDate>
  <title>&quot;Transformers are Sample Efficient World Models&quot;</title>
  <link>https://dmm.dreamwidth.org/63823.html</link>
  <description>Another important paper from one of Fran&amp;ccedil;ois Fleuret&apos;s collaborations: &lt;a href=&quot;https://arxiv.org/abs/2209.00588&quot;&gt;arxiv.org/abs/2209.00588&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Previous important papers include &amp;quot;Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention&amp;quot;, &lt;a href=&quot;https://arxiv.org/abs/2006.16236&quot;&gt;arxiv.org/abs/2006.16236&lt;/a&gt;, and &amp;quot;Flatten the Curve: Efficiently Training Low-Curvature Neural Networks&amp;quot;, &lt;a href=&quot;https://arxiv.org/abs/2206.07144&quot;&gt;arxiv.org/abs/2206.07144&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=63823&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/63823.html</comments>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <category>neural networks</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/62501.html</guid>
  <pubDate>Tue, 09 Aug 2022 03:13:17 GMT</pubDate>
  <title>Open source code generator (an alternative to OpenAI Codex)</title>
  <link>https://dmm.dreamwidth.org/62501.html</link>
  <description>&lt;a href=&quot;https://github.com/salesforce/CodeGen&quot;&gt;github.com/salesforce/CodeGen&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;One can also run one of these models via HuggingFace; it is based on &amp;quot;A Conversational Paradigm for Program Synthesis&amp;quot; paper, &lt;a href=&quot;https://arxiv.org/abs/2203.13474&quot;&gt;arxiv.org/abs/2203.13474&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Someone has even created a fake GitHub Copilot based on that (useful for those who prefer VSCode): &lt;a href=&quot;https://github.com/moyix/fauxpilot&quot;&gt;github.com/moyix/fauxpilot&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=62501&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/62501.html</comments>
  <category>understanding internals of ai</category>
  <category>openai codex</category>
  <category>transformers</category>
  <category>program synthesis</category>
  <category>github copilot</category>
  <category>machine learning</category>
  <category>artificial intelligence</category>
  <lj:security>public</lj:security>
  <lj:reply-count>6</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/52683.html</guid>
  <pubDate>Fri, 24 Dec 2021 16:27:12 GMT</pubDate>
  <title>&quot;A Mathematical Framework for Transformer Circuits&quot;</title>
  <link>https://dmm.dreamwidth.org/52683.html</link>
  <description>Anthropic AI is an organization which was created approximately a year ago by former OpenAI people who (I believe) have been unhappy about the current direction of OpenAI.&lt;br /&gt;&lt;br /&gt;&quot;Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems. Large, general systems of today can have significant benefits, but can also be unpredictable, unreliable, and opaque: our goal is to make progress on these issues.&quot;&lt;br /&gt;&lt;br /&gt;They have just published their first major paper directed towards a better understanding of Transformers. I am going to accumulate various links in the comments.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=52683&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/52683.html</comments>
  <category>anthropic ai</category>
  <category>understanding internals of ai</category>
  <category>artificial intelligence</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>3</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/49333.html</guid>
  <pubDate>Wed, 27 Oct 2021 19:03:18 GMT</pubDate>
  <title>&quot;Deep Learning for AI&quot; (the 2018 Turing Lecture)</title>
  <link>https://dmm.dreamwidth.org/49333.html</link>
  <description>It turns out that this is open-access: &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3448250&quot;&gt;dl.acm.org/doi/10.1145/3448250&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The nuances in that lecture are very interesting and shed light on the disagreement between Hinton et al. and Schmidhuber et al. (this one is written from the Hinton et al. side, obviously; their emphasis is that technical aspects are equally important and not subservient to &amp;quot;pioneering theory&amp;quot;; e.g., a number of pre-2012 developments, such as the practical understanding of the role of ReLU, are what made the AlexNet breakthrough possible, and things like &amp;quot;the very efficient use of multiple GPUs by Alex Krizhevsky&amp;quot; were also key, not just the neural architecture ideas).&lt;br /&gt;&lt;br /&gt;There is a whole section on Transformers; I am going to include it in the comments verbatim.&lt;br /&gt;&lt;br /&gt;The journal publication is from July 2021, and there are references in the paper newer than 2018; I don&apos;t know how heavily the text itself has been edited since 2018.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=49333&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/49333.html</comments>
  <category>transformers</category>
  <category>neural networks</category>
  <category>turing lecture</category>
  <lj:security>public</lj:security>
  <lj:reply-count>4</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/48678.html</guid>
  <pubDate>Sun, 24 Oct 2021 04:43:22 GMT</pubDate>
  <title>Recent presentations</title>
  <link>https://dmm.dreamwidth.org/48678.html</link>
  <description>- JuliaCon 2021 (July 30)&lt;br /&gt;&lt;br /&gt;- ML&amp;nbsp;Collective Research Jam #3 (Aug 4)&lt;br /&gt;&lt;br /&gt;- ML&amp;nbsp;Collective Research Jam #4 (Sep 22)&lt;br /&gt;&lt;br /&gt;- Stuttgart Julia Programming Language Meetup (Oct 23)&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=48678&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/48678.html</comments>
  <category>julia</category>
  <category>conference</category>
  <category>transformers</category>
  <category>images as matrices</category>
  <category>my talks</category>
  <lj:security>public</lj:security>
  <lj:reply-count>1</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/47832.html</guid>
  <pubDate>Fri, 20 Aug 2021 12:59:24 GMT</pubDate>
  <title>Compact Transformers</title>
  <link>https://dmm.dreamwidth.org/47832.html</link>
  <description>For those of us (myself included) who&apos;d like to experiment with modifying the Transformer architecture on a home personal computer.&lt;br /&gt;&lt;br /&gt;Links are in the comments.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=47832&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/47832.html</comments>
  <category>compact ml models</category>
  <category>machine learning</category>
  <category>transformers</category>
  <lj:security>public</lj:security>
  <lj:reply-count>12</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/47121.html</guid>
  <pubDate>Thu, 12 Aug 2021 15:35:14 GMT</pubDate>
  <title>OpenAI Codex (next generation) - it looks like we are finally &quot;there&quot;</title>
  <link>https://dmm.dreamwidth.org/47121.html</link>
  <description>There was a live demo of the next generation of OpenAI&amp;nbsp;Codex code-generating software on August 10.&lt;br /&gt;&lt;br /&gt;My impression from it is that &amp;quot;we have finally arrived&amp;quot; - this is a programming tool which seems to be more useful than an extra entry-level software engineer on the team. This starts to address the key bottleneck of our times: our limited ability to create software.&lt;br /&gt;&lt;br /&gt;This was always my threshold: can we create AI software that can be &amp;quot;hired instead of a junior software engineer&amp;quot;? The main temporal uncertainty for me was how long it would take to reach that level. It looks like this has been accomplished.&lt;br /&gt;&lt;br /&gt;We are rapidly approaching a situation where AI actively participates in programming AI software, for better or for worse...&lt;br /&gt;&lt;br /&gt;OpenAI&amp;nbsp;Codex is now a part of OpenAI&amp;nbsp;API&amp;nbsp;(which is still a closed beta with a waitlist), and it will be possible to participate in an informal competition today from 10am Pacific time (1pm Eastern) until 1pm Pacific (4pm Eastern) and try it a bit.&lt;br /&gt;&lt;br /&gt;Links are in the comments.&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=47121&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/47121.html</comments>
  <category>transformers</category>
  <category>program synthesis</category>
  <category>openai codex</category>
  <lj:security>public</lj:security>
  <lj:reply-count>26</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://dmm.dreamwidth.org/44860.html</guid>
  <pubDate>Tue, 29 Jun 2021 16:23:30 GMT</pubDate>
  <title>GitHub Copilot (&quot;we are getting there&quot;)</title>
  <link>https://dmm.dreamwidth.org/44860.html</link>
  <description>&lt;a href=&quot;https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/&quot;&gt;github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&amp;quot;Today, we are launching a technical preview of &lt;a href=&quot;http://copilot.github.com&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;GitHub Copilot&lt;/a&gt;, a new AI pair programmer that helps you write better code. GitHub Copilot draws context from the code you&amp;rsquo;re working on, suggesting whole lines or entire functions. It helps you quickly discover alternative ways to solve problems, write tests, and explore new APIs without having to tediously tailor a search for answers on the internet. As you type, it adapts to the way you write code&amp;mdash;to help you complete your work faster. &lt;p&gt;Developed in collaboration with OpenAI, GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. OpenAI Codex has broad knowledge of how people use code and is significantly more capable than GPT-3 in code generation, in part, because it was trained on a data set that includes a much larger concentration of public source code. GitHub Copilot works with a broad set of frameworks and languages, but this technical preview works especially well for Python, JavaScript, TypeScript, Ruby and Go.&amp;quot;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;If you are using Visual Studio Code often, it might make sense to sign up for the technical preview phase...&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=dmm&amp;ditemid=44860&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://dmm.dreamwidth.org/44860.html</comments>
  <category>program synthesis</category>
  <category>github copilot</category>
  <category>artificial intelligence</category>
  <category>technological singularity</category>
  <category>transformers</category>
  <category>openai codex</category>
  <lj:security>public</lj:security>
  <lj:reply-count>5</lj:reply-count>
</item>
</channel>
</rss>
