dmm | New Year resolution

To read my Twitter feed (https://twitter.com/home) more regularly (it is by far the best source of information at the moment).

A small fraction of today's catch:

New work by Janus

A new involved take on AI safety/alignment

(What's the right way to organize all that information?)

Links are in the comments. (I think the new work by Janus is the more important of the two topics of this post, even for alignment.)

From:

dmm

> New work by Janus

https://www.lesswrong.com/posts/nmMorGE4MS4txzr8q/simulators-seminar-sequence-1-background-and-shared

https://www.lesswrong.com/posts/TTn6vTcZ3szBctvgb/simulators-seminar-sequence-2-semiotic-physics

"Meta: Over the past few months, we've held a seminar series on the "Simulators theory" by janus. As the theory is actively under development, the purpose of the series is to discover central structures and open problems. Our aim with this sequence is to share some of our discussions with a broader audience and to encourage new research on the questions we uncover. Below, we outline the broader rationale and shared assumptions of the participants of the seminar."

From:

dmm

I am now reading Janus's Twitter:

https://twitter.com/ComputingByArts/status/1611958021906731009

https://twitter.com/repligate/status/1611934083780673538

"Natural language is unfathomably versatile in what it can specify and inspire, and it's been long waiting for a more literate entity than mankind to come into its full power as a programming language ;)"

My October write-up on the "Simulators theory": https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators

(I used to tentatively call it "Janus paradigm" or "Simulators paradigm".)

From:

dmm

This new "Janus sequence" focuses heavily on alignment (good!)

From:

dmm

"Conditional on having a foundational understanding of simulators (supported by theorems and empirical results), we hope to be able to construct a simulation (or a set of simulations) that reliably produces useful alignment research."

From:

dmm

Janus in the comments to the first post:

"It's like this: magic exists now. The amount of magic in the world is increasing, allowing for increasingly powerful spells and artifacts, such as CLONE MIND. This is concerning for obvious reasons. One would hope that the protagonists, whose goal it is to steer this autocatalyzing explosion of psychic energy through the needle of an eye to utopia, will become competent at magic."

From:

dmm

https://www.lesswrong.com/posts/TTn6vTcZ3szBctvgb/simulators-seminar-sequence-2-semiotic-physics

"The term “semiotic physics” here refers to the study of the fundamental forces and laws that govern the behavior of signs and symbols. Similar to how the study of physics helps us understand and make use of the laws that govern the physical universe, semiotic physics studies the fundamental forces that govern the symbolic universe of GPT, a universe that reflects and intersects with the universe of our own cognition. We transfer concepts from dynamical systems theory, such as attractors and basins of attraction, to the semiotic universe and spell out examples and implications of the proposed perspective."

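To make "attractors and basins of attraction" concrete for myself, here is a minimal toy sketch (mine, not the authors'). It assumes a hypothetical "language model" reduced to a lookup table `NEXT` that maps each token to its single most likely successor, so generation becomes a deterministic map on a finite set of states. Real GPT sampling is stochastic and its state is the whole context window, so this only illustrates the borrowed vocabulary, not the actual dynamics. The table and both function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy "model": each token deterministically maps to one next token.
# This table is invented for illustration; it is not taken from any real model.
NEXT = {
    "once": "upon",
    "upon": "a",
    "a": "time",
    "time": "the",
    "the": "end",
    "end": "end",    # fixed point: a one-token attractor
    "so": "and",
    "and": "then",
    "then": "and",   # two-token cycle: a periodic attractor
}


def find_attractor(start, step):
    """Follow the map from `start` until a state repeats; return the cycle reached."""
    first_seen = {}   # state -> index at which it first appeared
    trajectory = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step[state]
    # Everything from the first occurrence of the repeated state onward is the cycle.
    return frozenset(trajectory[first_seen[state]:])


def basins(step):
    """Group every state by the attractor its trajectory ends in."""
    result = defaultdict(set)
    for state in step:
        result[find_attractor(state, step)].add(state)
    return dict(result)


if __name__ == "__main__":
    for attractor, basin in basins(NEXT).items():
        print(sorted(attractor), "<-", sorted(basin))
```

In this toy, every starting token falls into exactly one of two attractors: the fixed point "end" or the loop "and" / "then". The set of starting tokens that ends up in a given attractor is its basin of attraction.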
From:

dmm

On the semantics of natural language, Footnote 12:

"My (Jan's) take is that the central confusion arises because people are confused about neuroscience. The sentence "The current king of France is bald." does not refer to a king of France in the physical universe; it refers to a certain pattern of neural activations in someone's cortex. That pattern is a part of the physical universe (and thus fits into the framework of Russell et al), but it's not "simple" in the way that the early philosophers of language would have liked it to be."

From:

dmm

(But the authors use a more conventional, ad hoc "vector semantics" instead.)

From:

dmm

https://generative.ink/prophecies/

From:

dmm

Footnote 23:

"Both dramatic tension and tragedy are powerful forces in the semiotic universe, and they can work against our attempts to control the behavior of the language model. For example, if we introduce a prompt that describes a group of brilliant and determined alignment researchers, we might want the language model to generate a continuation that includes a working solution to the alignment problem. However, the principles of dramatic tension and tragedy might guide the language model towards generating a continuation that includes an overlooked flaw in the proposed solution which leads to the instantiation of a misaligned superintelligence.

Thus, we need to be aware of the various forces and constraints that govern the semiotic universe, and use them to our advantage when we are trying to control the behavior of the language model. A deep understanding of how these stylistic devices are commonly used in human-generated text and how they can be influenced by various forms of training will be necessary to control and leverage the laws of semiotic physics."