Dataflow matrix machines (by Anhinga anhinga) ([personal profile] dmm) wrote, 2021-02-07 07:41 pm

AI safety and transparency

I have been reading quite a bit of (new to me) writing on AI safety in recent weeks.

I noticed a couple of things: a number of very strong young people have entered the field, and there has been quite a bit of technical progress.

Eight years ago the field looked rather hopeless: it mostly consisted of disagreements, and there did not seem to be any routes to technical progress; it was all just talk. So the changes for the better are impressive.

One particularly important theme is the work towards better understanding of neural-like models, and towards their better transparency and interpretability.

I'll link to the one paper which seemed most interesting and eloquent in this respect, and I'll eventually add more material in the comments.

Evan Hubinger (Nov 2019), "Chris Olah’s views on AGI safety": www.alignmentforum.org/posts/X2i9dQQK3gETCyqh2/chris-olah-s-views-on-agi-safety