Why AI might not be an existential risk to humanity W42
This week, we look at counterarguments to the basic case for why AI is an existential risk to humanity, consider how strong AI might arrive very soon, and share interesting papers.
Today is October 20th and this is the ML Safety Progress Update!
AI X-risk counterarguments
Existential risk from AI does not seem overwhelmingly likely, according to Katja Grace from AI Impacts. She writes a long article arguing against the major perspectives on how AI can become very dangerous, while noting that there is still enough uncertainty for AI safety to remain a relevant concern.
Her counterarguments target the three main claims behind the case that superintelligent AI will become an existential risk: 1) superhuman AI systems will be goal-directed, 2) goal-directed AI systems’ goals will be bad, and 3) superhuman AI will overpower humans.
Her counterargument for why AI systems might not be goal-directed is that many highly functional systems can be “pseudo-agents”, models that don’t pursue utility maximization but instead optimize for a range of sub-goals to be met. Additionally, the bar for goal-directedness that a system must clear to pose a risk is extremely high.
Her arguments for why goal-directed AI systems’ goals might not be bad are that: 1) Even evil humans broadly share human values, and slight deviations from the optimal policy seem acceptable. 2) AI might simply learn the correct thing from the dataset, since humans also seem to get their behavior from the diverse training data of the world. 3) Deep learning seems very good at learning fuzzy things from data, and values seem learnable in much the same way as generating faces (and we don’t see generated faces without noses, for example). 4) AIs that learn short-term goals will both be highly functional and have a low chance of optimizing for dangerous, long-term goals such as power-seeking.
Superhuman AI might also not overpower humans, since: 1) A genius human in the Stone Age would have a much harder time getting to space than a human of average intelligence today, which shows that intelligence is a much more nuanced concept than we make it out to be. 2) AI might not be better than human-AI combinations. 3) AI will need our trust to take over critical infrastructure. 4) Many properties other than intelligence seem highly relevant. 5) Many goals do not end in taking over the universe. 6) Intelligence feedback loops can run at many different speeds, and you need a lot of confidence that they are fast to say they lead to doom. 7) Key concepts in the literature are quite vague, meaning that we lack an understanding of how they will lead to existential risk.
Erik Jenner and Johannes Treutlein give their response to her counterarguments. Their main point is that there’s good evidence that the difference between AI and humans will be large and that we need Grace’s slightly aligned AI to help us reach a state where we do not build much more capable and more misaligned systems.
Comprehensive AI Services (CAIS)
A relevant text to mention in relation to these arguments is Eric Drexler’s attempt at reframing superintelligence as something more realistic in an economic setting. Here, he uses the term “AI services” to describe individual tasks that will be economically relevant. The “comprehensive” in Comprehensive AI Services corresponds to what we usually call “general”. The main point is that we will see a lot of highly capable but specialized AI before we get a monolithic artificial general intelligence. We recommend reading the report if you have the time.
Strong AGI coming soon
At the opposite end of the spectrum from Grace, Porby shares why they think AGI will arrive in the next 20 years with convincing arguments on 1) how easy the problem of intelligence is, 2) how immature current machine learning is, 3) how quickly we’ll reach the level of hardware needed, and 4) how we cannot look at current AI systems to predict future abilities.
In other news, a new survey published in Nature finds that non-expert users of AI systems think interpretability is important, especially in safety-critical scenarios. However, they still prioritize accuracy in most tasks.
Neel Nanda shares an opinionated reading of his favorite Circuits interpretability work.
A new reinforcement learning method shows good results both on task performance and on how moral the agent’s actions are. The authors take a text-based game and train a reinforcement learning agent with both a task policy and a moral policy.
Wentworth notes how prediction markets might be useful for alignment research.
DeepMind has given a language model access to a physics simulation to increase its physics reasoning ability.
Nate Soares describes why game-theoretic considerations do not necessarily lead superintelligent beings to leave humans alive.
A new research agenda in AI safety seeks to study the theory of deep learning using a pragmatic approach to understand key concepts.
And now, diving into the many opportunities available for all interested in learning and doing more ML safety research!
SERI MATS is accepting applications for a fully paid, two-month, in-person fellowship to do independent research in AI safety. Apply now, because applications close this Sunday.
The Future of Life Institute is accepting applications to fund your PhD or postdoc in an AI safety-relevant field.
We have released our new website for the Alignment Jam hackathons that we’re proud to show the world. Just go to alignmentjam.com, join the next hackathon in November, and subscribe to receive updates.
This has been the ML Safety Progress Update and we look forward to seeing you next week!