The Unsolvable Problem of AI Safety

Introduction

In his latest book, AI: Unexplainable, Unpredictable, Uncontrollable, AI Safety expert Roman Yampolskiy highlights a core issue at the heart of our continued development of AI. The problem is not that we don’t know precisely how we will control AI, but that we have yet to prove that controlling it is possible at all.

Yampolskiy, PhD, a tenured professor at the University of Louisville, writes: “It is a standard practice in computer science to first show that a problem doesn’t belong to a class of unsolvable problems before investing resources into trying to solve it or deciding what approaches to try.” [1] Whether the problem of controlling AI is solvable at all has not yet been established. And yet today’s tech giants push ahead with development at breakneck speed all the same. Yampolskiy contends that this lax approach could have existential consequences.

What is AI Safety?

AI Safety is a bit of a catch-all term but can broadly be defined as the attempt to ensure that AI is deployed in ways that do not cause harm to humanity.

The subject has grown in prominence as AI tools have become more sophisticated in recent years, with some of the most nightmarish doom scenarios prophesied by the technology’s naysayers coming to look increasingly plausible.

The need to put guardrails around the worst of AI’s possibilities led to the Biden administration’s AI Executive Order in October 2023, the UK’s AI Safety Summit a matter of days later, the EU AI Act, approved in March of this year, and the landmark agreement between the UK and US, signed earlier this month, to pool technical knowledge, information and talent on AI safety.

The push and pull, as ever, is between how much regulation, if any, we should be putting on AI –– whether we are stifling its potential for innovation by doing so, or simply taking sensible, even vital precautions.

The sudden firing and swift re-hiring of CEO Sam Altman by the OpenAI board last year was supposedly prompted by concerns that he was prioritising innovation over AI Safety to the point of negligence. This theory is circumstantially backed up by the emergence of Anthropic, a rival AI company set up by the brother-and-sister duo Dario and Daniela Amodei in 2021 after both left executive positions at OpenAI over concerns about the company’s handling of AI Safety.

Meanwhile, Altman, Dario Amodei and Google DeepMind chief executive Demis Hassabis were among the signatories of a one-sentence statement released last year by the Center for AI Safety, a nonprofit organisation [2]. The statement, signed by more than 350 executives, researchers and engineers working in AI, read simply: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

The stakes couldn’t be higher.

Unexplainable

A much-vaunted notion is that of ‘explainable AI’, defined by IBM as “a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms.” [3]
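For a sense of what this looks like in practice, below is a minimal sketch of one common explainability technique, permutation feature importance, which asks how much a trained model’s accuracy depends on each input feature. The dataset, model and library choices here are illustrative stand-ins, not anything drawn from IBM’s tooling or Yampolskiy’s book.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data and model, standing in for any opaque ML system.
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

# A forest of 100 trees: accurate, but hard to read off "why" directly.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time and measure how much
# held-out accuracy drops. A big drop means the model relies on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in zip(data.feature_names, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

Techniques like this can tell us which inputs a model leans on, but they say less and less as the system being explained outstrips the people reading the explanation.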

Put simply, as the name suggests, after an AI system performs a task for the user, it then explains how it did it. Except that, as the gap between our intelligence and the ‘superintelligence’ of AI continues to grow, we will reach a stage where we simply cannot understand how the technology achieved its aims, whether or not it is programmed to tell us. As Albert Einstein said: “It would be possible to describe everything scientifically, but it would make no sense. It would be a description without meaning –– as if you described a Beethoven symphony as a variation of wave pressure.” [4]

Yampolskiy pushes the analogy further, saying: “It is likely easier for a scientist to explain quantum physics to a mentally challenged deaf and mute four-year-old raised by wolves than for superintelligence to explain some of its decisions to the smartest human.” [5]

He notes that it would be possible, in principle, for an AI to produce only decisions it knows are explainable at our level of understanding, but that doing so would require the AI to knowingly forgo the best decision available to it. This, of course, would defeat the point of using such advanced technology in the first place; we are already quite capable of making the wrong decision on our own.

Unpredictable

Given that AI is not explainable, it is in turn necessarily unpredictable: how can you predict the actions of something you don’t (and can’t) understand? This is already the case with ‘black box’ AI, the term for models that arrive at conclusions or decisions without providing any explanation of how they were reached. We will be in the dark as to how AI achieved its aims and what it might do to achieve future ones. We may be able to set goals for an AI and correctly predict that it will ultimately achieve them, but the crucial how will be lost, even to the technology’s own programmers.
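To make the ‘black box’ point concrete, here is a minimal sketch, using a toy dataset and a small neural network chosen purely for illustration, of a model whose every parameter we can inspect but which returns an answer rather than a rationale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# A toy stand-in for a black box model: a small neural network.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
model.fit(X, y)

# Every learned parameter is fully visible to us...
n_params = sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)
print(f"parameters we can inspect: {n_params}")

# ...yet a prediction is just the end of a long numeric pipeline. The model
# returns an answer, not a rationale.
new_input = np.random.default_rng(1).normal(size=(1, 20))
print("prediction:", model.predict(new_input)[0])
```

In practice, the only reliable way to know what such a model will do with a new input is to run it on that input: prediction by trial rather than by understanding.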

Yampolskiy comes to the conclusion that the “unpredictability of AI will forever make 100% safe AI an impossibility, but we can still strive for Safer AI because we are able to make some predictions about AIs we design.” [6]

Uncontrollable

AI advocates believe that we will be able to control it. They say that even Artificial General Intelligence (AGI), a system able to learn and solve problems across domains as broadly as a human can, will be imbued with our values and will therefore act in our best interests.

Even Nick Bostrom, a philosopher and professor at Oxford University, whose bestselling book Superintelligence: Paths, Dangers, Strategies showed him to be far from an optimist on this topic, has commented that “Since the superintelligence or posthumans that will govern the post-singularity world will be created by us, or might even be us, it seems that we should be in a position to influence what values they will have. What their values are will then determine what the world will look like, since due to their advanced technology they will have a great ability to make the world conform to their values and desires.” [7]

Yampolskiy argues the other side: “As we develop intelligent systems that are less intelligent than we are, we can maintain control, but once such systems become more intelligent than we are, we lose that ability.” [8]

He suggests it is more likely that our values will adjust to those of the superintelligence than that its values will be shaped and constrained by our own. As the technology reveals itself to be more intelligent than any human who has ever lived, it is only rational that humanity will defer to its ideas, as it has deferred to any number of great thinkers in the past.

The only way to control AI in any real sense, then, would be to put such limitations on it as to constrain its many potential benefits, to the point it ceases to be the revolutionary technology so fervently preached by its advocates. This is the great conundrum, the unsolvable debate: progress with vast, existential risk or safety at the expense of development? As Yampolskiy puts it, “unconstrained intelligence cannot be controlled and constrained intelligence cannot innovate.” [9] It’s one or the other; someone has to decide.

The deciders

“Regulating technology is about safety, but it is also about the kind of civilization we wish to create for ourselves. We can’t leave these big moral questions for AI companies to decide,” writes Jamie Susskind, author of The Digital Republic: Taking Back Control of Technology, in the Financial Times. [10]

And yet it increasingly feels like that’s precisely what we’ve done. We may read about the drama of Sam Altman’s firing and rehiring or of Elon Musk’s recent move to sue OpenAI and Altman himself, but these events play out like soap opera storylines in the headlines. Very few of us actually understand how far this technology has already been pushed, let alone where it’s going.

“The companies that make these things are not rushing to share that data,” says Gary Marcus, professor emeritus of psychology and neural science at New York University, speaking to The Atlantic in December. “And so it becomes this fog of war. We really have no idea what’s going on. And that just can’t be good.” [11]

Eliezer Yudkowsky, a research leader at the Machine Intelligence Research Institute and one of the founding thinkers in the field of AI alignment, has written that “if we had 200 years to work on this problem and there was no penalty for failing at it, I would feel very relaxed about humanity’s probability of solving this eventually.” [12] But the precise problem is that today’s tech giants are not taking their time. They don’t want safe AI in 200 years if they can have some form of AI today. The only thing that seems to matter is cornering the market. Such short-termism could have devastating consequences.

There is some hope that AI itself could provide the solution: that it might use its superintelligence to work out how it could be controlled. But sharing that answer with humans would be self-defeating in the extreme. Unless superintelligence comes with a heavy streak of masochism baked in, this seems an unlikely scenario.

The unsolvable problem of AI Safety

Yampolskiy writes that “the burden of proof [to demonstrate AI is controllable] is on those who claim that the problem is solvable, and the current absence of such proof speaks loudly about the inherent dangers of the proposal to develop AGI.” [13]

An unexplainable, unpredictable, uncontrollable AI superintelligence will drastically reshape the world order, perhaps even overhaul it entirely. AI Safety is needed to stop it. Yet while recent measures are plentiful, none addresses the AI control problem itself. Meanwhile, in Silicon Valley, development continues apace. It is easy to write off AI critics as prophets of doom or enemies of progress, but to proceed without proper safety provisions in place is to open a door we may not be able to close. As Yampolskiy concludes, “the chances of a misaligned AI are not small. In fact, in the absence of an effective safety program, that is the only outcome we will get.” [14]


Sources

[1] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[2] https://www.safe.ai/work/statement-on-ai-risk

[3] https://www.ibm.com/topics/explainable-ai

[4] https://www.raabcollection.com/literary-autographs/einstein-god

[5] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[6] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[7] https://mason.gmu.edu/~rhanson/vc.html

[8] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[9] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[10] https://www.ft.com/content/b259b126-225b-4158-90a0-abebfd0119fc

[11] https://www.theatlantic.com/newsletters/archive/2023/12/ai-tech-instability-gary-marcus/676286/

[12] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[13] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.

[14] Yampolskiy, R. V. (2024). AI: Unexplainable, Unpredictable, Uncontrollable. Taylor & Francis Ltd.