How generative models could go wrong

May 3, 2023

The Economist

A big problem is that they are black boxes.

In 1960 Norbert Weiner published a prescient essay. In it, the father of cybernetics worried about a world in which “machines learn” and “develop unforeseen strategies at rates that baffle their programmers.” Such strategies, he thought, might involve actions that those programmers did not “really desire” and were instead “merely colourful imitation[s] of it.” Wiener illustrated his point with the German poet Goethe’s fable, “The Sorcerer’s Apprentice”, in which a trainee magician enchants a broom to fetch water to fill his master’s bath. But the trainee is unable to stop the broom when its task is complete. It eventually brings so much water that it floods the room, having lacked the common sense to know when to stop.

The striking progress of modern artificial-intelligence (ai) research has seen Wiener’s fears resurface. In August 2022, aiImpacts, an American research group, published a survey that asked more than 700 machine-learning researchers about their predictions for both progress in ai and the risks the technology might pose. The typical respondent reckoned there was a 5% probability of advanced ai causing an “extremely bad” outcome, such as human extinction (see chart). Fei-Fei Li, an AI luminary at Stanford University, talks of a “civilizational moment” for ai. Asked by an American tv network if ai could wipe out humanity, Geoff Hinton of the University of Toronto, another AI bigwig, replied that it was “not inconceivable”.

There is no shortage of risks to preoccupy people. At the moment, much concern is focused on “large language models” (llms) such as Chatgpt, a chatbot developed by Openai, a startup. Such models, trained on enormous piles of text scraped from the internet, can produce human-quality writing and chat knowledgeably about all kinds of topics. As Robert Trager of the Centre for Governance on AI explains, one risk is of such software “making it easier to do lots of things—and thus allowing more people to do them.”

The most immediate risk is that llms could amplify the sort of quotidian harms that can be perpetrated on the internet today. A text-generation engine that can convincingly imitate a variety of styles is ideal for spreading misinformation, scamming people out of their money or convincing employees to click on dodgy links in emails, infecting their company’s computers with malware. Chatbots have also been used to cheat at school.

Like souped-up search engines, chatbots can also help humans fetch and understand information. That can be a double-edged sword. In April, a Pakistani court used gpt-4 to help make a decision on granting bail—it even included a transcript of a conversation with gpt-4 in its judgment. In a preprint published on arXiv on April 11th, researchers from Carnegie Mellon University say they designed a system that, given simple prompts such as “synthesize ibuprofen”, searches the internet and spits out instructions on how to produce the painkiller from precursor chemicals. But there is no reason that such a program would be limited to beneficial drugs.

Some researchers, meanwhile, are consumed by much bigger worries. They fret about “alignment problems”, the technical name for the concern raised by Wiener in his essay. The risk here is that, like Goethe’s enchanted broom, an ai might single-mindedly pursue a goal set by a user, but in the process do something harmful that was not desired. The best-known example is the “paperclip maximizer”, a thought experiment described by Nick Bostrom, a philosopher, in 2003. An ai is instructed to manufacture as many paper clips as it can. Being an idiot savant, such an open-ended goal leads the maximizer to take any measures necessary to cover the Earth in paperclip factories, exterminating humanity along the way. Such a scenario may sound like an unused plotline from a Douglas Adams novel. But, as ai Impacts’ poll shows, many ai researchers think that not to worry about the behavior of a digital superintelligence would be complacent.

What to do? The more familiar problems seem the most tractable. Before releasing gpt-4, which powers the latest version of its chatbot, Openaiused several approaches to reduce the risk of accidents and misuse. One is called “reinforcement learning from human feedback” (rlhf). Described in a paper published in 2017, rlhf asks humans to provide feedback on whether a model’s response to a prompt was appropriate. The model is then updated based on that feedback. The goal is to reduce the likelihood of producing harmful content when given similar prompts in the future. One obvious drawback of this method is that humans themselves often disagree about what counts as “appropriate”. An irony, says one ai researcher, is that rlhf also made Chatgpt far more capable in conversation, and therefore helped propel the ai race.

Another approach, borrowed from war-gaming, is called “red-teaming”. Openai worked with the Alignment Research Centre (arc), a non-profit, to put its model through a battery of tests. The red-teamer’s job was to “attack” the model by getting it to do something it should not, in the hope of anticipating mischief in the real world…..

Stable Genus

Stable Genus

How generative models could go wrong

About us

SIGN UP FOR Newsletter