Picture a computer that could finish your sentences, using a better turn of phrase; or use a snatch of melody to compose music that sounds as if you wrote it (though you never would have); or solve a problem by creating hundreds of lines of computer code—leaving you to focus on something even harder. In a sense, that computer is merely the descendant of the power looms and steam engines that hastened the Industrial Revolution. But it also belongs to a new class of machine, because it grasps the symbols in language, music and programming and uses them in ways that seem creative. A bit like a human.
The “foundation models” that can do these things represent a breakthrough in artificial intelligence, or ai. They, too, promise a revolution, but this one will affect the high-status brainwork that the Industrial Revolution never touched. There are no guarantees about what lies ahead—after all, ai has stumbled in the past. But it is time to look at the promise and perils of the next big thing in machine intelligence.
Foundation models are the latest twist on “deep learning” (dl), a technique that rose to prominence ten years ago and now dominates the field of ai. Loosely based on the networked structure of neurons in the human brain, dl systems are “trained” using millions or billions of examples of texts, images or sound clips. In recent years the ballooning cost, in time and money, of training ever-larger dl systems had prompted worries that the technique was reaching its limits. Some fretted about an “ai winter”. But foundation models show that building ever-larger and more complex dl does indeed continue to unlock ever more impressive new capabilities. Nobody knows where the limit lies.
Foundation models have some surprising and useful properties. The eeriest of these is their “emergent” behaviour—that is, skills (such as the ability to get a joke or match a situation and a proverb) which arise from the size and depth of the models, rather than being the result of deliberate design. Just as a rapid succession of still photographs gives the sensation of movement, so trillions of binary computational decisions fuse into a simulacrum of fluid human comprehension and creativity that, whatever the philosophers may say, looks a lot like the real thing. Even the creators of these systems are surprised at their power.
This intelligence is broad and adaptable. True, foundation models are capable of behaving like an idiot, but then humans are, too. If you ask one who won the Nobel prize for physics in 1625, it may suggest Galileo, Bacon or Kepler, not understanding that the first prize was awarded in 1901. However, they are also adaptable in ways that earlier ais were not, perhaps because at some level there is a similarity between the rules for manipulating symbols in disciplines as different as drawing, creative writing and computer programming. This breadth means that foundation models could be used in lots of applications, from helping find new drugs using predictions about how proteins fold in three dimensions, to selecting interesting charts from datasets and dealing with open-ended questions by trawling huge databases to formulate answers that open up new areas of inquiry……..