9 What the First Thinking Machine Thought
What did the first-ever thinking machine think about? Once, this question would have been easy to answer. I’d have said: the first-ever thinking machine was called the Logic Theorist, and it tried to prove theorems in Whitehead and Russell’s Principia Mathematica.
Now, the question is more difficult to answer because, since the mid-1950s, we’ve expanded the term thinking to cover a much wider range of cognitive behavior.
As I noted in Chapter 2, scientists have been studying cognitive behavior—and calling it intelligence—in nearly any animal (and even plant) imaginable, from whales to amoebas, from octopuses to trees. Biologists are easy with the idea that cognition is ubiquitous, from the cell on up, from the brain on down. This inclusive sense of intelligence was implicit in that first book of readings in artificial intelligence, Computers and Thought, where articles about pattern recognition by machine were side by side with articles about chess-playing programs. Computer scientists now call the grand realm of intelligence in brains, minds, and machines computational rationality.
But in the mid-1950s of the Logic Theorist, thinking meant only symbolic thinking, the kind of planning, imagining, recollection, and symbol creation that humans alone exhibited. John McCarthy used to tease: maybe a thermostat can be said to think. Was he nuts? No, the proposition was meant to force us to define the specific differences between what humans and a thermostat do. As time has passed, the dividing line is no longer so distinct, and decades of research have shown us that thinking is far more complex than we dreamed.
So let me refine my opening question: what did the first human-like thinking machine think about? It was called the Logic Theorist, LT, and it tried to prove theorems in Whitehead and Russell’s Principia Mathematica. Although its subject was logic, the program was squarely in the category of “thinking humanly,” as distinct from “thinking logically.” Allen Newell and Herbert Simon were practicing cognitive psychologists and wanted to model the ways humans proved theorems, not create a killer machine that would out-think humans—although each of them conceded that inevitable outcome. They were well aware of other aspects of intelligence, but they aimed to begin by modeling some parts of the highest level of human thinking, the symbolic processes that have begotten culture and civilization.
Given the primitive tools of the time, scientific knowledge about human thinking was scant. Newell and Simon’s approach was not stimulus-response, associative memory, or any of the other mid-20th century guesses cognitive psychologists had made about how human thinking worked. LT was a model of human symbolic thinking. It was dynamic, nonmathematical—symbolic—and changed over time.
LT was a first step for Newell and Simon in their ambition to understand the human mind. It would eventually lead to more abstract ideas about intelligence in general.
Details about the LT program can be found elsewhere, but the first of its outstanding characteristics was that it could learn. Helped along by some heuristics, rules of thumb that the programmers had taught it, the program didn’t search every possible path to prove a theorem, but instead considered only likely paths to a proof. As it pursued those paths and met new theorems, it acquired new knowledge, which it stored and then used to solve other problems.
Next, LT could recombine knowledge it already had to create something entirely new. It could quickly widen its search for answers well beyond human capacities for search, which is how it came to discover a shorter and more satisfying proof to Theorem 2.85 than Whitehead and Russell had used. Simon wrote this news to Bertrand Russell, who responded with good humor.
Although LT learned, created new knowledge, found new answers to problems, and knew when it did so, all that was in a limited domain. But this capacity for combinatorial search would serve AI well in the future: When Deep Blue defeated Garry Kasparov, a gasp went up from the human audience because Deep Blue had found a move that had never before been seen in human play. When AlphaGo defeated two Go champions in succession (each one claiming to be the best in the world), it did so by finding a Go move that no one had ever seen before. In combinatorial search lay one aspect of machine creativity.
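For readers who like to see such ideas run, here is a minimal sketch of the two traits just described: search guided by a rule of thumb, and stored results that make later problems easier. It is a toy under stated assumptions, not the Logic Theorist's actual method. The "theorems" are plain integers, the two "inference rules" (add one, double) are invented for illustration, and the function name `prove` is hypothetical.

```python
import heapq

def prove(goal, known, max_steps=10_000):
    """Greedy best-first search from every known number toward `goal`.

    Returns (derivation, steps), or (None, steps) if the step budget runs out.
    On success the goal is added to `known`, so later searches can start from it.
    """
    def score(n):
        # Rule of thumb (an assumption of this toy): prefer numbers near the goal.
        return abs(goal - n)

    frontier = [(score(n), n, [n]) for n in known]
    heapq.heapify(frontier)
    seen = set(known)
    steps = 0
    while frontier and steps < max_steps:
        _, n, path = heapq.heappop(frontier)  # most promising candidate first
        steps += 1
        if n == goal:
            known.add(goal)  # "learning": store the result for reuse
            return path, steps
        for nxt in (n + 1, n * 2):  # the two toy inference rules
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt, path + [nxt]))
    return None, steps

known = {1}
first_path, first_steps = prove(96, known)    # starts from 1 alone
second_path, second_steps = prove(97, known)  # can now start from 96
```

The second search finishes in far fewer steps than the first because it begins from the stored result. The rule of thumb also finds a derivation of 96 without finding the shortest one, which echoes the point that heuristics narrow a search without guaranteeing the best answer.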
LT offered no physiological theory of how humans think; it wasn’t meant to. But it showed that in the narrow task of proving theorems in logic, human performance could be simulated on a computer in ways that satisfied what cognitive psychologists knew about how the human mind worked. Simon believed that ultimately, a physiological theory of human thinking would be needed. (We still await that in deep detail and keep being surprised by what we do learn.) But instead of researchers trying to jump from the complexities we see in human behavior right down to neuron level, LT represented an intermediate level, one that could obviously be mechanized—Newell and Simon had done it. They called it the information processing level.
Today we might say that LT is the first computer model of what psychologist Daniel Kahneman (2011) calls slow thinking, System 2—slow, deliberative, analytical. LT isn’t a model of the other kind of thinking, System 1—instinctual, impulsive, sometimes emotional. “Welcome to analytic thinking,” Ed Feigenbaum said, after I’d confessed that being around computer scientists was changing how I thought. I didn’t then understand that like every other normal human being, I’d think both ways for the rest of my life. At that point in the history of cognitive psychology, the notion of two ways (or more) of thinking was pretty much unknown. Most people believed in an eternal bifurcation: There was thinking. And there was not.
Feigenbaum would later argue that LT’s great advantage was combinatorial search—guided by its rules of thumb, it could search a bigger space and find solutions faster than even humans as smart as Alfred North Whitehead and Bertrand Russell. Thus LT exhibited a vital characteristic, sometimes a curse, of AI: in its much greater capacity to search, even guided by rules, we cannot imagine everything that AI will find to do. It will go places we cannot imagine or foresee, with unimaginable and unforeseeable results.
In short, AI will always produce unintended consequences. Similarly, as Amelia Earhart pioneered a round-the-world flight in 1937, she might have imagined a global network of commercial flights that would eventually come to pass. But could she foresee that this network would contribute significantly to global warming?
This is a vital truth worth repeating: humans cannot imagine everything possible in a search space. Nor can machines. But a machine can go much further and faster, often with unexpected results.
Another of the deep lessons Simon tried to teach me is also embodied in LT. In scientific modeling, levels of abstraction exist, and the study of each level is useful in itself. Everything material might, at bottom, be physics, but studying chemistry and biology, two higher-level organizations of matter, is still useful. Much later, I’d re-encounter the same idea of levels of abstraction in the work at the Santa Fe Institute, specifically its studies of complex adaptive systems, which begin simply, adapt to the environment dynamically, and end up being complex.
Did LT, with its ability to solve problems and learn from those solutions, take the world by storm? Hardly.
“I guess I thought it was more earth-shaking than most people did,” Simon laughed. Then he got more serious. “I was surprised by how few people realized that they were living in a different world now. But that’s a myopic, an egocentric view of it, the inventor’s evaluation.” He was still surprised that, even when we were speaking in 1975, nearly twenty years after LT had debuted, so many didn’t realize how the world had changed with this understanding of what you could do with a computer. “There are still well-educated people who argue seriously about whether computers can think. That indicates they haven’t absorbed the lesson yet.”
Part of the problem was that most people’s exposure to computing then was numerical. Computers might be able to count, but could they deal with other kinds of symbols? As late as 2013, I heard a Harvard professor (in the humanities, true) declare that computers “could only handle numbers.” Didn’t he use email? In some literal, simplistic sense, he was correct—zeros and ones—but computers make a distinction, in George Dyson’s elegant phrase, between numbers that mean things and numbers that do things, and computer systems are hierarchically arranged from the simple level of zeros and ones to a level that can imitate aspects of human thought. After all, Beethoven produced sublime music with a mind whose foundation was on-off nerve cells. All symbols are created and have their being in a physical system. “A physical symbol system has the necessary and sufficient means for general intelligent action,” Newell and Simon wrote (1976). Finally, by inventing a computer program that could think non-numerically, Newell and Simon declared they’d solved the mind-body problem. Or rather, it had simply gone away.
Humans and computers were two instances of physical systems that could manipulate symbols, and therefore exhibit some qualities of mind. LT was an example of a system made of matter that exhibited properties of mind.
What, then, is mind? It’s a physical system that can store the contents of memory in symbols, Newell and Simon declared. Symbols are objects that have access to meaning—designations, denotations, information a symbol might have about a concept, such as a pen, brotherhood, or quality. The physical symbol system, whether brain or computer, can act upon those symbols appropriately. We’ve subsequently learned that, as we think, we process not just memory, but also internal and external information from the environment.
The physical symbol system, simply stated but profound, would undergird AI for decades to come. It would seep into biology as a way of explaining how biological systems functioned. It would come to be seen as an essential condition for intelligent action of any generality, always physically embodied. Some in AI research would break away from this scheme, believing that fast reactions to the environment in real time are more important than a fancy internal representation—a mind—but that was much later.
Understanding, Simon argued, is a relation among three elements: a system, one or more bodies of knowledge, and a set of tasks the system is expected to perform. It follows that consciousness is an information processing system that stores some of the contents of its short-term memory at a particular time, aware not only of some things external to it, but of some internal things too, which it can report on. “That’s a small but fairly important subset of what’s going on in mind,” Simon added to me.
Simon and Newell weren’t claiming to explain or simulate all of thinking, only “a small but fairly important subset.” This befuddled the bifurcationists. What was a subset of thinking? For them, an entity was thinking or it wasn’t. To abstract only certain aspects of cognition and simulate them on a computer didn’t make sense to them. Yet any humanist understood synecdoche, where a part of something stands for the whole of something: “Give me a hand.” “Boots on the ground.” A synecdoche isn’t exactly the same as an abstraction of some aspect of intelligence, but such a comparison might have opened a path for outsiders to push beyond all-or-nothing dogmas about thinking and to begin to understand AI.
The bifurcationists weren’t the only ones who didn’t like this approach. Among dissenters were neural net scientists, who hoped to build intelligence in machines from the neuron up. Simon was fine with this. “We’re going from A to B. They’re going from Z to Y. Our way suits us better, because Allen and I have come out of the behavioral sciences, economics, and operations research, and we know those fields haven’t been able to reduce much of human behavior to formulation.”
This would change over the decades.
The contents of Computers and Thought, that ur-textbook of AI, were divided into two major parts. One was the simulation of human cognition, which contained papers that simulated human problem-solving behavior, verbal learning, concept formation, decision-making under uncertainty, and social behavior. The other major part was artificial intelligence, which, without reference to human behavior, contained papers about programs that recognized visual patterns, that proved mathematical theorems (logic and geometry), that played games (chess and checkers), and that could manage some early understanding of natural language. If this now seems a bit muddled (we know human cognition uses pattern recognition and many other seemingly mechanical tricks), it represented the provisional understanding then of what was thinking and what wasn’t.
In the 1960s and 1970s, nearly all these efforts would break away into independent fields, with their own social structures, journals, specialized meetings, and peer-review groups. Robotics didn’t talk to machine learning, and natural language processing didn’t talk to constraint analysis. This approach made complete sense socially even though it was nonsense scientifically: intelligent behavior requires an ensemble of skills. The divisions remain as I write, but certain groups are beginning to explore what they can learn from each other and how combining certain subfields can accelerate cognitive computing.
After Simon’s thinking machine announcement, the winter and spring of 1956 were deeply productive. Did he have a sense they were doing something momentous, especially since he seemed to have kept every document possible from that time? His files were crammed with notes of possible paths to pursue, ideas to expand, and only time kept him from doing it. He laughed with glee. “Oh, yeah! Oh, yeah! Yep! It seemed obvious.”
As their ideas about a scientific definition of intelligence became clearer and more nuanced, intelligence came to describe a reciprocal relationship between an individual and the surrounding culture, a culture built over many generations. (Remember that walk under the eucalyptus trees at Mills College where that insight was given to me?)
Real-world problems are deeply complex, resources of time and knowledge are limited, and the best way to reach a goal requires identifying ideal actions and an ability to approximate those ideal actions, a kind of built-in statistical procedure humans use to allocate time and other resources. “Sometimes intelligence comes most in knowing how to best allocate these scarce resources,” Samuel Gershman and his colleagues write (Gershman et al., 2015). As intelligent agents, we have tradeoffs forced upon us, and we had better be good at figuring them out.
Newell and Simon saw that humans used heuristics to identify an ideal action, approximate it, allocate resources, and evaluate tradeoffs. Although these informal rules of thumb didn’t work every time, they cut the search space to reach a goal so that humans can cope with the world in reasonable time frames. Newell and Simon’s AI programs and those of their followers relied on heuristics. But in the 1980s, more formal statistical methods would largely replace heuristics in AI, which, together with other methods (and dramatically improved technology), would take AI out of its cradle and into a lusty infancy.
That summer of 1956, Newell and Simon were invited—as an afterthought—to what was to become, over the years, a legendary summer conference at Dartmouth. Its organizers were two other young scientists, John McCarthy, on the mathematics faculty at Dartmouth, and Marvin Minsky, at Harvard. The conference was under the aegis of Claude Shannon, the father of information theory, then at Bell Labs, and it meant to explore the topic of machines that could think at human levels of intelligence—artificial intelligence.
All the invitees had ideas about how AI might be achieved—physiology, formal logic—but Newell and Simon arrived with the Logic Theorist, an actual working program.
“Allen and I didn’t like the name artificial intelligence at all,” Simon said later. “We thought of a long list of terms to describe what we were doing and settled on complex information processing.” Which went nowhere. No wonder. All poetry is sacrificed for dreary precision. Instead, artificial intelligence stuck. I’m partial to it myself.
In September 1956, just after the summer Dartmouth conference, the Institute of Electrical and Electronics Engineers (IEEE) had a larger meeting in Cambridge, Massachusetts. Newell and Simon would be sharing a platform with a small group of others who’d come to Dartmouth, including Minsky and McCarthy. That arrangement made it seem as if they were equivalent, when in fact only Newell and Simon had produced a working program; the others were still at the idea stage.
John McCarthy thought that he’d report to the IEEE meeting about the just-concluded conference and describe Newell and Simon’s work. The two Pittsburghers objected strenuously: they’d report their own work, thanks. Simon remembered some tough negotiations with Walter Rosenblith, chair of the session, who walked Newell and Simon around the MIT campus for an hour or more just before the meeting. They finally agreed that McCarthy would give a general presentation of the Dartmouth Conference work, and then Newell would talk about his and Simon’s work in particular. Newell and Simon were first. They wanted—and deserved—the credit.
The two successful scientists would race ahead, applying their techniques to a more ambitious program, the General Problem Solver, which they hoped would solve problems in general. GPS did indeed codify a number of problem-solving techniques that humans regularly employ. But this emphasis on reasoning would mislead AI researchers. Reasoning was necessary to intelligent behavior, but hardly sufficient.
- But The Journal of Symbolic Logic declined to publish any article coauthored by a computer program. Moreover, some logicians misunderstood that, as cognitive psychologists, Newell and Simon were eager to simulate human thought processes. These logicians created a faster theorem-proving machine and triumphantly dismissed the Logic Theorist as primitive. So we are; so we humans are.
- Joe was in the human audience of this match, having been a gifted chess player in his youth, and told me he thought he was the only one rooting for the machine to win. “But this great program is a human accomplishment,” he argued.
- This barebones description of understanding led to philosopher John Searle’s attack on AI by means of the Chinese Room Argument, which I have more to say about later.
- Nomenclature has vexed the field from time to time. These days, you’ll hear terms like machine learning, cognitive computing, smart software, and computational intelligence used to refer to computers doing something that, in Ed Feigenbaum’s old formulation, we’d consider intelligent behavior if humans did it. Sometimes the new phrase has arisen to dissociate from the largely media-produced reputation AI earned as nothing but failed promises sometime in the 1980s, a time when people began to talk about the “AI winter.” Usually, the scientists were careful (I think of John McCarthy’s caution: “Artificial intelligence might arrive in four or four hundred years”). But not Herbert Simon. In my book Machines Who Think (1979), he explains his reasoning about the four predictions he made in 1958 that didn’t soon come to pass. Later I came across his 1965 prediction that in twenty years, any work humans could do now would then be done by machine. No, not twenty years later; not fifty years later. Journalists and other eager promoters, such as the sellers of IPOs, also overreached. But sometimes the abundant nomenclature also reflects the fissuring of fields into subfields. In the second decade of the 21st century, the term artificial intelligence seems to have regained respectability.