14 Raj Reddy and the Dawn of Machine Learning
Raj Reddy had been a gaunt, large-eyed graduate student when I’d met him at the Stanford Artificial Intelligence Lab in the mid-1960s. He was soon to be one of the two first Stanford PhDs in computer science, another member of the second generation of AI researchers. By the time Joe and I went to Carnegie Mellon in 1971, Reddy was on the computer science faculty and was beginning to look less famished, if not yet fighting weight. He’d always had an easy, wide smile, and now it animated a fuller, quite handsome café au lait face, bright eyes, and a small moustache. Marriage to Anu had clearly been good for him, and they were the happy parents of two little girls. They wanted to raise them as much Indian as American, traveling to India every year during the long summer holidays, but Reddy would sigh to me a few years later, “I know I won’t be arranging their marriages. They’re American. They’ll marry who they want.”
At Carnegie Mellon, Reddy was leading a project to construct a computer program that could understand continuous human speech. The difficulties were enormous. Whereas Roman letter text conveniently puts spaces between words, periods at the end of sentences, and indentations to signal a new paragraph, speakers do not. Moreover, written words don’t suffer the distortions from intonation, hesitation, or background noise that spoken words do. “But it never occurred to me,” he once said with a brilliant smile, “that it couldn’t be done. That may be exactly what’s needed for anybody who wants to go into this field, namely, blind optimism with no reasonable basis for it.”
Reddy was born in 1937 in a rural Indian village—Katoor, Andhra Pradesh—unchanged for centuries. When his father went to the astrologer, as he’d gone for each of his sons, the astrologer raised a warning finger. “For this one, make every sacrifice. Send him to school. This will be the one.” Reddy was sent to school and learned his letters scratching the dirt with a twig. From there, he eventually went on to engineering school and then for a masters to the University of New South Wales in Australia. After working in Sydney for a year or so, he went to Stanford to study with John McCarthy.
Reddy had intended to study the solution of large numerical problems by computer but was quickly caught up in AI. For a class project, he proposed a speech-understanding program to McCarthy, who said it was a good idea, but after a few germane suggestions, left Reddy to himself. “It didn’t bother me,” Reddy said, “because I was quite happy to go do what I wanted. But some others who wanted to work with John needed a lot more help. They didn’t get it because John doesn’t operate that way.” That early class project led to decades of challenging research. (McCorduck, 1979)
Programs existed that recognized distinctly separate, clearly spoken words. But the problem of recognizing words in continuous speech? And then understanding what those continuous utterances might mean? This proposition was much more difficult. In 1973, a committee of prominent AI researchers, chaired by Allen Newell, studied the problems of speech understanding for the Defense Advanced Research Projects Agency (DARPA). The group agreed it was difficult but worth a try and noted an interesting paradox—in spontaneous spoken communication, people seemed limited only by how fast they could think, whereas in writing, the opposite was true: people couldn’t write as fast as they could think. Speech was the primary, normal communication between humans, who also wanted to talk with their computers. To figure out how computers could understand speech would be a difficult but fundamental problem to crack.
Reddy soon grasped that the issues in speech understanding were central to AI generally—the balance between an immense number of facts and far fewer general techniques for making sense of them. To understand continuous speech, many different and nearly unrelated kinds of knowledge are needed (semantic, syntactic, pragmatic, lexical, phonemic, phonetic, and so on). How many pieces of knowledge did a person use to decode an utterance? How did the listener decide which kinds of knowledge were more important than others? How did the listener decide he’d finally understood the utterance? For that matter, what was understanding?
Reddy and his team handled the pieces of necessary but unrelated knowledge by constructing a system that simultaneously allowed independent knowledge sources to offer hypotheses about the meaning of an utterance. It was as if each of these hypotheses from different knowledge sources were scribbled temporarily on the same blackboard, where other knowledge sources could see and check them and also generate their own hypotheses. The control system allowed different bits of knowledge to be linked from the simplest to the highest-level. “The analogy I use is a Russian, a German, a French, and a British engineer all coming together to design an airplane. Each one of them is an expert in a different aspect of aircraft design. They don’t speak the same language, but they write their solutions on a blackboard, which others can use without understanding how that solution was arrived at,” Reddy emailed me.
Understanding? What was it? Again, Reddy and his team chose behaviorist measures, six different ways to identify that understanding had taken place. Some were straightforward, like giving the right answers to questions, paraphrasing a paragraph, or drawing inferences from it. Some were less so, like translating an utterance from one language to another or predicting what the person might say next. These aspects of understanding work on different levels. As we’ve always suspected, understanding can be deep or less deep. How deep, everyone wondered, did understanding have to be in order to be useful? (Not very, as we’ll see.)
In years to come, the blackboard model would be fundamental to all commercial speech-understanding systems. The model would be adopted across much of AI as a way of coordinating multiple knowledge sources to arrive at a plausible answer to a problem. Hearsay, Reddy’s program, also was one of the first programs to use probability as a measure of belief (for example, the program could determine a word is probably fix, not kicks). Now, this statistical technique underlies much of modern AI.
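For readers who think in code, the blackboard idea and the use of probability as a measure of belief can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not a reconstruction of Hearsay: the class names, the two knowledge sources, the sound label, and every number below are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    level: str      # e.g. "phonetic", "lexical", "phrase" (hypothetical levels)
    content: str
    belief: float   # probability-like score between 0 and 1
    source: str     # which knowledge source posted it


@dataclass
class Blackboard:
    hypotheses: list = field(default_factory=list)

    def post(self, hyp):
        self.hypotheses.append(hyp)

    def at_level(self, level):
        return [h for h in self.hypotheses if h.level == level]


def lexical_source(board):
    """Propose candidate words for sounds already on the board (toy scores)."""
    guesses = {"?-ih-k-s": [("fix", 0.6), ("kicks", 0.4)]}  # invented numbers
    posted = {h.content for h in board.at_level("lexical")}
    new = []
    for h in board.at_level("phonetic"):
        for word, p in guesses.get(h.content, []):
            if word not in posted:
                new.append(Hypothesis("lexical", word, p * h.belief, "lexical"))
    return new


def context_source(board):
    """Re-score candidate words by how well they fit a toy context."""
    fit = {"fix": 0.9, "kicks": 0.2}  # invented numbers
    posted = {h.content for h in board.at_level("phrase")}
    new = []
    for h in board.at_level("lexical"):
        phrase = f"{h.content} the bug"
        if phrase not in posted:
            score = h.belief * fit.get(h.content, 0.5)
            new.append(Hypothesis("phrase", phrase, score, "context"))
    return new


def run(board, sources):
    """Let each source read the board and post until nothing new appears."""
    changed = True
    while changed:
        changed = False
        for source in sources:
            for hyp in source(board):
                board.post(hyp)
                changed = True
    # The most believed phrase-level hypothesis is the system's best guess.
    return max(board.at_level("phrase"), key=lambda h: h.belief, default=None)


board = Blackboard()
board.post(Hypothesis("phonetic", "?-ih-k-s", 1.0, "acoustic"))
best = run(board, [lexical_source, context_source])
print(best.content)  # fix the bug
```

Neither source calls the other or knows how the other works; each only reads and writes the shared board, which is the point of Reddy’s engineers-at-a-blackboard analogy. The belief scores let the toy prefer fix over kicks without ever being certain.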
One September morning in 1976, I sat in on an early demonstration of Hearsay. Next to me was Herb Simon, and I told him about a meeting I’d been to that summer in Los Alamos, where I’d met the elderly German engineer Konrad Zuse, builder of the Z3, the first fully functional, program-controlled electromechanical computer, which he constructed in 1941 in the living room of his parents’ Berlin apartment. Zuse understood at once that his machine could process symbols as well as numbers: he’d invented a programming language called Plankalkül, which allowed him to imagine chess playing and other intelligent applications. He’d also expressed some wariness to me about AI. “Playing with fire,” he said in his heavy German accent. Simon in turn told me about his and Dorothea’s summer, eating and drinking their way through southern France.
The Hearsay demo began and hushed us. A sentence was spoken aloud, so we’d know what Hearsay was hearing, and how it responded.
Hearsay responded by crashing at once. We smiled. This wasn’t—isn’t—unusual with the first few runs of any computer program. (It was the major common-sense argument against President Ronald Reagan’s pet defense program known popularly as “Star Wars,” which must work perfectly the first time.) As the wizards re-tuned, Simon and I chatted some more, easily understanding each other, gesturing to fill in, elaborate on, and shade meanings, using incomplete sentences, stopping to laugh, as we always did. Again, speech is the original human communication, and long precedes writing. Yet it was vexingly difficult to teach computers how to listen and understand.
After a while, Hearsay resumed. This time, success. What we heard (and saw) would soon yield a system—the first of several generations of systems including Dragon, Harpy, and Sphinx I and II, each redesigned for faster search—that won a DARPA award. For example, Harpy was the first system to understand, with less than ten percent error, continuous speech in anything like real time. If Harpy wasn’t entirely sure what it heard, it could make a best guess. In that, it was very human-like. True, its vocabulary was only a thousand words and confined to a narrow domain, but Harpy was a serious beginning. What it also showed is that knowledge is mainly dynamic, not static.
In Machines Who Think I wrote:
The symbols that stand for knowledge are entities with a functional property. Symbols can be created; they lead to information; they can be reordered, deleted, and replaced. All this is seen explicitly in computer programs, but also seems to describe human information processing too. Understanding is the application—efficient, appropriate, sometimes unexpected—of this procedural information to a situation, the recognition of similarities to old situations and dissimilarities to new ones, and the ability to choose between doing the small repairs, or debugging, and changing the whole system.
Harpy, a successor to Hearsay, was, among other things, the stark recognition of how important context is to understanding. Yes, philosophers had long asserted that context mattered, but philosophers asserted endlessly with not much to show for it but assertions. They might be correct; those assertions might even match our intuitions. But assertions aren’t proof. Neither are intuitions. Harpy was proof.
As significant as Harpy was, the local Pittsburgh newspapers yawned. The national papers were oblivious. (No one expected TV to pay attention.) But when Harpy won a DARPA award from the Defense Department, John McCarthy thought it was worth making a fuss about and informed The Mercury News, Silicon Valley’s hometown newspaper in San Jose, which published an excited story. In Silicon Valley, people got the significance very well. Reddy’s graduate students would take the work ever further, including a young Taiwanese named Kai-Fu Lee, who designed the first speaker-independent continuous speech recognition program (and will figure in our story later).
A computer program demonstrably doing its intelligent stuff, however elementary, was thrilling to me. What, really, was understanding? What mattered in knowledge representation, either in computers or even our minds? I chewed on these issues with endless pleasure. I arranged lunchtime meetings with philosophers (it was said that the University of Pittsburgh had infamously bought the entire Yale philosophy department at some point, and Pitt was strong in matters of epistemology and philosophy of science). This was the stuff of my days, and I loved it.
Even in a department of stunning visionaries, Raj Reddy stood out. When he saw the first personal computer with a graphical interface controlled by a mouse (Xerox PARC’s Alto machine), he knew that the computer science department at Carnegie Mellon must have one for everyone in the department, about a hundred. Funders just laughed. The head of Xerox PARC suggested maybe ten? Raj began to raise money for CMU’s own equivalent to the Alto. Dan Siewiorek, an eminent software designer, recalls (Troyer, 2014):
Raj was like the Wild West. Anything conceivable was possible. He could just go off and do anything. Have you heard of the ‘half-Raj’ and the ‘full-Raj’? The half-Raj is when Raj says, ‘Dan, I’d like to talk to you,’ and you know it’s going to be very interesting. When you get the full-Raj, he puts his arm around you and you’re going to be totally involved in a grand adventure.
After Joe and I left Pittsburgh, Reddy founded the Robotics Institute at Carnegie Mellon (its building known informally as Raj Mahal). The Institute was independent of the computer science department and independently funded. It caused uneasiness among some of the faculty—would the high standards expected of Carnegie Mellon students in computer science be maintained? Yes, the Institute jumped to prominence immediately and sustained that prominence, eventually bringing to Pittsburgh firms eager to industrialize that technology, such as Uber. Reddy himself was especially interested in robots that responded to—understood—voice commands.
But then Reddy was interested in so much: in robotics, of course, especially robots that can see, hear, speak, and move; in language generally, and in specific languages; in human-computer interaction; in machine learning; in software research. He put enormous energy into national and international science and technology policy. For example, he co-chaired U.S. President Bill Clinton’s Information Technology Advisory Committee and was Chief Scientist of the Centre Mondial Informatique et Ressources Humaines in Paris. He was key to three or four major projects to help bring information technology to the developing world.
After Joe and I left Pittsburgh, I sometimes returned to visit, and the Reddys would always invite me to dinner. They knew I loved Indian food; I hope they liked my company. By now they lived in a big house in Shadyside, just off Fifth Avenue. The house was always full of relatives—cousins, nephews, sons of friends—whom Reddy had brought from India to the United States to give them the same opportunity to study that he’d had. You never knew who’d be sitting down to a sumptuous dinner, but you could be sure it would be great fun.
I came away from these dinners unaware I was trailing a cloud of Indian spices—turmeric, coriander, cumin, cardamom—into the home of my hosts, Lois and David Fowler, who dined on unadorned New England fare. Lois only told me much later after she and David had been invited to the Reddys’ for dinner and came home in the same cloud. “We always wondered,” she said. “Now we know.”
One mild May evening in 1981, I was at the Reddys’ table yet again. We ate Anu’s wonderful Indian food; we drank a superb bottle or two of Chateauneuf du Pape (Reddy’s stint as the chief scientist of the Centre Mondial in Paris had certainly enhanced dinner wine selections). The children went off to do their homework; the cousins, nephews, and uncles drifted out.
We three lingered over our wine. The Reddys liked the lights to blaze, and I wondered if this was a defiant response to childhoods spent by smoky lantern light. All of us were in our forties now, and the blaze was unforgiving of the deeper creases in our faces, the darker shadows under our eyes, the start of graying.
Our mood turned dreamy. We reminisced about days at Stanford fifteen years earlier. Anu urged me to come with her to India and talked about what we might do if I visited her village. We laughed about womanly things, and some years later, when I was on a bus in Tokyo to the shrines and temples of Nikko, surrounded by Indian tourists, I took in the women in their plastic barrettes, their synthetic saris, and heard Anu’s voice in my ear: “Oh, Pom, those artificial saris are so shobby; you wouldn’t be caught dead in one.” I blew her a kiss across continents.
Reddy that evening told me the story of his father and the astrologer. He added impatiently and sadly how wrong the astrologer had been. “Each of my brothers, plowing the fields, is as smart as I am. I got the opportunities. They didn’t.”
For this was what truly drove Reddy. It wasn’t just an eagerness to conquer the next scientific or technological problem, although there was much of that. It was his own life. Unlike some Westerners (and, for that matter, Mahatma Gandhi) who entertained fantasies of a prelapsarian village life, uncorrupted by any technology more complicated than the spinning wheel, Reddy had grown up in such a village. He knew that people in such villages—not just in India, but in China, in Africa, in the Americas, the Arab world—were trapped inside a corrosive, deadly ignorance. They were prey to demagogues and anyone else who wanted to steal from them, cheat or manipulate them; they were prey to diseases they needn’t suffer; they were vulnerable to, and often overwhelmed by, customs and traditions that smothered the human spirit. In those villages were young Raj Reddys, hungry for the world’s knowledge—maybe to make use of it, but above all, to taste the joy of knowing it.
Thus Reddy believed every step he took toward improving and distributing information technology was sacred. AI wasn’t about replacing humans, pushing them out of jobs, making them superfluous. It was about releasing humans, not only from literally backbreaking, knee-pulverizing, mind-numbing work, but from oppressions of every kind.
Driven always by the realities of learning to write in the dirt, at the expense of his equally gifted brothers, Reddy wasn’t just a scientist of great distinction (the Legion of Honor from François Mitterrand in 1984; the Turing Award in 1994, shared with Ed Feigenbaum; the Padma Bhushan award from the president of India in 2001; the Okawa Prize in 2004; the Honda Prize in 2005; the Vannevar Bush Award in 2006; a slew of honorary degrees; and in 2014 a fellow of the National Academy of Inventors). He also became an activist in international computer education.
When I went to West Africa in 1982 to look at computer education projects, I went with a list of phone numbers from Reddy. Those numbers connected me with West African field experts of the Centre Mondial, the early 1980s French effort meant to bring computing to its former colonies. I’d write about that wonderful experience in The Universal Machine.
What I didn’t know was that Reddy had made sure I’d not only be welcomed, but looked out for. The phone numbers reached to the top of the West African power network. When I returned and thanked him, he smiled, shrugged deprecatingly. “I wanted to make sure you were always okay. Some of those places can be pretty rough.” I hadn’t even known I’d run any risks.
Reddy helped found the Universal Digital Library. Although the library was planned to hold a million books, by 2007, it already had a million and a half scanned volumes, readable for free over the Internet. He wanted to see every book possible made available to everyone on earth. “Even pornography?” I teased. “Sure,” he retorted. “No censorship. I mean every book.”
Google Book Search, Project Gutenberg, and the Internet Archive book scanning projects eventually replaced this effort, but Reddy was happy with that—so long as the books were up, available, readable. He considered his own project a proof of concept and was glad for others to take it over.
His work for his native India would be indefatigable. He was a founder of the Rajiv Gandhi University of Knowledge Technologies, intended to serve the educational needs of gifted rural youth; he served on boards and governing councils of other Indian universities; he was active in a network of elementary schools that served the poorest of the poor.
Meanwhile at home, when computer science in all its aspects at Carnegie Mellon had outgrown what any single department could encompass, a School of Computer Science was founded (largely engineered by Allen Newell) and Reddy served as its dean from 1991 to 1999. There, among other things, he helped to establish a Department of Machine Learning. It was a sweet tribute, I thought, to the original Hearsay.
In 2014, Reddy mused to me about early AI research. He was distinctly skeptical about its original biases:
From Turing on, intelligence was really only human behavior in an educated world. Did that mean all those illiterate billions in the world weren’t intelligent? Of course not. Any society that can invent writing and zero must be called intelligent.
Babies at birth, he went on, are already pretty smart—they soon have the capacity for language, acquired in the womb. Large numbers of connections between the motor and the visual cortex fire up in early babyhood. “In short, we need to think of the different levels of behavior that comprise intelligence. You can’t get to four-year-old intelligence until you’ve done three-year-old intelligence.”
He calls his latest project Guardian Angel technology, a way of getting the right information to the right people at the right time. An intelligent agent can scan vast amounts of information on behalf of its “ward,” learn what its ward can’t know, decide what’s important, what’s relevant, what knowledge might protect that ward, and whisper into its ward’s ear. It won’t be supernatural—it can’t predict the unpredictable.
The biggest problem will be how the agent decides importance. You don’t want continuous, boring, day-to-day stuff. You want only warnings about possible misfortunes. He envisions this for every man, woman, and child on the planet, using cheap wearable computing. “We can do this,” he says confidently. Microsoft Research has a similar set of programs under development for down-to-earth reasons. The time is right.
Jason Hong, one of Reddy’s colleagues at CMU, envisions a variation of the Guardian Angel called Maslow (named for the thinker Abraham Maslow, who proposed that humans have a pyramid of basic needs to be met). Maslow the program is a set of personalized agents that “can help us find, set, and meet hard goals in meaningful ways that we choose. Think of it as a cross between a lifelong coach, a caring uncle, and an honest and supportive friend. . . . Healthcare is a clear case where humanity needs significant help in achieving hard goals” (Hong, 2015). Hong continues by giving examples of how Maslow might help meet those goals. What if we decide we’d like to be more green? Or wanted to learn painting? “Maslow might even help us find compelling new goals to set for ourselves in forms that are fun and engaging. . . . Maslow could incorporate deep ideas from psychology to help motivate us and sustain changes in our behavior, all the while ensuring that the interventions we get are commensurate with our ability and level of motivation.” And Hong gives reasons why Maslow is entirely feasible within fifty years.
In September 2016, Reddy gave the keynote address at the IBM Cognitive Colloquium. He displayed the desired attributes of an intelligent assistant, a list he’d long ago drawn up with Allen Newell: it should learn from experience; exhibit goal-directed behavior; exploit vast amounts of knowledge; tolerate errorful, unexpected, and possibly unknown input; use symbols and abstractions; communicate using natural language; respond in human reaction time (milliseconds). Kenneth Forbus, the distinguished AI researcher who posted this on social media, reminded us that this “tells us how far we have to go, compared to where we are now, despite amazing progress.”
Natural language processing has come a very long way since Reddy’s pioneering efforts. Low-cost household gadgets you talk to in natural language, like Siri, Alexa, Google Assistant, and Echo, are growing in popularity. In 2018 in San Francisco, two award-winning college debaters, Noa Ovadia and Dan Zafrir, debated an IBM program called Project Debater on the topic “We should subsidize space exploration,” followed by “We should increase the use of telemedicine.” The audience declared the outcomes a draw but thought Project Debater (meant to exhibit IBM’s ability to consult very large data sets, including news articles, and convert that information into flowing, spoken prose) conveyed more convincing information than its human opponents but was less persuasive as a rhetorician. IBM envisions Project Debater as an assistant to human decision-makers, supplying them with evidence-based arguments for one position or another in the midst of conflicting opinions (Solon, 2018).
But as Reddy had imagined more than half a century ago, we’re talking to our machines—and they’re talking back.
One of Reddy’s graduate students, Kai-Fu Lee, Taiwanese-born, now Beijing-based, who got his PhD for the first speaker-independent program to recognize continuous human speech, would disturb the world in 2018 with a popular book about the coming confrontation between the world’s two AI superpowers, the United States and China. But that was to come.
This book has focused so far on how the early science of artificial intelligence began, prevailing against a major storm of scientific scorn generated by the peers of these early AI scientists. As late as 1975, when the field of computer science was preparing a progress report on itself for the National Science Foundation, the committee planned to omit any mention whatsoever of artificial intelligence. Only when one of the most distinguished scholars in the field, Donald Knuth, insisted that AI be included, did the establishment relent. Meanwhile, the private sniping was vicious. I know. I heard it.
Despite AI’s dubious early reputation, it began at once to elucidate the nature of human intelligence and would push scientists to look at intelligence in other species.
But I also became involved in AI and its fate, as we’ll see in the next section.
- The next journal to take an interest was The National Enquirer, then a supermarket tabloid of the ridiculous, the spurious, and the lascivious. Reddy didn’t want to talk to them, but they threatened to arrive in Pittsburgh and force their way in. Joe got on the phone to them and said, as department chairman, he’d be delighted to talk to them. What were they most interested in? The significance of the new Kung-Traub algorithm? Parallel algorithms and complexity?
- As for improvements in voice understanding, key members of the team that originally designed Siri, your voice pal on your smartphone, have moved to a start-up called Viv, where they envision a consumer-friendly personal assistant you can talk to, connected to a global brain in the cloud that will respond in depth. Hackers smack their lips.
- Emphasis on this new direction for Watson might be because, despite estimated billions in investment, IBM’s hopes to bring AI to medical care were in some ways disappointing. We know that sooner or later, AI will shape medical care, but maybe Watson won’t necessarily be in the forefront.