The Digital Humanities

Pamela McCorduck

27

1.

In the fall of 1960, about the same time I heard C. P. Snow deliver his Two Cultures lecture at Berkeley and was introduced to AI, I was taking a course in Italian Renaissance literature. When I visited my professor’s office for a consultation, he told me he’d nearly finished a concordance to Petrarch. After years devoted to Petrarch’s every written word, with 3 x 5 cards crammed into shoeboxes on shelves, the desktop, the floor, he was jubilant.

“Too bad you didn’t use a computer,” I sniffed, surely spoiling his pleasure a bit. But I was the newly converted and a pain.[1]

Over the years, I grew more tactful, but as this history reveals, my efforts to bring AI, and computing generally, to the attention of humanists—colleagues when I was teaching at a university, New York editors as I became a full-time writer, librarians, other writers when I was active in the PEN American Center, the authors’ freedom of expression organization—were mostly futile. “This could be important,” I’d say. If they listened, which was seldom, they scoffed. At best they wanted to redline me into a narrow cell called science writing. That computers, never mind AI, might have a larger significance seemed absurd to them.

Computing surely seemed absurd because so many of them were at such a willed distance from it. As a new board member, I’d walked into the PEN American Center offices in the early 1980s to find typewriters, and membership records in, yes, shoeboxes. Before Vartan Gregorian knew my name, he took me aside at a New York Public Library fundraiser and asked for a donation “before computers take over the library.”

I couldn’t expect humanists to learn what had taken me more than a decade’s hard work to understand. But I faulted them for not troubling to ask whether this could be important.

So I drifted from the formal humanities. That didn’t mean I stopped reading literature or history, listening to music, going to galleries, or pondering philosophical questions. Instead, like millions of others, I did all those for the deep human joy of it. Novels and poetry, history and biography, music, the visual arts, the best way to live a good life: all these represent human concerns at one immediate, endlessly compelling level of existence.

Biologist Edward O. Wilson (2014) reminds us that our fascination (maybe obsession) with each other is wired in, an adaptive characteristic of our species that has helped us prevail. It’s the evolutionary excuse for our intense preoccupation with ourselves, eternally evaluating one another in “shades of trust, love, hatred, suspicion, admiration, envy, and sociability.” These are the traditional tasks of the humanist, although humanities scholars go further, by examining in depth how our self-fascination manifests itself in works of art, be that painting, literature, music, religious beliefs, or history.

Questions around the humanities are embedded within other questions. How did we get here? What makes us unique in the biosphere? In the cosmos? (If we are?) How is collective human behavior different from individual behavior? What accounts for the many contradictions in our behavior? Why are we noble? Why self-destructive? At a different level from art, these questions are so far more precisely answered by science. The best answers take combined approaches.

From the beginning, mystics have understood the significance of such questions and offered us religious myths. But as we saw in Chapter 23, Jaron Lanier argues we’ve confused the science and technology of AI with mythology, reinventing AI not as something that requires thoughtful deployment, but as a divinity to be feared. He’s right, and it’s regrettable. Mythical answers of any kind no longer wholly satisfy a secular and scientific age. Instead, human and machine intelligence are now seen in a grand computational framework called computational rationality: a converging paradigm for intelligence in brains, machines, and minds.

Poised at the beginning of adult life, that young woman who drank in the Two Cultures lecture so thirstily was to spend her—my—adult life longing to reconcile the two cultures that I loved so much. My intuition pushed me slowly toward speculation, then conviction, that this symbol-manipulating device called the computer, especially this branch of computer science called artificial intelligence, would illuminate human intelligence in important ways. Yes, it was engineering, it was science, yet it might reveal some secrets that had so far evaded us. It might do more. I could only guess. Maybe hope.

What I couldn’t know (and it’s significant) was that the lecturer who set Anglo-American letters disputatiously ablaze with his Two Cultures thesis in the middle of the 20th century had the same yearning. Deirdre David (2017), in her biography of novelist Pamela Hansford Johnson, Snow’s wife, notes how Snow and Johnson’s love and marriage were very much grounded in writing stories and how profound Snow’s yearning had always been to be taken seriously not as a scientist but as a writer. Doing science was nearly accidental for him, but came to provide the milieu he could write fiction about. He too wanted to conjoin the Two Cultures. Perhaps his talk stuck with me for so many years because, without knowing it, I responded to exactly that yearning.

Over the decades, computers have penetrated our lives. AI moves slowly toward human-like behavior and in some cases betters it. Natural language processing improves, likewise automatic translation, and those improvements offer important techniques to linguistics and new ways to describe phonology, understand human language processing, and model linguistic semantics (Hirschberg & Manning, 2015).

AIs have learned how to interpret and respond to human facial expressions (hence human emotions, which some programs can even anticipate) and are learning how to interact safely and inoffensively in human spaces.[2] Composers have brought the digital to music in unexpected ways; movies have been voracious, pushing digital visualizations ever further; art that owed its being to the computer hangs on my walls, whether original and grounded in AI, like Aaron’s, or images transformed by a human sensibility, like Lillian Schwartz’s. Younger artists demonstrate that advanced video games will be the next great storytelling medium. Storytelling, as we’ve seen, is one of the great markers of human intelligence.

But the idea that the digital might also be touching literary scholarship or other parts of the humanities was news that came to me fitfully. I’d hear a talk. I’d read something in passing.

Now, as knowledge is being moved at breathtaking speed from human texts, images, and skulls into software, attention is at last being paid. Formal programs in the digital humanities have been established in nearly all the major American and European universities, in the conviction that this is a great moment of cultural-historical transformation. Because the entry fee is relatively low, schools of more modest means can also participate.

Borders around the digital humanities are porous, nearly nonexistent, because settlements can be established anywhere that humans and computers can go. Quantitative analysis is growing more important in the humanities, and text as the primary repository of human culture is challenged. Text is blended with, even transformed into, the graphical, the musical, the numerical. But the purposes are sweetly familiar: to know and understand as deeply as possible what humans feel, know, and do. Recall Leslie Valiant’s claim: “Contrary to common perception, computer science has always been more about humans than about machines” (2014).

2.

In early 2015, it seemed fitting to return to the Berkeley campus where, more than half a century earlier, the Two Cultures, along with artificial intelligence, had ambushed me simultaneously. Compared with some institutions, Berkeley wasn’t far advanced in the digital humanities then, but a small, lively program existed, supported by a $2 million grant from the Mellon Foundation. An enthusiastic team, which reports to the Dean of Arts and Humanities, has slowly seeded the campus with modest projects. Claudia von Vacano, who heads this program, told me their strategy: as one humanities scholar transforms his or her work, colleagues will watch, and become inspired.

The Dean of Arts and Humanities, Anthony Cascardi, himself a professor of comparative literature, has crisp reasons for the humanities to be involved in the digital world. First, the computer can organize, sort, investigate, and navigate great swaths of information and bring access to content that is now inaccessible, he told me. (It’s a longtime scientific and computing axiom that more is different, that quantity changes quality.) Then, the computer allows intersections with different media, creating new research that in turn creates new content. In short, this is where the world is, and the humanities need to be there, too.

The successes and challenges of Berkeley scholar Niek Veldhuis, professor of Assyriology, typify some of the early struggles of the digital humanities. Sumerian is a linguistic isolate: it’s seemingly related to no other known language. However, Akkadian, a Semitic language, was spoken contemporaneously with Sumerian, and from context, scholars are painstakingly deciphering what written Sumerian—the marks on cuneiform tablets—might mean by comparing them to known Akkadian words.

Some decades ago, a Sumerian dictionary was undertaken, but after 25 years, only four codex volumes had been produced, going from A to the beginning of B. Even by scholarly standards this was frustratingly slow, so instead, Veldhuis is trying an intermediate approach. He scans cuneiform tablets and pulls out words to post online with glossaries for other scholars to examine and interpret. This is not yet a dictionary, but its raw materials.

The project was actually begun by one of Veldhuis’s former professors at the University of Pennsylvania, who made images of tablets on a simple flatbed scanner. Veldhuis laughed as he recalled to me one early problem. A technologist said: how cool would it be if you could look at these tablets in high-definition 3D, turn them, see each side. So that was begun, but the process is so expensive that far fewer images have been produced, and none is available for free online. The tradeoff between gorgeous, up-to-the-minute technology and cheap sufficient technology is apparent. In the future, it might be possible to download many gigabytes cheaply, Veldhuis mused, but not now.

He also devised a projector that allows his students to explore the image of a given Sumerian word in the seminar room together. The projector is a pedagogical tool to help refine their research techniques quickly, justify reasoning before the group, and speculate about other possibilities. “We all watch the process of research, asking how do you verify? What tools do you use? New questions arise from new technologies,” he added. “That’s heartening. We need to grab opportunities.”

Lately he and his colleagues have begun to map the communities and social patterns of Mesopotamia, a relatively new project in the field. They analyze digitized data from early clay tablets that contain inheritance documents or sales contracts to reconstruct the social relations of ancient Mesopotamia, producing a graphical representation that tells us much more about how people actually lived and interacted.

“My expectations,” he mused, “were that the technology would be hard, but no, it’s the cultural differences that are hard. This is the point of the 3D versus the flatbed technologies. How much technology is necessary? A colleague wants to produce an online version of The Egyptian Book of the Dead, with images and commentaries on those images. Technologists will tell you this is easy. The problem is, which system will work best for what she wants to accomplish? She needs many conversations with technologists to settle on a system that does everything she wants to do.”[3] Never mind the problems of how systems grow obsolete and are replaced.

Veldhuis sees the digital humanities as not only developing technique, but as social engineering. More humanists need to know what can be done and learn how to do it. Among his challenges is helping his colleagues to understand that work he posts online is provisional. This online work isn’t as certain as the material he’d publish in a scholarly paper but is there to be examined, queried, and tested against hypotheses. In his field, this approach is highly unusual—scholars usually publish only what they are certain of.

Before I left his office, Veldhuis indulged me in a brief conversation about the origins of writing, a history of experimentation, he said: some things worked, some didn’t. (Generate and test, computer scientists would say.) “But finally in the fourth millennium BCE, cuneiform took hold, partly driven by urbanization and the complexity of urban life, with its specialization and larger population, this despite apparent political upheavals. We don’t know for sure because no political records exist for that time. Written records at the beginning were essentially accounting records: two goats and a sheep. Cuneiform prevailed thanks to its flexibility, and within a century, could express political statements. So-and-so is King.”

He dug into a drawer and brought out the real thing, cuneiform tablets he kindly allowed me to hold in my palm—small incised clay fragments the size of a domino, the profoundly thrilling distant ancestor of the words I write.

The ebullient Elizabeth Honig, an associate professor of art history and another scholar using the digital humanities at Berkeley, specializes in the oeuvre of Jan Brueghel, son of Pieter and father of Jan the younger. Early in her career, Honig realized that with so many extant works, knowledge about them was scattered in many human heads around the world. Pooling that knowledge would be extremely useful. Encouraged by a senior European scholar, she constructed an informal wiki that other Brueghel experts could contribute to and consult. This led to an early Mellon Foundation grant to put together a proper website and digitize many works. Luckily, her life partner is a computer scientist and helped her through some of the difficulties in setting up the website. Soon she received another grant for course development and can give students modest course credit for contributing to the website. She too sees this as a pedagogical tool that helps students develop their art historian skills.

One student’s task has been to identify the patterns that appear in Jan Brueghel’s paintings. The painter and his studio often employed repeated patterns of travelers in a group, windmills, or other such images, sometimes using the equivalent of stencils. To identify these is one way of authenticating a painting or a drawing. Scholars also find it useful to detect how these patterns vary in otherwise authenticated works. Honig hopes someday to be able to compare and contrast works automatically, answering whether one painting riffs on different compositional elements of another painting.

Eventually the Jan Brueghel website will have all the data the original wiki contained and an underlying database that allows scholars to trace the events and reasoning that drove authentication in the first place. The site will contain a timeline and maps and become a public utility. The website has presented unusual challenges. “Dealers can be unscrupulous,” Honig told me. Because at least one dealer hacked the website, put up an image of a fake Brueghel he was trying to sell, and added fake provenances, the site now requires anything added to it to be traceable by any scholar.

Art historians are taught that a very high-resolution, black-and-white image of a work is better for analysis than a color image: details like brushwork are more apparent, for instance. When Honig studied such a photo of a painting in a private English collection, its odd brushwork made her question its authenticity. This made a London dealer nervous. He was selling the painting to a private collector in Shanghai, and Google Analytics on Honig’s website showed that parties on both sides were consulting the website repeatedly. The website had become central to the authentication process. (Eventually the painting was authenticated.)

But like Veldhuis, Honig too has faced other challenges this early work raises. When we met, the website had already required five years of work, and useful as it might be, no scholarly credit accrued to her in terms of professional promotion, which normally rests on peer reviews. Thus she’s had to begin writing a book, interpretive of Brueghel’s art, as distinct from the website, which is solely factual.

Moreover, she’s concerned about the ephemeral nature of art websites. Who does them? Who will keep them up? How long will they last? On the positive side, younger people are at home online and know how to navigate, which requires different skills from reading a monograph. Despite the challenges, she knows exciting things in art history are being done. As one example, she named the Bosch Research and Conservation Project,[4] which is devoted to the work of the fantastical Dutch painter Hieronymus Bosch and designed and organized for art historians and the way they think.

Like Veldhuis, Honig sees ambitions that are sometimes too steep for current possibilities. Basic, open-source tools are needed, a point that Anthony Cascardi, the Berkeley Dean of Arts and Humanities, also made to me. Berkeley is hoping to solve that problem with a campus-wide project that provides such tools. I’m put in mind of the early history of programming languages, when everyone made up their own special-purpose language. More powerful, flexible languages eventually came to dominate, and that will happen with digital humanities tools, too. But for now, problems persist.

Under the eucalyptus trees outside the Center for New Music and Technology, I met with Edmund Campion, a genial and enthusiastic composer and member of the music faculty, to talk about his uses of the computer in musical composition. (February under the eucalyptus trees! I’d just left a completely icebound Manhattan.) Campion began:

Music has lots of data, and thus has always responded to new technologies: Mozart loved a new instrument called the clarinet; Beethoven introduced the trombone into several of his symphonies. Thus I don’t think of myself as a digital composer. I’m a composer. I’m doing what composers in the past have always done, taking advantage of whatever my times offer me. Any composer needs to assess a site, an orchestra, or a new instrument, or a computer: what can I generate from this? We need to mine systems for their creative potential, and find a sense of alignment with the possibilities of our instruments. I leverage every possible means at my disposal.

Campion is unusual in that he was academically trained yet believes he got his real musical education by playing in rock bands and progressive jazz ensembles for real audiences. This makes his music and his approaches very different from the classical music of most of the second half of the 20th century, which he calls “nearly unlistenable.” Composers who began with computers were in denial about this, he says ruefully. About his time at IRCAM, the French center of avant-garde music, he said:

The issue was écriture vs. sculptural electronic music. We were asking, “What is a note? It isn’t at all clear.” The 18th- and 19th-century composers got all that under control, but then Schoenberg opened it up to all sounds again.

But this preoccupation and its experiments overlooked the listener.[5] Music, Campion strongly believes, is a social contract:

You must have listeners—you must be part of a community. I don’t mean popular, as such, but there must be a community of listeners who respond to your music. I think of music as a lattice—there are all kinds, and they’re all wonderful. But music cannot happen in a vacuum. It needs a community.

Moreover, now is a time of collaboration, which the computer enables. “I trade files with video artists: is this music working for your video? Yes? No? What changes will make it work?” Campion said. He stopped for a moment and then continued:

This brings me to the digital humanities problem. I’m wary about the lack of engagement on the part of the humanities that might result in a massive forgetting. Young people want to use the new technologies, and if they don’t have guidance, if they have no feed-forward from those who have gone before, that’s a terrible loss. My role as a cultural agent is to make connections between the past and the future. My students are very accomplished with the tools; they’ve grown up with them, which I didn’t.

But they have no sense of stepping off from where their predecessors were. If I don’t introduce them to the past, they don’t know about it. That’s very difficult, because the technology has changed so much over the last half century that nobody has all the resources to bring the old stuff to the new platforms. It’s all but lost.

Although Campion is talking about music, the same peril is everywhere as the digital divide grows and predigital scholars fail to engage with the young. These are the unsurprising growing pains as the great structure that I talked of earlier begins to rise, that Hagia Sophia called computational rationality, encompassing, connecting, uniting, and defining intelligence wherever it’s found, in brains, minds, or machines.

Three years after my initial visit to these Berkeley digital scholars, Anthony Cascardi, that lively dean of the Arts and Humanities, reintroduced me to Charles Faulhaber. (We’d met earlier when he was head of the Bancroft Library at Berkeley, which specializes in American studies.)

Faulhaber is a professor emeritus of Spanish literature and in his retirement can now devote himself fully to a project he’s been at for a while. It’s called PhiloBiblon[6] (after a medieval text describing the perfect library) and is an online bio-bibliographical database of medieval texts in Spanish, Portuguese, Galician, and Catalan. It documents the kinds of texts that will serve as data for the Diccionario del Español Antiguo being resumed by the Real Academia Española after a hiatus. This will be a Spanish equivalent of the Oxford English Dictionary, that is, a record of each word from its first appearance in text, with examples of usage over time. PhiloBiblon will also serve other lexicographical projects, critical editions, and other text-based projects focused on medieval Spain.

Any scholar may access it, but Faulhaber’s ambition, he tells me, is to move the entire database from the present Windows format to the World Wide Web, so that it can take advantage of the semantic web capabilities. At the moment, only one person at a time can add to the PhiloBiblon database. “We can provide web access to the data, but it is not an elegant process,” Faulhaber told me. His goal is to put the database on a server where any authorized user would have access to it, from anywhere in the world, in order to add data. “An editorial committee would vet all changes to ensure that they conform to our standards—a classic crowdsourcing application.”

Thus scholars in libraries in Spain, Italy, France, England, or Russia could add data from real-time inspection of primary sources. This would eliminate the single greatest bottleneck in maintaining and expanding that data. “There is no substitute for first-hand inspection of primary sources, but the semantic web will make it much easier to look for those sources,” Faulhaber says. “Every day, libraries all over the world are adding data about their holdings as well as digitizing them. Finding these new materials on the web is a hit-or-miss proposition right now.” Currently the only way to add these data to PhiloBiblon is by manual cutting and pasting. “We’ve done it that way for forty years, but the semantic web makes it possible to automate this process.”

The semantic web is a network that automatically collects texts so that meaning can be teased out of them, to follow the progress of a given word through its evolution in the language. This would be impossible without AI. Picture this, and contrast it to the lone scholar’s labor to produce a concordance of Petrarch’s works, which opened this chapter.

In 2017, faculty colleagues in engineering and computer science across the Berkeley campus issued a technical report, A Berkeley View of Systems Challenges for AI (Stoica et al., 2017). Although this was intended for computer science and engineering readers, some salient points applied to the digital humanities, especially as they began to employ AI. A systems approach would be necessary, crossing disciplines, and such applications needed to support continual or life-long learning and never-ending learning. The report notes that Michael McCloskey and Neil Cohen define continual or life-long learning as “solving multiple tasks sequentially by . . . transferring and utilizing knowledge from already learned tasks to new tasks while minimizing the effect of catastrophic forgetting.” Never-ending learning, however, is “mastering a set of tasks in each iteration, where the set keeps growing and the performance on all tasks in the set keeps improving from iteration to iteration.” Other challenges include adversarial learning, which occurs when malefactors compromise the integrity of applications (that unscrupulous art dealer who inserted fake merchandise onto the Brueghel website, for instance).

Meeting such challenges is probably beyond an individual humanities scholar’s abilities, but that scholar needs to be able to ask the hard questions and insist that cooperation among different kinds of experts is essential.

[1] In 1949, the Jesuit scholar Roberto Busa worked in collaboration with IBM to create an automated approach to his Index Thomisticus, a computer-generated concordance to the writings of St. Thomas Aquinas, but he needed access to mainframes, so flocks of scholars weren’t able to follow his example.
[2] See especially the work of Julie Shah at MIT.
[3] Rita Lucarelli, Assistant Professor of Egyptology in Berkeley’s Department of Near Eastern Studies, researches religion, magic, and funerary culture in ancient Egypt. Her Book of the Dead digital project is now underway, and focuses on creating highly detailed, annotated 3D models of funerary objects to better understand the materiality of the Book of the Dead texts.
[4] You can view the Bosch Research and Conservation Project at http://boschproject.org
[5] John Adams’s engaging memoir, Hallelujah Junction: Composing an American Life (Farrar, Straus and Giroux, 2008) covers this conflict thoughtfully. Adams, his young ears full of early rock music and jazz written for audiences and meant to stir, abandoned the sere wastelands of Boulezian dicta to write music to be listened to. He was an early adopter of electronics, and when he couldn’t afford a proper synthesizer, he built one out of used spare parts.
[6] You can find PhiloBiblon at http://bancroft.berkeley.edu/philobiblon/

License


This Could Be Important (Mobile) Copyright © 2019 by Carnegie Mellon University: ETC Press: Signature is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
