Distant Reading and Literary Knowledge

Cultural Analytics Now

Ted Underwood, Distant Horizons: Digital Evidence and Literary Change

1: Conflict

Ted Underwood is ambitious. In Distant Horizons, published in February 2019 by the University of Chicago Press, he reports that recent advances in statistical modeling offer "new methods of representing and interpreting the world."1 Distant Horizons is a book-length argument for adopting these methods in literary studies. He uses statistical modeling to revise the literary history of language, genre, prestige, and gender. Specifically, he argues that separating literature into historical periods has fostered local insights but foreclosed knowledge about the sweep of literary history across long timelines, knowledge that statistical modeling now makes possible. These claims entail strong and contentious positions on the distinctiveness of literary language and on the place of English departments in the university, about which he is admirably direct. The stakes at the root of much recent, spirited debate around computational analysis of literature are found here: in the future of English as a discipline and in the nature of knowledge itself, subtended by the decline of English's share of power in the academy and by the post-2008 political and economic order.

Distant Horizons is masterful. Anyone who cares about literary studies should read it, though, in his ambition, Underwood addresses it to the broader audience "of people who want to understand the human past."2 It is lucid. It is precisely tuned. It is mostly persuasive. Underwood knows that much computational literary criticism (or, to use his preferred term, distant reading) "can bog down in finicky details," and he manages this obstacle by describing his data and method in appendices.3 He believes that "the real challenge to large-scale literary analysis is not epistemic or ethical but aesthetic: it is simply hard to write with sweep and verve about thousands of books."4 He is wrong about epistemology and ethics, but insofar as sweep and verve are the challenge he sets for himself, he meets it beautifully. For me, it is a page-turner.

Critics of cultural analytics often claim that quantitative work has not produced, or even cannot produce, valuable knowledge for literary studies. Distant Horizons should put an end to these criticisms. Underwood has designed it expressly for this purpose. Each of the first four chapters intervenes in current scholarship, revealing how periodization has obscured long-term trends. In the first chapter, for example, he asks: does the novel, across its history, move from telling to showing? We sometimes hear that it does, and pairing the omniscient narration of Henry Fielding's Tom Jones with the restricted third person of a Henry James novel can make the claim appear obvious. But, as Underwood writes, it is "not at all clear that previous nineteenth-century fiction should be understood as a slow progress in that direction. Omniscient exposition permitted the characteristic strengths of the Victorian novel, and it remains important today in postmodern metafiction and genre fiction."5 How might a scholar determine what in fact happened?

To answer this question, Underwood introduces modeling, which he defines in its simplest terms as "a relation between variables."6 This is an exciting development in the intellectual history of the university. The kind of modeling Underwood performs is only a couple of decades old, but it has transformed disciplines in the social and natural sciences. Distant Horizons is the most mature example to date of its use in the humanities. To understand it, literary critics need to forget much of what they've heard about digital humanities. Underwood emphasizes that now is the time for a major reframing of this work. He has no use for big data. He is not expanding the canon or analyzing the great unread. He readily acknowledges that his laptop can run a typical program in an afternoon or less. Against the purported objectivity of algorithms, he turns the human prejudices built into modeling toward humanistic ends.

Part of the thrill of reading Underwood's first chapter is watching him adroitly alter the paradigm of cultural analytics. It is a shift from measurement to modeling. As a first grasp at understanding the long timeline of novelistic narration, Underwood turns to a well-known finding from the early days of the Stanford Lit Lab. In 2012, Ryan Heuser and Long Le-Khac published a pamphlet in which they show that between 1800 and 2000 adjectives of physical description became steadily more common in novels and abstractions became less common. At first glance, this measurement appears to give credence to the argument that the novel transitioned from telling to showing. But Underwood carefully undoes whatever authority such pattern-finding might lend, noting, not least, that "in sorting through a vast heap of evidence for something interesting, we run a risk of cherry-picking."7
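For readers new to the distinction, a measurement of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes a hypothetical list of concrete words (CONCRETE_WORDS) and an invented two-sentence corpus, and it is not Heuser and Le-Khac's lexicon or pipeline. For each period, it computes the share of all tokens that fall in the word list.

```python
# Hypothetical stand-in for a lexicon of concrete, physical-description words.
CONCRETE_WORDS = {"hard", "rough", "bright", "red", "cold", "wet"}

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'").lower() for word in text.split())
    return [token for token in tokens if token]

def relative_frequency(texts, wordlist):
    """Share of all tokens in `texts` that appear in `wordlist`."""
    hits = total = 0
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        hits += sum(1 for token in tokens if token in wordlist)
    return hits / total if total else 0.0

def measure_by_period(corpus, wordlist):
    """Map each period label (e.g. a decade) to the word list's relative frequency."""
    return {period: relative_frequency(texts, wordlist)
            for period, texts in sorted(corpus.items())}

# Toy corpus: one invented sentence per period.
corpus = {
    1800: ["The virtue of honour was his duty."],
    1900: ["The red door was cold and wet."],
}
trend = measure_by_period(corpus, CONCRETE_WORDS)
```

On this toy corpus the share rises from 0.0 in 1800 to 3/7 in 1900. A rising curve like this is exactly the kind of isolated pattern Underwood warns about: on its own, it cannot say whether the trend is distinctive to fiction or shared by other kinds of writing.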

The crucial transformation is to begin not by exploring literary data for patterns, but "to start with an interpretive hypothesis" and to "invent a way to test it."8 As Underwood writes, "we need to reverse the sequence of steps in our inquiry."9 By this point, he has replicated Heuser and Le-Khac's study. He adds biography "as a contrastive touchstone" and shows that the linguistic trends Heuser and Le-Khac reveal for fiction do not hold for biography.10 Intrigued, he proposes the hypothesis that, across the long sweep of literary history, fiction diverges from biography. To test it, he uses statistical modeling.

The model he uses throughout the book is logistic regression, a form of machine learning. A common way to explain it is with the example of a spam filter. A hypothetical email provider trains its model to learn the difference between spam and legitimate email by giving it labeled examples of both and asking it to learn the features that most reliably distinguish them. These features could include a preponderance of capital letters or phrases like "free money" or "get paid." The provider tests the model by giving it unlabeled emails and asking it to sort them. If it sorts them correctly a high percentage of the time, it is a good spam filter. Once a model has internalized the patterns distinctive to a category, a practitioner can test how close texts from other categories are to it. If I trained a spam filter, for example, I could then take emails from my alma mater and from my mother and test how similar each group is to spam, from the perspective of whatever makes spam distinctive from everything else. Underwood calls this perspectival modeling. He puts it to many uses, including testing how similar two genres, such as detective fiction and science fiction, look from the perspective of each.
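The spam-filter logic can be made concrete with a toy example. What follows is a minimal sketch, assuming a bag-of-words representation, a handful of invented training emails, and a hand-rolled L2-regularized logistic regression fit by batch gradient descent. It is not Underwood's code (his data and methods are documented in the book's appendices), and the names train, predict_proba, and mean_score are hypothetical. The final lines show the perspectival step: two unlabeled groups of messages, stand-ins for the alma mater and the mother, are scored by how spam-like they look to the trained model.

```python
import math
from collections import Counter

def features(text):
    """Bag-of-words representation: lowercase word counts."""
    return Counter(text.lower().split())

def predict_proba(vec, weights, bias):
    """Probability (0 to 1) that a bag of words belongs to the target category."""
    z = bias + sum(weights.get(word, 0.0) * count for word, count in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(docs, labels, epochs=300, lr=0.5, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    labels: 1 for the target category (here, spam), 0 for everything else.
    """
    vecs = [features(d) for d in docs]
    vocab = sorted({word for vec in vecs for word in vec})
    weights = {word: 0.0 for word in vocab}
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = {word: 0.0 for word in vocab}
        grad_b = 0.0
        for vec, y in zip(vecs, labels):
            # Gradient of the log loss: predicted probability minus true label.
            error = predict_proba(vec, weights, bias) - y
            for word, count in vec.items():
                grad_w[word] += error * count
            grad_b += error
        for word in vocab:
            weights[word] -= lr * (grad_w[word] / n + l2 * weights[word])
        bias -= lr * grad_b / n  # the bias term is not regularized
    return weights, bias

def mean_score(texts, weights, bias):
    """Perspectival test: average probability that texts look like the target category."""
    return sum(predict_proba(features(t), weights, bias) for t in texts) / len(texts)

# Invented training data.
spam = ["free money now", "free money today", "get free money", "free money offer"]
ham = ["meeting notes attached", "see you at dinner", "notes from the meeting", "dinner on sunday"]
weights, bias = train(spam + ham, [1] * len(spam) + [0] * len(ham))

# The features that most reliably mark the target category.
top = sorted(weights, key=weights.get, reverse=True)[:2]

# Perspectival comparison of two unlabeled groups.
alumni = ["free alumni tickets", "free money for alumni"]
family = ["dinner with grandma on sunday", "see you soon"]
alumni_score = mean_score(alumni, weights, bias)
family_score = mean_score(family, weights, bias)
```

Because words that appear only in spam receive positive weights and words that appear only in legitimate mail receive negative ones, the "alumni" messages, which share spam's vocabulary, score higher than the "family" messages. The model never labels either group as spam; it only reports how each looks from spam's perspective.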

In this instance, Underwood trains his model on labeled fiction and biography, then asks it to predict the category of unlabeled texts. Most useful for him is not the model's binary prediction but two of its other affordances: it expresses the likelihood that any text is fiction or biography as a probability on a continuum between the two; and it reveals the features that allow it to tell the difference. The two genres parted ways over time, with fiction most characterized by action verbs, body parts, and verbs of sensory perception, and biography by political terms, organized systems of belief, and abstraction.11 Underwood concludes that "the novel steadily specialized in something that biography (and other forms of nonfiction) could rarely provide: descriptions of bodies, physical actions, and immediate sensory perceptions in a precisely specified place and time."12 Through modeling, he recognizes the truth in Heuser and Le-Khac's measurement and situates it within larger linguistic trends and against biography to discover what is distinctive about fiction. Whereas the measurement revealed an isolated fact and invited speculation, Underwood's model permits a persuasive argument about literary history.

He closes the chapter with a delicate maneuver. He returns to critical tradition to allow that "scholars already have some explanation for every part of the widening gap between fiction and biography."13 But he reminds his reader that to possess the fragments is not necessarily to recognize the whole. Too often, critics of cultural analytics fall prey to hindsight bias, imagining they have always known what the analyst just demonstrated. Underwood addresses this problem directly by establishing checkpoints along the way to assert what we do not yet know and by ending the chapter by naming and exposing the logic of hindsight bias. The power of his argument is two-sided: from the perspective of the long timeline, the claims of particular periods leap into alignment with one another; and the linguistic acts of particular texts newly make meaning by expressing or entering into contention with not only the well-established norms of the period, but the extended trends of centuries. To end the chapter, Underwood hints at a close reading that would demonstrate this power. I wish he had written it. That he didn't is symptomatic of his diminished opinion of the practice.

The third chapter satisfyingly extends the first by suggesting a possible mechanism. Underwood finds that judgments of prestige across history, based on reviews, increasingly encouraged the understanding of literariness "as temporal immediacy and concreteness."14 He compares this finding with conventional accounts of literary history that emphasize contrast and revolution. In the prefaces of Henry James or in Ezra Pound's manifestos, literary critics have found ruptures in which prevailing literary values dramatically change. But Underwood leverages machine learning to reveal that this is not how literary history works. Perspectival modeling demonstrates instead that literary history has followed a process of accretion that he refers to as the logic of "even more so."15 "'Even more so,'" he writes, "explains how a single gritty reboot that makes money can doom us to a long sequence of ever-grittier reboots. But," he adds, "historians of high literary form have ignored that kind of momentum."16 Just as we've gotten ever-grittier reboots, incentives in the economies of literary prestige have given us fiction that is ever-more-differentiated from biography through distinctive language that we have come to identify as literariness.

When pressed, Underwood's argument about periodization shows cracks. He writes that "for the last sixty or seventy years, we have assumed that literary history can only be interesting and edifying insofar as it is a story about conflict."17 Who is this we? I recognize few literary critics in that description, for two reasons: "can only be interesting" ignores much scholarship that does not rely on narratives of conflict; and it underestimates the possibility of irony, that we hold ourselves at a speculative distance from periodization, that we use it as a Trojan horse, that once inside we might find much that is interesting and edifying beyond the disciplinary norms that allowed us in. His assessment of periodization extends the account from his previous book, Why Literary Periods Mattered, where he shows that literary periods arose as a way, first, for the middle class and, later, for literary studies to assert cultural authority. Amanda Anderson, in an otherwise glowing review, writes that the demands of Underwood's argument lead him to neglect schools of thought that place less emphasis on discontinuity, such as Marxism, adding that, to her mind, discontinuity "is less pervasive and determining than he claims."18 Here, too, in Distant Horizons, the usefulness of discontinuity for Underwood leads him to exaggerate it.

There is also an implicit tension in Underwood's argument against conflictual models of literary history. The conflictual model (we've thought things were like this, but they're really like that) is his, too, transported from the history of literature to the history of literary criticism. He often departs from it to synthesize extant scholarship with his discoveries, and these are among the book's best moments, but his own use of the conflictual model is part of what leads him to exaggerate the influence of periods on literary history. As he writes, "conflict simply makes a good story."19

2: Literariness

In "The Computational Case against Computational Literary Studies," Nan Z. Da writes that in the "data work" of cultural analytics, "there are decisions made about which words or punctuations to count and decisions made about how to represent those counts. That is all."20 She lodges this as a criticism. She argues that counting words with computers does not offer the nuance necessary to account for the complexity of literature. Scholars who attempt to do so, including Andrew Piper and Ted Underwood, are, in her account, bound to fail.

Underwood makes the case for the power of word counts. His perspectival models rely only on word counts to differentiate categories of texts. "Readers unfamiliar with this method," he writes, "are often disconcerted by the seeming naïveté of representing literary works simply through word frequency."21 He counters with two arguments: that, in the scholarship, "models based on word frequency predict human readers' judgments [about how to classify texts] as [successfully] as more complex strategies";22 and that "genre is expressed redundantly on many different levels," including word counts.23 The first is solid. The second is neither dismissible, as Da would have readers believe, nor as settled as Underwood, at moments, suggests. Several times, Underwood implies that word counts capture — correlate with — other patterns that define genre, including style.24 But he largely defers the question to others, including to Andrew Piper and his account in Enumerations. Piper, in turn, points to the normative philosophy of language in computational linguistics and natural language processing: distributional semantics, which Tess discusses in her review. The meeting of distributional semantics and cultural analytics is, to me, at once rich with possibility and profoundly understudied.

But little of Distant Horizons depends on whether word counts correlate with style. One of the book's achievements is to show the power of word counts for perspectival modeling. The conclusions that Underwood reaches on this basis are mostly persuasive. Consider, again, his first chapter. He uses word counts to model the difference over time between fiction and biography. His model weights each word by how helpfully it differentiates the two. He groups the words into semantic categories with a lexicon ("action verbs," "body parts," "political terms") and calculates the weight of each category, giving him a nuanced sense of the general discursive differences between the two genres at scale. This is, as Underwood acknowledges, a bit crude, but it also reveals what he says it does: a previously unknown trend in the long history of the novel. As I've written elsewhere, he puts the same method to yet more fascinating use in chapter four, which features research done in collaboration with David Bamman and Sabrina Lee, revealing the history of how authors write about gender. Characters were fairly easy for his model to distinguish as either male or female in 1850, but the binary became progressively less clear through to the present, tending toward a convergence or blurring or multiplication of gender. Underwood tracks how individual words contributed to this gendering. For example, "grinned" and "smiled" were neutral through the nineteenth century but became polarized by 1950 (grinned signifying male, smiled female), only to return to neutral by 2000. "Eyes" and "hair" were neutral in 1800 but became feminine over time, while "pocket" became masculine. It has always been neutral to "read."
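The lexicon step can be sketched as a simple aggregation over a trained model's coefficients. The sketch below assumes invented coefficients (positive values leaning toward fiction, negative toward biography) and a tiny invented lexicon. The lexicon and weights Underwood actually used are not reproduced here, and category_weights is a hypothetical helper.

```python
from collections import defaultdict

def category_weights(word_weights, lexicon):
    """Sum per-word model coefficients within each semantic category.

    word_weights: word -> coefficient (here, positive leans fiction, negative biography).
    lexicon: word -> category label; words not in the lexicon are skipped.
    Returns (category, total) pairs, most fiction-leaning first.
    """
    totals = defaultdict(float)
    for word, weight in word_weights.items():
        category = lexicon.get(word)
        if category is not None:
            totals[category] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical coefficients and lexicon, for illustration only.
word_weights = {"ran": 0.8, "grabbed": 0.6, "hand": 0.5, "eyes": 0.4,
                "parliament": -0.9, "elected": -0.7, "the": 0.01}
lexicon = {"ran": "action verbs", "grabbed": "action verbs",
           "hand": "body parts", "eyes": "body parts",
           "parliament": "political terms", "elected": "political terms"}
ranking = category_weights(word_weights, lexicon)
```

Summing raw coefficients is a deliberately simple choice; averaging within each category is an equally plausible alternative, and either choice flattens differences among the words inside a category.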

Such arguments make a strong case for cultural analytics, showing how modeling can challenge or advance critical intuitions. Distant Horizons settles the question of whether cultural analytics can be useful for literary studies. But Da's essay and the attention it has received reveal the severity of the rift within the discipline over what literary studies ought to be doing in a time of contraction.

Anticipating the contentiousness of his methods and many of Da's arguments, Underwood devotes his final chapter to defending distant reading as a disciplinary practice. Given the temper of this debate, many observers might be surprised to learn that Da and Underwood agree about a lot. Underwood dislikes "technological fetishism" and is skeptical about "big data."25 Like Da, he criticizes naïve uses of topic modeling and network diagrams by literary critics.26 Like Da, he insists that "researchers should share code, report effect sizes, and measure uncertainty where possible."27 The extent to which they agree demonstrates that we have arrived at a new stage of the debate, less about the power of computation or whether cultural analytics has anything to offer than about the implications of modeling and statistical methods. In this way, Da advances the field by confirming that the reframing Underwood advocates is well under way.

Despite their several points of agreement, Da and Underwood are genuinely on opposite sides of a significant disagreement. They differ on the fitness of cultural analytics for literary analysis, and their disagreement stems from fundamentally incompatible ideas about literature itself. Da thinks computational methods cannot access the nuanced complexity of literature. In her monograph Intransitive Encounters, about strange and misunderstood transpacific literary interactions, she devotes her attention to the "form or rhetoric or logics unique to literature."28 Literature's unique affordances stand at the core of her vision. Distant reading, she contends, cannot recognize or illuminate the ephemeral Chinese-US exchanges that she studies.29 Extending this view of the discipline, Da and Anahid Nersessian launched, in December 2018, a book series with the University of Chicago Press, Thinking Literature, committed to "the refinement of literary criticism as a mode of reasoning" and to treating "the art of interpretation as a distinct class of inquiry."30 Elsewhere, Nersessian resuscitates Cleanth Brooks to present the case for defining literature as that which cannot be paraphrased adequately, literature as works of art "that do but do not denote."31 Literature offers, she argues, "the 'negative' or prorogated knowledge" of "the trope or figure."32 In yet another essay, Nersessian and Jonathan Kramnick defend the specificity of "form" as a concept in literary studies "in the service of literary disciplinarity without apology or compromise."33 For Kramnick, respect for the autonomy of literary studies (with its "practices of disciplinary reading along with their associated lexicon of form, style, or genre and their associated norms of attention, rigor, historical grounding, and so on") follows from a more basic pluralism, a "respect for the diversity of the world."34

These are the kind of arguments Underwood engages in his final chapter, pointing to them as a source of opposition to distant reading. He locates them as belonging to a longer literary history of privileging particulars over generalization that emerged, roughly, with Romanticism, and that runs through New Criticism and New Historicism.35 And he is direct about his opposition to drawing an "idealized boundary" between methodologies.36 He writes, "we invent theories about a form of knowledge that only literary critics have access to — because we alone attend to the subtleties of language or the constitutive paradoxes of thought or the ethical consequences of human diversity."37 In his opinion "the candid way to define [the] distinctiveness [of literary studies] is to say that we have the privilege of focusing on things that are interesting or enjoyable."38 He is firm on this: "I am only willing to separate literary history from social science by bluntly emphasizing literary interest and enjoyment."39 Typically generous to his interlocutors, Underwood slips into a rare uncharitable moment, writing that arguments for the autonomy of literary studies "justif[y] incuriosity."40 It is strange to see him dismiss the distinctive strengths of literary studies after he has spent the book demonstrating them.

Is literary knowledge unique in a pluralistic world or continuous with everything else? Both parties exaggerate. Literature is neither strictly autonomous from nor flatly isomorphic with other kinds of language. Underwood shows how literariness is historically contingent, how it depends on the sociology of prestige and the differentiation from biography to become itself. But literary criticism has developed distinct methods to attend to tropes, figures, form, style, genre — the subtleties of language. These methods require extensive practice over years to master, as I remember every time I teach introduction to literary studies and find that otherwise talented students do not find close reading intuitive.

Both parties express a dismissiveness that encourages the mistaken belief that cultural analytics is incompatible with the practices of literary studies. This is why I find Underwood's paucity of close reading a disappointment. Everything he does in Distant Horizons prepares the way for fresh, potentially stunning close readings that would demonstrate the power of computational analysis to illuminate unseen nuances, complexities, subtleties, stakes in the text. He mentions in passing, for example, that his science fiction model misclassifies Thomas Pynchon's The Crying of Lot 49 as one of its own. Scholars conventionally, and for good reason, read Pynchon's novel as a spoof on detective fiction. But what distinguishes science fiction, according to Underwood's model, is a rough sense of sublimity. "It may not be Pynchon's explicit concern with entropy," he writes, "but his paranoid fascination with the sheer scale of mass society that this model sees as connected to the tradition of science fiction."41 A dizzying, delightful reading could follow, with opportunities to advance our understanding of this novel, the genre, and the way word counts transmogrify into style on the page. But Underwood hurries on. It is a lost opportunity to prove critics wrong.

3: Realism

Underwood is realistic. Given the challenges involved in learning quantitative methods, he "would be surprised if even 2% of literary scholars undertook that commitment over the next decade."42 He believes that, if we recognize "the truly marginal position numbers now occupy in the humanities, much debate on this topic becomes laughable."43 I agree. The battle over these methods is symbolic, a proxy for arguing about what we value, our vision of what we do as scholars. Step back from the fray and the battle looks like misplaced energy. The fate of the English department does not depend, as Sarah Brouillette argues, on the outcome of this fight. Far greater forces are at play.

Cultural analytics occupies a very small place in the discipline. Its methods are entirely compatible not only with those advocated by Da, Kramnick, and Nersessian, but with the full range wielded by literary critics. Like any method, it can be put to use badly. With Distant Horizons, Underwood has shown that it can be put to use brilliantly to teach us about literary language, genre, prestige, and gender. For anyone interested in what literature looks like from the slant of cultural analytics, Underwood has set the stage for great work to come.

Dan Sinykin is a postdoctoral fellow in digital humanities at the University of Notre Dame. Beginning in fall 2019, he will be an assistant professor of English at Emory University. He is the editor of Contemporaries.

References

1. Ted Underwood, Distant Horizons: Digital Evidence and Literary Change (Chicago: University of Chicago Press, 2019): 145.
2. Ibid., 162.
3. Ibid., 150.
4. Ibid., 156.
5. Ibid., 7.
6. Ibid., 19.
7. Ibid., 17.
8. Ibid.
9. Ibid.
10. Ibid., 13.
11. Ibid., 25.
12. Ibid., 26.
13. Ibid., 31.
14. Ibid., 107.
15. Ibid.
16. Ibid.
17. Ibid., 106.
18. Ibid., 136.
19. Ibid., 107.
20. Nan Z. Da, "The Computational Case against Computational Literary Studies," Critical Inquiry 45, no. 3 (2019): 606.
21. Underwood, Distant Horizons, 21.
22. Ibid.
23. Ibid., 42.
24. Ibid., 52, 58, 89.
25. Ibid., 158-159.
26. Ibid., 158, 164.
27. Ibid., 181.
28. Nan Z. Da, Intransitive Encounters: Sino-U.S. Literatures and the Limits of Exchange (New York: Columbia University Press, 2018): 2.
29. Ibid., 26-31.
30. Thinking Literature, series edited by Nan Z. Da and Anahid Nersessian, University of Chicago Press.
31. Anahid Nersessian, "Literary Agnotology," ELH 84, no. 2 (2017): 341.
32. Ibid., 342.
33. Jonathan Kramnick and Anahid Nersessian, "Form and Explanation," Critical Inquiry 43, no. 3 (2017): 39.
34. Jonathan Kramnick, "The Interdisciplinary Delusion," Chronicle of Higher Education, October 11, 2018.
35. Underwood, Distant Horizons, 154.
36. Ibid., 150.
37. Ibid., 148.
38. Ibid.
39. Ibid., 150.
40. Ibid., 152.
41. Ibid., 59.
42. Ibid., 145.
43. Ibid.
