Combining Close and Distant, or, the Utility of Genre Analysis: A Response to Matthew Wilkens’s “Contemporary Fiction by the Numbers”

Jeremy Rosen / 12.03.11

Following in the footsteps of Franco Moretti and other recent figures at the vanguard of the digital humanities, Matthew Wilkens lays out a series of provocative challenges and recommendations for how scholarship might adequately address the profusion of cultural production in an era shaped by the "problem of literary and cultural abundance." Refreshingly candid about the limitations of applying quantitative and computational methods to cultural analysis at the present moment, Wilkens's piece includes the somewhat damning admission that such digital analysis is not currently possible, and may remain prohibitively costly for books written after 1923, in "the era of effectively perpetual copyright."

But Wilkens neglects other equally pressing problems with the computational practices he advocates—limitations that reveal themselves in the very analysis he proffers as a sample of the kind of scholarship such practices might enable. Two problems with Wilkens's method strike me as most urgent. First and most glaringly, he inadvertently demonstrates how easily data may be misinterpreted to serve conclusions that are sought by the analyst. And second, though he and others doing similar work purport to offer analysis of neutral data sets—say, all the fiction published in a given year—by working with existing bibliographies they perpetuate the selection criteria that governed the initial compilation. Doing so artificially reifies bodies of texts that might in fact be far more heterogeneous and unruly.

Both of these problems, I would like to argue, could be avoided by combining digital and computational methods with traditional modes of literary analysis. Scholars who take Moretti's polemics for abandoning close reading as an actual course of action insist on a methodological absolutism that, to quote a colleague, amounts to "working with one hand tied behind your back." [1] Further, there are many reasons to question Wilkens's fundamental assumptions: that abundance is a problem; that being able to say conclusively what the entire field of cultural production at a given moment looks like is necessary or desirable; and that we should aim for a practice that gives equal weight to obscure, unread cultural products and well-received, widely-disseminated, popular ones. Ultimately, these assumptions derive from a false dichotomy between quantitative and qualitative methods that leads the proponents of distant reading to pose it as a replacement for, rather than a supplement to, traditional close reading practices.

At the end of this essay I suggest that a flexible application of genre analysis—one attentive to the fact that generic categories are not static data sets composed of identical terms, but rather what Michael McKeon calls, following Marx, "simple abstractions" that yoke together heterogeneous and dynamic histories of literary and cultural production—is a particularly useful methodology to be deployed in conjunction with the burgeoning field of digital humanities. [2] The particular utility of genre study lies in the way it facilitates oscillation between levels of analysis. Genre helps us access answers to large-scale questions by aggregating kindred texts, but without losing sight of the details and divergences of individual works. Genre analysis suggests that, while scholars ought to be excited about the research and analytic benefits made possible by searchable digital archives and data analysis, they needn't abandon the more supple procedures of literary and cultural study that attend to the complexity and contingency of historical phenomena.

Data and statistics can be seductive, perhaps even more so to those of us trained in "softer" disciplines. [3] But data needs to be analyzed and interpreted; arguments must be made about the meaning and significance of findings. Though Wilkens's article points to the potential utility of computational approaches, his practice ends up demonstrating how misleading data can be, how easily it may be misinterpreted to support desirable (usually surprising or counterintuitive) conclusions. The susceptibility of data to misinterpretation and inaccurate generalization is most evident in Wilkens's section on "Geolocation," where he analyzes points plotted on a worldwide map which correspond to all the proper geographic names mentioned in thirty-seven ("those available in machine-readable form from the Wright American Fiction collection") of the one hundred and eleven works of fiction published in the U.S. in 1851 (according to Wright's bibliography). From the wide geographic spread of points on this map, Wilkens disputes scholarly consensus, the "standard answers" that U.S. fiction at that time was concerned with "'New England' and 'Americanness' (understood to include issues of slavery and union)." This map, he concludes, reveals that the "imaginative landscape of American fiction in the mid-nineteenth century appears to be pretty diversely outward looking in a way that hasn't received much attention." This claim suffers from what (I hope) my students would easily recognize as a "warrant problem." The quick warrant test I offer those students is to construct the generalized if-then statement that provides the justification for the claim, and to check if that statement logically holds. The resulting warrant for Wilkens's claim would look something like: "If a set of texts mentions locations all around the globe, then the texts are diversely outward looking."
It's pretty easy to see what's wrong with the logic of this claim; it determines a qualitative character ("diversely outward looking") from a piece of quantitative data, the mere fact of a place name being mentioned, regardless of the nature of that mention. If I were to say, "Gosh, I never want to go to Timbuktu," I have just mentioned a far-away place, but have hardly established myself as broadminded.

If we combine computational methods with a close look at the context in which place names are mentioned in texts from 1851—an "old school" strategy Wilkens abjures in this essay—we find evidence contradicting his claim, and deeper insights than the sweeping generality of "pretty diversely outward looking." Take for example Joseph B. Cobb's Mississippi Scenes, or Sketches of Southern and Western Life and Adventure (Philadelphia: A. Hart, 1851). If we use the Wright database to search within this text for "Africa," we find three "hits." One amounts to a piece of proverbial wisdom mentioning "Africa" and "Italy," in which the attributes of the places themselves are largely irrelevant. ("But in one respect their tactics were similar; they both believed, like Scipio, that the best way to drive Hannibal out of Italy was to carry the war into Africa" [152].) This particular search result shows that some place mentions may not convey any information whatsoever about American attitudes towards the locations mentioned. Another hit is far more illuminating, and dramatically at odds with Wilkens's claim. Cobb writes of the folk tales beloved by slaves:

"All these stories were caught, and of course spread, with every imaginable exaggeration, by the negroes belonging to the various farmers around; and bending their whole active and magnifying fancies to the welcome task, these credulous and wonder-loving sons of Africa would charm and excite their masters' children with tales of Jack-o'-the-lanterns, and swamp owls, and whippowils, all of which were, with them, beings of speech and thought" (98-9).

Here "Africa" is clearly used—in a book that proffers an apology for slavery as practiced by, to Cobb's mind, more enlightened Southerners [4]—to refer to the place from which American slaves trace their ancestry, and, less superficially, as a way of explaining the racially ingrained credulity of those slaves, which both ingratiates and likens them to their masters' children. Such attitudes hardly qualify as "diversely outward looking"; indeed, they reinforce the conventional wisdom that texts of the time were preoccupied with "issues of slavery and union."

To say close reading of the passage above reveals more than Wilkens's analysis of the GeoDict maps, however, is to fall into a zero-sum comparison of literary methods, and to incorrectly pose these methods as alternatives—to act as if we have to choose one or the other. Although Wilkens argues for letting "computational methods guid[e] our attention to works that seem likely to repay close study," he does not actually employ such close study in the process of making his claims. Nor does the relationship between close and distant reading need to flow only in one direction. For instance, Wilkens also concludes, from the many points on the map in the southern U.S., that "[p]erhaps we need to at least consider the possibility that American regionalism took hold significantly earlier than we usually claim." But if we don't mind a little reading, we hardly need to do more than survey Cobb's title to see his regional motive at work. As I hope my brief interventions here suggest, the invaluable resource of digital archives and the utility of searchable databases can be most rewarding when deployed in concert with close reading, archival research skills, and careful argumentation. The availability of voluminous electronic data, one might conclude, makes it even more necessary to cultivate the faculty of analyzing data closely and critically.
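The pairing of search and reading described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Wright database's actual search interface: the function name `keyword_in_context`, the `width` parameter, and the two-sentence sample (stitched together from the Cobb excerpts quoted above) are hypothetical choices made for the example.

```python
import re

def keyword_in_context(text, place, width=60):
    """Return each mention of `place` with up to `width` characters of
    surrounding text on either side, so a reader can judge the mention."""
    hits = []
    for match in re.finditer(r"\b" + re.escape(place) + r"\b", text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Collapse whitespace so each hit prints on one line.
        hits.append(" ".join(text[start:end].split()))
    return hits

# Hypothetical sample: two excerpts from Cobb's Mississippi Scenes (1851),
# joined here only for illustration.
sample = (
    "they both believed, like Scipio, that the best way to drive Hannibal "
    "out of Italy was to carry the war into Africa. "
    "these credulous and wonder-loving sons of Africa would charm and excite "
    "their masters' children with tales of Jack-o'-the-lanterns"
)

hits = keyword_in_context(sample, "Africa", width=40)
for hit in hits:
    print(hit)
```

A bare count would register two mentions of Africa and plot two identical points on the map. The context lines show that one is a classical proverb and the other a racial characterization, a distinction that the count alone cannot draw and that still requires a reader.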

Wilkens's examples also demonstrate the way computational methods, by relying on existing bibliographies and databases, can reproduce and reify the often unstable, contingent generic categorizations established in the initial cataloguing of such texts. To return to the example above, Cobb's Mississippi Scenes, or Sketches of Southern and Western Life and Adventure is catalogued in the Wright American Fiction database. Wilkens thus understandably uses the text (transformed into a data point) to make claims about the field of "American fiction circa 1850." Cobb's book's status as fiction, however, is questionable; the volume is, in fact, a fine example of generic instability. Cobb himself calls it a collection of "sketches," insists in the introduction that they are "drawn from real scenes and characters," and promises the "truth and accuracy of the description," while admitting they are "embellished" (v, vii). Cobb claims he writes "as a journalist" (vii), and many of his sketches were originally published in newspapers. One might thus argue that the book is more akin to a collection of essays than to a work of fiction. Of course, one might at this point ask why we should be only or predominantly interested in works of fiction, anyway. Wilkens, while arguing against privileging certain high-profile texts over others, and for considering cultural production in its entirety, writes that the thirty-seven texts mapped "represent about a third of the total American literary works published" in 1851. Wilkens thus errs in two directions with regard to the genre of Cobb's book: he accepts the received (i.e., Wright's) categorization of the book as fiction, and he assumes that the universe of published literary works is more or less continuous with the number of works of fiction.

My point here is less to disagree with Wilkens, however, than to call attention to the fact that computational research must necessarily perform its data-crunching on some delimited corpus. Digital scholars often work with neatly drawn, existing catalogues of texts, attributing to every work in a given category equal numerical participation in it, despite the fact that questions of categorical instability are both quantitatively significant and a potential source of qualitative interest and contestation. [5] Is Cobb's book a memoir, a book of essays, a novel, or some unique hybrid text that defies such either-or categories? How does the fact that such "sketches" were initially published in newspapers affect our understanding of what constituted a newspaper in 1851, or such publications' relative investment in reporting versus entertaining? Of all the cultural production in that (or any) year what was, and what ought to be considered, "literary"? Only fiction? (Poets are already screaming.) All imaginative writing? Anything written with conscious attention to form? Anything written at all? These are of course the kinds of questions that have animated literary theory and cultural history in recent decades, and working with extant bibliographies—particularly old ones—risks eliding distinctions and drawing artificial boundaries over the gray areas that scholars in fact see as significant sites of critical debate.

Such risks become even more evident when Wilkens tries to determine the frequency with which different parts of speech appear in allegorical texts. Wilkens writes that such a project requires "measuring all of these parameters and then seeing what combination of them does the best job of replicating our settled judgments about the allegorical content of known texts." But is there broad scholarly agreement about the quantity of allegorical content of a given text? Wilkens admits that "[w]hat's required" for such a project "is a way to measure quickly the degree of allegory in any given text." He does so by pairing texts by a given author and comparing the allegorical content in them, ranking Alcott's Little Women higher (no scale given) than Work, Defoe's Robinson Crusoe and Moll Flanders higher than A Journal of the Plague Year. While there may be good reasons to argue for these rankings—though Wilkens does not offer them—I would venture that for many literary scholars apprehending "the degree of allegory in any given text" presents a particularly slippery task. For some texts like Pilgrim's Progress or Animal Farm, which draw on extended personification and continuously create meaning at two or more levels, readers would likely agree to rank the texts as "very" allegorical. But for others, say Little Women, how many would be willing to say the novel is principally allegorical? Which level of meaning predominates? Ann Douglas argues that Alcott's best known book is "in part an allegory," but categorizes it primarily as "a fictionalized version of that now forgotten, once recommended literary phenomenon, a family journal, the moral saga of an entire clan" (48; 43). The point, again, is that establishing a text's fit in a given generic category requires interpretation.
But the computational analysis that purports to be interested in determining the allegorical content of a large corpus of texts only becomes possible after accepting wholesale some prior determination of the allegorical content of an initial one.

Moreover, there are good reasons to question Wilkens's overarching premise that there is an urgent need for scholars to be able to attend to the vast quantities of contemporary cultural production in its entirety. I contend that this assertion is founded on a pair of interrelated, mistaken assumptions: first, that it ought to be a goal of scholarship "to say with any certainty what contemporary cultural production as a whole looks like" (my emphasis), in other words, that analyzing more texts, or even, in a fantastic scenario, everything produced in a given moment, would be necessary or desirable even if it were possible; and second, that "the cultural turn" demands that we give equal attention to (or consider as a proportional quantity of available data) obscure texts that might seem to be marginal or of little importance.

First, even if all books published in a given year were digitized, searchable, and freely available online, we would still be making selections, not analyzing the whole of cultural production. We'd be processing the data of all published books, but ignoring those that went unpublished, as well as digital texts and products in other media: music, film, visual art, performance, et cetera. Wilkens strives to be "inclusive," checking himself when he talks about the paltry "number of books we read," and modifying that formulation to encompass "the objects we study." But it should be easy to see that the more inclusive our sense of the object of our study, the more impossible the task of even determining what counts as data becomes.

But why should we think determining the character of "contemporary cultural production as a whole" is desirable anyway? Try saying something meaningful that encompasses all of the paltry couple dozen books you've actually read this year. Now make sure this statement also covers all the movies, music, and art you've consumed. What could we imagine saying that would be true of everything produced in 2011, or in the last decade? Digital humanities doesn't benefit from the suggestion that it might offer the key to all mythologies, and we ought to question how blunt an instrument we are willing to accept in return for encyclopedic scope.

The injunction to weight all cultural products equally is equally fraught. Wilkens argues, in effect, that if it's culture we're interested in, we'd better consider all of it: "once we've taken the cultural turn, it's hard to see exactly how we can justify not only excluding material on the basis of perceived aesthetic value (our job certainly isn't any longer to say what's beautiful and what isn't), but to do so without having been so much as aware of its existence in the first place." But it is actually fairly easy to justify "excluding" some objects, or rather, determining that they are less significant than others, even if we do not grant the proposition of intrinsic aesthetic worth. We simply have to ask: who perceives the material in question to have value? Does that person's opinion matter—in other words, does that person wield the symbolic power to make others assent to his or her judgment? One of Bourdieu's central contributions was to insist that symbolic struggles have real effects, that symbolic value is convertible to economic value, that perceptions of aesthetic value are exercises of power. "Symbolic power," he writes, "is the power to make things with words" (Bourdieu 1989, 23; his emphasis). The justification, then, for concentrating on some works "on the basis of [their] perceived aesthetic value" is not that such works actually possess the immanent value attributed to them by those who wield the symbolic power to consecrate and legitimate. The justification is that those acts of consecration, when made at the sites of institutional power, have real results, causing, for example, some works to be far more widely read and disseminated than others, and in the process acquire a range of influence and social significance.

The idea that the authors that literary agents choose to represent; the manuscripts that editors choose to buy; the books that publishers back with advertising resources, that bookstores place prominently, that consumers purchase, or that professors assign are not more socially or culturally significant because the aesthetic ground on which they are perceived to be valuable has shifted or been deconstructed, revealed to be ideological, is simply false. Even if we've taken the cultural turn, that is, we are not obliged to weigh every object produced in a given culture equally. Insofar as we care about what actual people are reading or viewing or listening to, the effects cultural texts have on their readerships, the way widely-read texts influence other producers, and what the success of certain texts says about the beliefs and desires of that society, we ought to weigh far more heavily the 0.1% of blockbuster texts. We should not reduce each one to numerical equivalence with any other single data point. Timeless classics or bestselling texts, in this understanding, are not to be studied because of their self-evident greatness, their transcendent immanent qualities, but because for better or worse they have achieved a position of cultural centrality, been widely circulated and held up as models. We should think the Harry Potter or Twilight books and films are more culturally significant and make a stronger claim on our attention, even, or perhaps especially, if we find them anodyne. Likewise, books or other cultural products with marginal audiences may be of marginal interest—unless we take it upon ourselves to articulate reasons why they command a quality of attention that is entirely disproportionate to the amount of notice they have garnered. History is of course full of examples of artists and works that are discovered, rediscovered, or reevaluated—championed by some later generation of audiences or critics. 
Such projects demand articulations of significance, assertions about the qualitative character of works that establishes them as meriting study, and efforts to place them in historical and generic contexts that help make sense of their relations to, and divergences from, the larger "data set" that is the entire field of cultural production.

My recent work on the vibrant contemporary genre I call "minor character elaboration" is an example of excavation and analysis of a largely unnoticed phenomenon, and it demonstrates how genre study can pursue large-scale questions while attending to the anomalies that individual cultural objects often represent. Over the past three decades, dozens of writers have followed the model of well-known texts like Jean Rhys's Wide Sargasso Sea (1966), Tom Stoppard's Rosencrantz and Guildenstern Are Dead (1967), and Aimé Césaire's Une Tempête (1969) by converting minor characters from canonical literary works into their protagonists. Contemporary writers like Margaret Atwood (The Penelopiad [2005]) and Alice Randall (The Wind Done Gone [2001]) have deployed this genre to critique and imaginatively repair the absence of subaltern perspectives in canonical texts. Meanwhile, the genre has also become a proven form of highly conventional genre fiction. Gregory Maguire's Wicked (1995) remains the most wildly successful example, but others include Christopher Moore's Fool (2009), which retells Lear from the jester's perspective, and Geraldine Brooks's Pulitzer-winning March (2005), which centers on the eponymous father from Alcott's Little Women. Aggregating a diverse cluster of texts that also includes John Updike's Gertrude and Claudius (2000), Jon Clinch's Finn (2007), and Jon Scieszka's The True Story of the Three Little Pigs, by A. Wolf (1989), I am able to access answers to broad questions, such as: why has the consolidated global publishing industry seized on this genre as a vehicle for its wares, and what does the cooption of revisionist textual strategies by the culture industry reveal about the purportedly oppositional nature of those strategies?
Space constrains me from doing justice to these questions here, but viewing this genre as a heterogeneous set—that is, as an empirical rather than ideal genre—I can single out divergent members, making sense of their particularities. The idiosyncrasy and significance of a book like J.M. Coetzee's Foe (1986), for example, become clearer when juxtaposed with the conventional procedures of the genre, since the comparison allows us to see how Coetzee maintains Friday's voice as an absence that the novel cannot repair. Genre study thus oscillates between levels of analysis, training its vision on a constellation of objects then telescoping in for a closer look. Close and distant can be ways of looking that reinforce one another. Perpetuating the false dichotomy between them forecloses a whole range of differently-scaled modes of analysis—from elucidation of the manner in which publishers market the symbolic capital of canonical works, to scrutiny of the ways a given text diverges from a set of generic conventions—that subsist between close reading and data mining.

Jeremy Rosen received his Ph.D. from the Department of English at the University of Chicago in 2011, and currently teaches there as Lecturer in the Humanities. His research explores the intersections of contemporary genre fiction, multiculturalism, and the global publishing industry.

Works Cited

Bourdieu, Pierre. "Social Space and Symbolic Power." Sociological Theory 7.1 (Spring 1989): 14-25.

---. Language and Symbolic Power, ed. John B. Thompson, trans. Gino Raymond and Matthew Adamson (Boston: Polity, 1991).

Cobb, Joseph B. Mississippi Scenes, or Sketches of Southern and Western Life and Adventure (Philadelphia: A. Hart, 1851).

Douglas, Ann. "Introduction to Little Women," in Little Women and the Feminist Imagination, eds. Janice M. Alberghene and Beverly Lyon Clark (London: Routledge, 1999).

McKeon, Michael. The Origins of The English Novel, 1600-1740 (Baltimore: Johns Hopkins UP, 2002).

#1 Thanks to Joshua Kotin for passing on this piece of wisdom from David Galenson.
#2 Michael McKeon, The Origins of The English Novel, 1600-1740 (Baltimore: Johns Hopkins UP, 2002), 162.
#3 I refer here to the readership of Post-45, not to Wilkens, who is trained as a chemist.
#4 Cobb, for example, depicts a "poor old negro woman" whose only son is purchased by a planter from a distant state, and chides the slave trader who "with a mistrust too common amongst slave-dealers, adopted the severe and repulsive precaution of manacling his hands with stout irons forged for the purpose, and then confining him, by means of a strong rope, to the neck of his horse. It is this unwarrantable and useless severity...which give to the enemies of our institution such room for rabid exaggeration...In the transfer of these unfortunate people...a kind look, a benevolent expression, a single word of encouragement or sympathy, rarely fails to reconcile them in a moment to their altered lot, even when family connections are dissevered" (93).
#5 Moretti also works with existing bibliographies, and thus concludes that the genres he graphs flourished for a period and then died out and were replaced by others. He employs, that is, a synchronous taxonomy to chart a diachronic phenomenon. But if we don't adhere to those bibliographies—that is, if histories of reception evaluate and categorize texts in a different manner—we might find, for example, that gothic elements persist in later works, and then the entire data set becomes unreliable.
