For the past couple of years amazon.com has been including a feature it calls “text stats” on many of its book pages. Among the statistics presented are “readability calculations” that estimate “how easy it is to read and understand the text of a book.” There is also rawer data, including the percentage of complex words (however that is measured), the average number of syllables per word, and the average number of words per sentence. For example, Alejo Carpentier’s The Harp and the Shadow (my translation, with Carol Christensen) falls in the 13th percentile for word complexity and has a low 1.6 syllables per word (hard to believe), but a whopping 39.1 words per sentence. That is one and a half times Faulkner’s average, and it puts the book in the top one percent of all books in the amazon sample.
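Since it isn’t clear exactly how amazon arrives at these figures, here is a minimal Python sketch of how statistics of this kind are commonly calculated. It is not amazon’s method. It assumes a crude vowel-group heuristic for counting syllables, treats a “complex” word as one with three or more syllables (the convention used by the Gunning fog index), and splits sentences at periods, question marks and exclamation points. The names count_syllables and text_stats are hypothetical, chosen only for illustration.

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: one per run of vowels, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assumption: a final 'e' is silent ("make") unless the word ends in
    # "le" ("table") or "ee" ("free").
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def text_stats(text: str) -> dict:
    """Compute amazon-style raw stats plus the Gunning fog index for a text."""
    # Crude sentence split: abbreviations such as "Mr." will be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with internal apostrophes ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]
    # Assumption: "complex" means three or more syllables, as in the fog index.
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * complex_words / len(words)
    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": sum(syllables) / len(words),
        "percent_complex_words": percent_complex,
        # Gunning fog index: 0.4 * (words per sentence + percent complex words)
        "fog_index": 0.4 * (words_per_sentence + percent_complex),
    }
```

For instance, text_stats("The cat sat. The dog ran away quickly.") counts eight words in two sentences, giving 4.0 words per sentence. A heuristic this simple will miscount many words, so small differences between books should be read with caution.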
I was curious to see whether this feature could reveal any trends over time. My first thought was to compare best sellers across the years, but I quickly abandoned that idea because the list of books was simply too boring. Instead I chose winners of the Pulitzer Prize for Fiction at five-year intervals, beginning with 1950 (the fiction prize was first given in 1948). Statistics were not available for all of these books, so in some cases I substituted the winner from a year earlier or later. Here is the list of books I used:
1950 The Way West by A. B. Guthrie, Jr.
1955 A Fable by William Faulkner
1961 To Kill a Mockingbird by Harper Lee
1965 The Keepers of the House by Shirley Ann Grau
1969 House Made of Dawn by N. Scott Momaday
1976 Humboldt’s Gift by Saul Bellow
1980 The Executioner’s Song by Norman Mailer
1986 Lonesome Dove by Larry McMurtry
1990 The Mambo Kings Play Songs of Love by Oscar Hijuelos
1995 The Stone Diaries by Carol Shields
2000 Interpreter of Maladies by Jhumpa Lahiri
2005 Gilead by Marilynne Robinson
(For Faulkner’s A Fable I had to substitute a collected edition, Novels 1942-1954.)
The results were interesting. One might assume that fiction has been moving toward simpler language, but this is not what I found. For example, although the differences in word length aren’t great, it would appear from the results that words in fiction (at least the fiction that wins Pulitzer Prizes) are clearly getting longer.
Similarly, there is a clear increase in the percentage of complex words.
The trend in sentence length is less clear. There is a big spike with Faulkner in 1955; otherwise, there may be a slow increase in this category as well.
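One way to put a number on a “clear increase” or an “unclear” trend is to fit a straight line to the twelve data points in a category and look at its slope. The Python sketch below does an ordinary least-squares fit by hand. The function name trend_slope and its exclude parameter are hypothetical, introduced only for illustration, and the per-book values would have to be copied from the amazon pages; they are not reproduced here. Dropping 1955 shows how much the Faulkner spike alone drives the sentence-length result.

```python
def trend_slope(years, values, exclude=()):
    """Ordinary least-squares slope of values against years (change per year).

    Points whose year appears in `exclude` are dropped first, so the effect
    of a single outlier (such as one unusual book) can be checked directly.
    """
    pairs = [(x, y) for x, y in zip(years, values) if x not in exclude]
    if len(pairs) < 2:
        raise ValueError("need at least two points to fit a line")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope = covariance(x, y) / variance(x), computed from deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

# Prize years of the twelve books in the list above.
YEARS = [1950, 1955, 1961, 1965, 1969, 1976, 1980, 1986, 1990, 1995, 2000, 2005]
# words_per_sentence = [...]  # one value per book, read off amazon's text stats
# trend_slope(YEARS, words_per_sentence)                  # all twelve books
# trend_slope(YEARS, words_per_sentence, exclude={1955})  # without A Fable
```

Multiplying the slope by ten gives the change per decade. With only twelve points, a slope is a rough summary of direction, not evidence that the trend is real.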
Does this prove anything? Not really. My sample is very small, and a slightly different choice of books might produce completely different results. Moreover, Pulitzer Prize winners are obviously a minute fraction of all fiction, and Pulitzer judges might deliberately resist broader trends. More research would be welcome. At the same time, it is suggestive that in all three of the amazon statistical categories the trend in this sample of fiction over the past half century has been toward greater complexity and length, rather than toward simplicity, as one might have guessed. I would be interested to hear other opinions.