Rise of the Novel 2018
Assignment 8: Word Choice & Fictionality

When we compare Austen to Brunton, we see that Austen’s distinctive words are prosaic ones that we continue to use today (“dining,” “remarkably,” “bye,” “thorough”), while the words that Brunton uses seem very Romantic in comparison (“dreary,” “wretch,” “heartless,” “beseech,” “piety,” etc.). This trend continues in the comparisons between Austen and the canon and between Austen and Chawton. Austen’s words are light, playful and modern-sounding, while those that other authors use are grounded in medieval-period nostalgia. Particularly in Chawton we see class divisions that Austen circumvents, from “domestics” to “lords,” “duchesses,” and “earls.” Other contemporary novels are about “chivalry,” “knights,” “wounds,” and “saddle”: large-scale, over-the-top feudal drama. Austen writes of flirtation and friendliness, the kinds of things that humans do regardless of class or era.

 Gallagher draws a contrast between premodern romances, myths, poems, etc., which “may be described in the modern era as drastically suspending or altogether modifying ‘the referentiability of certain claims’” (338). She writes that societies are fictionally sophisticated when they cease to say that verisimilitude intends to deceive the reader. The grand romances that we have compared Austen to with the CS21 labs make their fictionality transparent by using exaggerated verbs and adjectives and heroic, powerful figures, leaving out small details. Austen, in contrast, does not attempt to signal nonreferentiality, as words like “mantelpiece,” “apples” and “handwriting” are the kinds of concrete things that Barthes says remind us of reality. Thus she creates the “mental space of the daily life” of which Jameson writes.

I wanted to see if I could make the “most frequent words” feature more informative, so I found the list of stopwords that Voyant uses and read it into a list. I then used this list to remove stopwords from the results (after asking the user whether they want stopwords removed). (This list of stopwords is in itself interesting.) This took me a teeny bit longer than I’d like to admit, so I’ll try to post more on results later; here's the link to my revised program if anyone wants to try it out. You have to tell it to take out stopwords for both texts right now. Save both files in the bin. Main finding: Mary Shelley uses the word “miserable” a lot more than she uses the word “happy” in Frankenstein.
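The stopword filtering described above can be sketched roughly as follows. This is only an illustration, not the revised program linked in the post: the function name, the tiny stopword set, and the sample sentence are all made up for the example, and a real run would load Voyant's full stopword list and the text of the novel from files.

```python
import re
from collections import Counter


def most_frequent_words(text, stopwords=None, n=10):
    """Return the n most common words, optionally skipping stopwords."""
    # Lowercase everything and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if stopwords:
        words = [w for w in words if w not in stopwords]
    return Counter(words).most_common(n)


# Hypothetical stand-ins: a real run would read Voyant's stopword list
# and the novel's full text from files instead.
STOPWORDS = {"the", "a", "and", "of", "i", "was", "to"}
sample = "I was miserable and the night was miserable and I was not happy"

print(most_frequent_words(sample, STOPWORDS, n=3))
```

Passing `None` (or an empty set) for `stopwords` gives the unfiltered ranking, which is one way to offer the "with or without stopwords" choice mentioned in the post.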

Looking at these texts in comparison and looking at many at a time seems helpful for drawing the kinds of historical conclusions our critics draw. Habermas, for example, states confidently that literature in the 18th century was concerned with marriage. Looking at these results, it is clear that literature does, in fact, care a lot about love and motherhood and marriage.


While I was able to learn about the number of books published in different years and to ascertain a little bit about the prevalence of different narrative forms in the 18th century, I felt limited in what I could do with this dataset using Fusion Tables. First of all, I found it very difficult to draw conclusions about changes over time because, as the writeup mentions, you can only see the count of books that meet certain criteria rather than the percentage of books published at a given time that meet those criteria. So when I went to see how the use of epigraphs trended over time, I couldn’t discount the possibility that the trends I saw simply reflected the increase in the overall count of novels (especially considering that the epigraph trend very nearly mimicked the count trend). I think Voyant is much more useful for those kinds of inquiries.
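To make the count-versus-percentage problem concrete, here is a small sketch of the normalization that Fusion Tables did not offer. The function name and the numbers are invented for illustration and are not taken from the dataset.

```python
def share_by_year(total_by_year, matching_by_year):
    """Divide the count of matching books by all books published that year."""
    shares = {}
    for year, total in total_by_year.items():
        if total > 0:  # skip years with no books to avoid dividing by zero
            shares[year] = matching_by_year.get(year, 0) / total
    return shares


# Made-up counts: the number of novels with epigraphs quadruples,
# but only because the total number of novels quadruples too.
totals = {1750: 20, 1770: 40, 1790: 80}
with_epigraph = {1750: 5, 1770: 10, 1790: 20}

shares = share_by_year(totals, with_epigraph)
for year in sorted(shares):
    print(year, with_epigraph[year], f"{shares[year]:.0%}")
```

In this invented case the raw counts rise steeply while the share stays flat, which is exactly the situation the raw-count view cannot rule out.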


I also think that it would have been interesting to see the trend in authors’ use of prefaces (which we would expect to decrease as the public grew more comfortable with fictionality), but the “paratext titles” were all grouped together for each work, so you couldn’t separate the table of contents from the preface. The same was true for title adjectives: I found the category very interesting, but couldn’t find out much about them because, while it’s likely that two titles will share one adjective, it’s unlikely they’ll share all 4 or 5 of their title adjectives. However, using a network graph, I got somewhat of a sense of the usage for each narrative form. If the data were separated for each title adjective, say in a set that compared only the title adjective with the narrative form of the work, I think this tool would be very interesting. You could see which terms were shared by epistolary works and other first-person works as well as which ones were not shared, etc. The map was also interesting because I did not know that so many works were published outside of London.



I thought the word cloud that I made was the most interesting finding using this dataset. I chose to look at title adjectives, which I think is a great parameter for a dataset of novels. Again, I am missing Voyant’s capability to clean words from the word cloud, but I thought it was interesting how much the titles attempted to sell the novel, and in what way. Apparently, making sure the reader knows your protagonist is young and female was important; the word "male" did not show up as often and "old" certainly didn't show up. I also think it's funny that "moral" and "historical" competed with "interesting" and "entertaining" (entertaining is the winner, in terms of usage). The word "secret" was also popular -- here we see the interest in the private workings of the young female mind, I guess. I would definitely like to see historical trends on which of these words continued to be used; I could easily use Voyant to do that!


Assignment 4 - Tristram Shandy

I worked with Chapter 5 in Tristram Shandy. The main issues I ran into included s being turned into f or { because of the font. As for the latter, we could default { into s. For the former, we could take the words the OCR spits out with f in them and run them through a dictionary. If they are words in the dictionary, the f stays and if not, the computer converts it to s. Using a loop in the programming, we could do this for all combinations of letters if there are multiple f’s in a word. Additionally, w almost always came out as vv or yv. Since these letter combinations are much less likely than w, w should be the default. The same is true with m being converted to rn and in (although these letter combinations are more likely). Therefore, I think going back to running the output through the dictionary for words with letter combinations like this would be a good idea. (In fact I think running all words through a dictionary would be a good idea and flagging those that don’t show up to be checked would be good. However, this could take a lot more time and would require someone to look through the flagged words). As for the names in italics that didn’t translate close at all, I don’t know if there would be a way to either train the OCR to be better at recognizing such texts or another tool that could be used to get at least more of a semblance of the word in the translation.
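The dictionary check proposed above could look something like the sketch below. It is a rough illustration under several assumptions: the function name is hypothetical, the word set is a toy stand-in for a real dictionary, only lowercase words are handled, and only the "{" to s, "vv" to w, and f-versus-s cases are covered (not the rn/in confusions for m).

```python
from itertools import product


def fix_long_s(word, dictionary):
    """Try to repair a long-s OCR error; return None to flag for review."""
    # Defaults suggested in the post: "{" is always s, "vv" is always w.
    word = word.replace("{", "s").replace("vv", "w")
    if word in dictionary:
        return word  # a real word as it stands, so every f stays
    # Try every combination of keeping f or swapping it for s.
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        letters = list(word)
        for pos, ch in zip(positions, choice):
            letters[pos] = ch
        candidate = "".join(letters)
        if candidate in dictionary:
            return candidate
    return None  # nothing matched: leave it for a human to check


# Toy dictionary for illustration only.
WORDS = {"self", "first", "father", "assist", "with", "was"}

for raw in ["felf", "firft", "father", "affift", "vvith", "wa{", "xqzf"]:
    print(raw, "->", fix_long_s(raw, WORDS))
```

One limitation carries over from the rule as stated: a misread word that happens to be a real word with f in it (for example "fame" where the page says "same") passes the dictionary check untouched, so some errors would still need a human reader.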


In translating a piece to plain text, however, we lose some of the stylistic elements chosen for the book, such as an enlarged first letter, or the syllable at the bottom of the page to help in reading aloud (which may not be missed by readers now, but is an artifact of the culture at the time that sheds light on the period). Also lost are varying text sizes (like we saw on the title pages) as well as images and the effects the font itself might have (emotional or cultural, such as the s that appears to be an f, or at least so says the OCR). In terms of symbolism, also, there is something lost by turning the image (as its own piece of art, chosen by authors and editors and printers) into a plain .txt file or Times New Roman and thus from art (and a physical commodity) into merely another computer file. Whether these losses are important, however, remains up for debate and likely depends on the manner in which the texts are used. While stylistic elements may be lost, other things may be gained by turning the page into text. For example, text can be used in different ways, such as through Voyant for analysis. It also allows for changing font or size to increase readability for those who are visually impaired or otherwise wouldn’t be able to read it.

If we select the texts Pamela (1740), Pamela II (1741), the Anti-Pamela (1741), Shamela (1741), Joseph Andrews (1742), Clarissa (1748), and The History of Charles Grandison IV (1753), a trend starts to emerge that both explains the strong response Pamela provoked and reveals a certain irony behind it. If we examine the most common gendered terms across these texts (Mr. and Lady respectively), we see that, with the exception of Shamela and Anti-Pamela, the use of the latter steadily increases in relation to the former across time. Furthermore, the use of the word “miss” also increases (although less steadily), suggesting that, in addition to centering more women, these texts are, in particular, focused on unmarried women. What is particularly ironic is that Fielding’s work is not entirely immune to this trend, as Joseph Andrews features a nearly equal usage of the terms “Lady” and “Mr”.
These trends in word frequency also have interesting implications with respect to Armstrong’s claim that “the female was the figure, above all else, on whom depended the outcome of the struggle among competing ideologies” (468); the ways in which these terms appear in the texts are at the very least evidence of that centering.

Words of Religion in "Pamela" and Its Fellows (Assignment 3)

Something I am curious about, in regard to Pamela, is the presence and function of religious terminology in the text. While examining the word cloud generated by Voyant for the novel, it struck me that only two evidently religious terms--"god" and "pray"--appeared in a cloud of 55 words (with auto-detected stopwords). Moreover, both of these words were relatively small and disappeared when I restricted the cloud to 25 words. Is it not odd that a novel purporting to instill religious virtue in its readers does not make more extensive use of explicitly religious words?

I investigated this theme further while comparing the seven corpuses (corpi?), using the Trends tool to examine the relative frequencies of four religious words: "god*," "pray*," "religion*," and "virtue*." A few observations from the graph of that comparison (which I had trouble including in this post): "God" appears, as might be expected, far more frequently in Pamela than in Anti-Pamela, Shamela, or Joseph Andrews--but also more frequently than in Clarissa or Grandison. "Pray" follows a similar pattern but actually appears quite frequently in Shamela. The frequency of "virtue," interestingly, does not vary much among the texts. And, somehow, the frequency of "religion" is seemingly negligible in all of the texts except for Shamela, where it appears relatively often! What do these observations tell us about how Haywood and Fielding went about writing their satires--particularly in relation to how Richardson constructed Pamela and his other works?

Exercise 3: Indexing Pamela - Frequency of Words

While working with Voyant, I was struck by how little the novel mentions the value Pamela places on her virtue, as well as the religion and honor associated with that mindset. From the title page, Pamela; or Virtue Rewarded, Richardson demonstrates to the reader the possibilities and benefits of maintaining virtue, despite environmental pressures. However, in looking at the number of times the word “virtue” was used in Pamela, I found that the word was only heavily used in the first segment, and much of that count is due to its use in the title and our use of a clean Text document. From there, the use of the term significantly decreases, only returning to more use in the final segment of the novel. Also, some of the support and reasoning behind her prioritization of her virtue, such as religion and honor, are not mentioned as much. Looking at the ranking system, God, Pray, Honour, Honesty, Virtue, and Duty are ranked 32, 38, 44, 118, 166, and 210 respectively. Meanwhile, some frequent terms reduce her strength and autonomy as a character: she uses the word master more than even her own name, which can imply dominance over her. Thus, Richardson risks reducing the assertion and impact of such morals on the reader.
In looking at the word usage in other novels, I found that one way the spoof authors may have tried to diminish Pamela was by subtly declining to acknowledge her ability to change social status and the importance of her virtue. From Pamela to Pamela II, the use of the word lady significantly increases, emphasizing Pamela’s success in rising above her social status thanks to her resilience in maintaining her virtue. The novels Sir Charles Grandison and Joseph Andrews use the term more than Shamela and Anti-Pamela. In addition, those parody novels mention virtue less than Charles Grandison. Thus, we can see the authors subtly attempting to reduce the impact of Pamela by not acknowledging her ability to overcome societal and class norms. However, in both cases, the question remains whether readers were able to pick up on the authors’ subtle use of words like lady and virtue to get their points about Pamela across.

Indexing Pamela et al.

When I read through Pamela's 1742 Table of Contents, I was struck by a trend in the text's syntax—while the sentences contained in each entry of this list (an example of the list form which differs from Crusoe's cataloguing of things, since it's more interested in emotional and physical events) are necessarily fragmented, shortened for the lazy or preoccupied reader who wants a broad understanding of what Pamela is without having to slog through hundreds of pages, this shortening is achieved by removing as many names and pronouns from each summary sentence as possible. The reader is left with phrases such as "Continuation of her story. Her irresolution what to do. Desires Mrs. Jervis to permit her to lie with her: And tells her all that has passed. Mrs. Jervis's good advice," (ii) phrases which can be consumed at a rapid pace but lack much of the intimate quality that Pamela itself, with its first-person narrative mode and interest in both interiority and interpersonal dynamics, possesses. This erasure of pronouns is most drastic in the case of "I," which entirely disappears from the Table of Contents in favor of a shift to the third-person; "she," "he," and the names/titles of characters remain, but are dramatically fewer. 

I'm still formulating my thoughts on what this decreased emphasis on the human subject in summary sentences signifies about Pamela, and trying to figure out how an intimacy with the reader is so easily manipulated by something as simple as a pronoun's presence or absence. I'm reminded of Armstrong's thoughts on how domestic fiction prioritizes individual character over societal role, and of the way that Gallagher talks about the close (but not too close) relationship which fiction encourages between its readers and its characters, but I'm not quite sure whether pronoun use is consequential enough to have a strong connection to these theories. The shifting presence of pronouns in Pamela and its associated works nonetheless struck me deeply when I used the Voyant text analyzer as well, because Voyant does the same thing that Richardson did in his introduction—it tries to cut the pronouns out, automatically designating "she," "he," and "I" as stopwords and strangely (this admittedly may have been a glitch) allowing "she" to be searched for in the Document Terms window but not "he" or "I." When I messed around with the Reader window once I'd uploaded the full corpus, I was able to graph the relative frequencies of pronouns in the various books; those graphs are included below. Pamela's first edition has the highest use of "I" followed by Shamela, reflecting the confessional and self-interested style of the former which was satirized by the latter, but the corpus' interest in the first person dwindles from there as the use of "she" and "he" rises. I was surprised by how similar the "he" and "she" graphs appeared, since I assumed that this corpus' focus on domestic fiction and gender would give one gendered pronoun preference over another, but upon reflection it seems reasonable that novels which are so interested in the categories of femininity and masculinity both would use the two pronouns proportionally. 

I'm still working to complicate my thoughts further from "this is important and I'm not sure why but it is a thing that happens," so I'd be interested to hear what (if anything) other members of the class noticed about pronoun use, reader-character connection, and gender in their experience of this exercise and this corpus as a whole!

Pamela vs. Shamela: An investigation of "feminine" word choice

Armstrong makes a central claim in "Desire and Domestic Fiction" that the reproducibility of Pamela's domestic themes and sentimental language was crucial to creating the modern novel because, as she argues throughout, they made space for the individual outside of political rank. "Novels early on assumed the distinctive features of a specialized language for women," she writes (472). "Inasmuch as his masculine form of heroism could not be reproduced by other authors, we cannot say Crusoe inaugurated the tradition of the novel as we know it" (471). With this in mind, I wanted to find out which words made up this "specialized language" and how Pamela's parodies and sequel reproduced it. I found out two things that seem to go along with Armstrong's argument. Firstly, "feminine" language such as "kiss," "feel," "desire" and "sweet" appears as often or more often in Shamela than it does in Pamela--showing us that Fielding understood what was distinctive about Pamela and capitalized on it.

But on the flip side of this, we see how Fielding avoided language that conveyed some of Richardson's messaging about class and moralism: many kinds of political/economic/religious language appear much more in Pamela than in Shamela (except servant, which he mentioned often, probably to highlight the threat of class solidarity; that's for a later discussion). 

There are a few conclusions we could draw here. One is that it is true, as Armstrong says, that Pamela was very reproducible because of its language and content. But the more interesting conclusion we could possibly come up with is that she may have been reductive of Robinson Crusoe when she said its language prohibited it from conveying the kind of interiority that is distinctive to the novel, because he talks about God and himself about as much as Pamela does. 


For this assignment, I chose to focus on the Reader capability of Voyant. What I looked at was the way in which the word “bosom” was used throughout Pamela. I was surprised by the low usage (in comparison to other key words) of words like bosom, saucy, hussy and slut. These are the words which stood out to me most while reading the actual book, probably because of their personal connotations, and so I was surprised to see how sparingly they were used. “Bosom” was only used 31 times in the entire text.
Using the Reader function, I looked at every time bosom was used. Over the course of the book, we see a transition in how bosom is used in its symbolic meaning for Mr. B and Pamela. For Pamela, the bosom is originally tied to her letter writing, her soul and emotional state: the letters are her thoughts and feelings, and the bosom is where she stores them. The phrase or concept of putting a “letter in my bosom” (or writing supplies, the ones Mrs. Jewkes gives her) was used 10 times. Mrs. Jewkes kept her own instructions from Mr. B in her bosom, and Williams likewise kept the letter from Pamela within his bosom. Pamela also hides her letters for Mr. Williams in the “bosom of the earth.” This phrase tapers off in usage as the book progresses, parallel to Pamela’s developing relations with Mr. B. Pamela goes from hiding letters in her bosom to performing actions like, “hide your [Pamela’s] dear face in my [Mr. B’s] bosom,” describing Mr. B’s “generous bosom,” and finally, “hid my blushing face on his bosom.” Pamela’s relationship with the bosom alters over the course of the book, from hiding her own feelings within hers to interacting with the bosom of Mr. B.
Gradually, as the book progresses and Pamela develops her relationship with Mr. B, she begins to refer to the bosom of Mr. B rather than her own, her bosom no longer being used to keep her letters but to keep the sentiments and feelings of Mr. B.
Mr. B also undergoes a bosom change. What starts as a purely physical assessment of the bosom becomes, for Mr. B, a more emotional and conceptual understanding of the bosom. The phrase “hand in/on my bosom” was used three times, all in relation to Mr. B and Pamela. He says, “I tell you the truth in one instance, you may believe me in the other. I know not, I declare, beyond this lovely bosom, your sex.” But Mr. B’s conception of the bosom as something purely physical gradually changes as he becomes more sentimental and invested in Pamela’s emotional state. We begin to see bosom used in manners such as Mr. B asking Pamela to hide her face within his own bosom, and acknowledging Pamela’s bosom in a different context: “Now, Pamela, judge for me; and, since I have told you, thus candidly, my mind, and I see yours is big with some important meaning, by your eyes, your blushes, and that sweet confusion which I behold struggling in your bosom, tell me, with like openness and candour, what you think I ought to do, and what you would have me do.” He also makes allusions to his own bosom as an emotional source: “Ay, that, my dear Pamela, said he, and clasped me in his arms, was the kind, the inexpressibly kind action, that has rivetted my affections to you, and obliges me, in this free and unreserved manner, to pour my whole soul into your bosom.”
Bosom is a conduit for emotional capacity and change amongst the two main characters of the book, an observation which is made easily studable through the Voyant program.

Where I found this analysis especially interesting was in comparing the different texts in our corpus. While I knew from the start that these were all different works, subject to different results under analysis, I was still shocked to see how different they were. One thing that stood out immediately to me was the prevalence of the word "said" in Pamela. While the word was used frequently in all works, it was used about three times more in Pamela than in any other. After further investigation, words like "says" appear to be replacing it in other texts, especially in Shamela. In Shamela the use of "says" over "said" appears to be meant to make the protagonist seem less educated, as in the construction "says she" vs. "she said".
I also found interesting the frequency at which the main character's name was mentioned. In Pamela the word "Pamela" appears in the top 25, but is beaten by "Master". On the other hand, in Anti-Pamela, the main character's name (Syrena) is the most used word in the entire text, without a single other character name in the top 25. This could be indicative of a difference in writing style or a difference in tone. In Pamela, the work primarily focuses on her own perception of her master, fixating on him. In Anti-Pamela, the text may spend more time focusing on its main character and less on the master. It is difficult to tell without reading the text whether this is true, but it points to a possible difference in character focus between the texts.

Assignment 3

When I saw that the word "think" appeared 505 times throughout Pamela, I was reminded of Armstrong's claim that female characters could safely act as vehicles for subversive political thought, as well as our general discussion about this period's budding interest in the interiority of people. Upon further investigation, I found that Pamela most often employs the word "think" to describe her own thoughts. This may have been obvious, considering she is the narrator and is writing about what is happening to her. It makes sense that she's sharing her thoughts. But that's from the perspective of a modern audience accustomed to this focus on interiority. At the time, to simply look through a history of Pamela's mind, her private feelings and musings on her situation, must have been just as alluring as the scandalous events that transpire within the story. Perhaps this is another way that Pamela was meant to be a vehicle for subversive ideas, beyond just her womanhood: she is not pushing the ideas on us intentionally, they are merely part of the content of her private correspondence, thus they do not feel so much like an agenda on the author's part. Furthermore, our position as the audience is that of voyeurs at best, intruders at worst, a position that becomes more salient when it's revealed that Mr. B has done the same thing we're doing. Thus, we may be more sympathetic or at least judge the ideas presented less harshly, given that we are not supposed to be seeing them.

I spent a decent amount of time playing with the correlation tool for both the corpus and the Pamela text alone. I was hoping to correlate gendered titles with other words that could give me some context for Armstrong's arguments. Instead, I slowly realized the futility of the task that I had laid out. At first glance, mrs. has nice P-values when correlated with jealous, civil, crying, religion and love in the corpus. However, the sample size is surprisingly small, with under two hundred entries across the entire corpus for almost all of those words. Additionally, the correlations themselves have rather insignificant values. At this point, I dove through the documentation to understand what the correlation tool was actually doing. The tool, rather than doing a type of weighted distance analysis between words, was actually just comparing frequencies across the texts. Specifically in the corpus, the tool was simply comparing frequency trends between words, with the indices being the documents themselves. This type of analysis is never going to tell me much besides individual word frequency trends across documents, especially with only seven documents, and is not really what I was looking for. I think it would be interesting if correlations were more sentence specific, where I can instead look at words that frequently appear near other words, in a non-trivial sense (she, said (that would be trivial)). I looked at the same thing within just the Pamela source text, and instead of frequencies being divided by text, they seem to be divided by "segments," which I can't really find a definition for. More generally, my methodology doesn't make much sense because of the first person narration in Pamela: looking at a word like Mrs. is only theoretically going to give me correlations with Jervis and Jewkes. However, I did get interesting results when I looked at the word Pamela, as she is referred to almost always by other characters.
I got strong P and R values with words like grave, design, girl, and loss. I'm not really sure how to deal with those words, especially design, but I suspect conversations involving a character calling her Pamela also involve said character referring to her gender in some form or another. Girl is not an insignificant sample either, with 169 mentions. I do still wish the segment form was replaced with something more specific.
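The kind of comparison described above, where each word is reduced to one frequency per document and the two lists are correlated, can be sketched as below. This is an assumption-laden illustration rather than Voyant's actual implementation: it uses a plain Pearson coefficient, and the seven frequencies per word are invented, not measured from the corpus.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented relative frequencies, one value per document in a
# seven-document corpus (not real measurements).
mrs_freq = [3.1, 2.8, 0.9, 0.7, 1.5, 2.2, 2.0]
love_freq = [1.2, 1.1, 0.4, 0.5, 0.8, 1.0, 0.9]

print(round(pearson(mrs_freq, love_freq), 3))
```

With only seven points per word, a high coefficient says that the two words rise and fall together from book to book; it says nothing about whether they appear near each other in any sentence, which is the limitation the post runs into.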

Indexing Pamela

Of Voyant's list of most frequently occurring words, I found the most interesting to be: said, good, sir, master, and poor. These words, in particular, have a clear implication of social status, highlighting intersections of gender and class. Pamela is a radical text that is inherently about class and economic mobility, but is often reduced to a conflict "between the sexes." Richardson seems to have very clear moral intent with his writing of Pamela, observable even on the title page, which states that the narrative "has its Foundation in TRUTH and NATURE." While Voyant's list of words does hint at the narrative's teachings on morality, I had anticipated more specificity, with the more frequent inclusion of terms such as virtue, pride, honesty, clothing, jewels, riches.

Armstrong's core argument in Desire and Domestic Fiction illustrates the domestic novel as both an agent and a product of widespread cultural changes. Gender and sexuality were represented as entirely removed from social, political, and economic spheres -- thus, Armstrong argues that female narration, perceived as without claim to political legitimacy, allowed for a different and new form of political critique.

Fielding's representations of the original narrative in Shamela critique those who praise Pamela as an educational and informative piece, rather than how he sees it: as a form of entertainment. The Voyant tools provide us with a new perspective on the argument, one rooted in the occupation of quantifiable space in print texts.