Rise of the Novel 2018

Metadata!

3 min read

While I was able to learn about the number of books published in different years and to ascertain a little about the prevalence of different narrative forms in the 18th century, I felt limited in what I could do with this dataset using Fusion Tables. First of all, I found it very difficult to draw conclusions about changes over time because, as the writeup mentions, you can only see the count of books that meet a certain criterion rather than the percentage of books published at a given time that meet that criterion. So when I went to see how the use of epigraphs trended over time, I couldn’t discount the possibility that the trends I saw simply reflected the increase in the overall count of novels (especially considering that the epigraph trend very nearly mimicked the count trend). I think Voyant is much more useful for those kinds of inquiries.
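The count-versus-percentage problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, and the `share_by_year` helper is hypothetical rather than anything Fusion Tables or the dataset provides.

```python
from collections import defaultdict

# Invented sample records: (publication year, has an epigraph?).
# These are NOT values from the dataset; they only illustrate the idea.
records = [
    (1740, True), (1740, False),
    (1741, True), (1741, True), (1741, False), (1741, False),
]

def share_by_year(rows):
    """Return {year: fraction of that year's books whose flag is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, flag in rows:
        totals[year] += 1
        if flag:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

shares = share_by_year(records)
for year, share in shares.items():
    print(year, f"{share:.0%}")
```

In this made-up sample the raw epigraph count doubles from one year to the next while the share stays flat, which is exactly the kind of difference a count-only chart would hide.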

 

I also think that it would have been interesting to see the trend in authors’ use of prefaces, which we would expect to decrease as the public grew more comfortable with fictionality, but the “paratext titles” were all grouped together for each work, so you couldn’t separate the table of contents from the preface. The same was true for title adjectives: I found the category very interesting, but couldn’t find out much about them because, while it’s likely that two titles will share one adjective, it’s unlikely they’ll share all 4 or 5 of their title adjectives. However, using a network graph, I got somewhat of a sense of the usage for each narrative form. If the data were separated for each title adjective, say in a set that compared only the title adjective with the narrative form of the work, I think this tool would be very interesting. You could see which terms were shared by epistolary works and other first-person works as well as which ones were not shared, etc. The map was also interesting because I did not know that so many works were published outside of London.
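One way to get the "one adjective per row" set imagined above is to split each work's grouped adjectives and pair every adjective with that work's narrative form. The sketch below is hedged: the `TitleAdjectives` column name, the `|` separator, and the sample rows are all assumptions made for illustration, not details confirmed by the dataset.

```python
from collections import Counter

# Invented rows; the column name "TitleAdjectives" and the "|" separator
# are assumptions, not the dataset's documented layout.
rows = [
    {"TitleAdjectives": "young|unfortunate", "NarrativeForm": "Epistolary"},
    {"TitleAdjectives": "young|secret", "NarrativeForm": "First-person"},
    {"TitleAdjectives": "entertaining", "NarrativeForm": "Epistolary"},
]

def adjective_form_pairs(rows, sep="|"):
    """Count (adjective, narrative form) pairs, one pair per adjective."""
    pairs = Counter()
    for row in rows:
        for adj in row["TitleAdjectives"].split(sep):
            adj = adj.strip().lower()
            if adj:
                pairs[(adj, row["NarrativeForm"])] += 1
    return pairs

pairs = adjective_form_pairs(rows)

# Group the narrative forms under each adjective to find shared terms.
forms_by_adj = {}
for (adj, form), _count in pairs.items():
    forms_by_adj.setdefault(adj, set()).add(form)

shared = sorted(adj for adj, forms in forms_by_adj.items() if len(forms) > 1)
print(shared)
```

With the pairs separated like this, adjectives that appear under more than one narrative form can be listed directly, and the ones that never overlap fall out of the same table.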

 

 

I thought the word cloud that I made was the most interesting finding using this dataset. I chose to look at title adjectives, which I think is a great parameter for a dataset of novels. Again, I am missing Voyant’s capability to clean words from the word cloud, but I thought it was interesting how much the titles attempted to sell the novel, and in what way. Apparently, making sure the reader knows your protagonist is young and female was important; the word "male" did not show up as often and "old" certainly didn't show up. I also think it's funny that "moral" and "historical" competed with "interesting" and "entertaining" (entertaining is the winner, in terms of usage). The word "secret" was also popular -- here we see the interest in the private workings of the young female mind, I guess. I would definitely like to see historical trends on which of these words continued to be used; I could easily use Voyant to do that!

 

Metadata and the early novel

3 min read

This dataset was one of the most satisfying and informative ones I’ve worked with so far, which I think is attributable to how much detailed work went into the metadata before I processed it. The amount of nuance in this record of 855 novels (particularly in the narrative forms, translation claims, and the presence of inscriptions/marginalia which the data accounts for) is astounding! As such, I think I might value this dataset more because it’s human-made; maybe several advanced computer programs could detect handwritten additions to novels, make some assessment of form, or account for separations between a given text and an untranslated/unabridged/original version, but I’m instinctively more trusting of the work that people have done in order to create that metadata (maybe that makes me unreasonably biased against AIs, in which case I advance my apology to our future robot overlords).

As others have mentioned, both the publishing locations and title places in this dataset were overwhelmingly concentrated around the southern UK (i.e. around London) with some title places appearing in North America as well. I was initially surprised to not see more title places cropping up in South America, Asia, or Africa, considering the U.K.’s colonial interests, but I think that the period of these novels’ publication dates (1700 to 1779) places them before the true heyday of British empire; furthermore, considering the difficulties we had earlier in the semester with non-standardized spellings in Robinson Crusoe, it wouldn’t surprise me if a fair amount of title references weren’t able to be mapped simply because of Google Maps’ more contemporary location references. I’d be interested to look at the correlations between publishing date and places referenced, or genre and the presence of marginalia/inscriptions (I was able to see that roughly half of the works published included some kind of handwritten paratext, but want to look into more detail at what kind of texts get written on more often than others), but wasn’t quite sure how to wrangle Fusion into contrasting those categories—in general, what most struck me about the experience of using Fusion and the word cloud program was how little I understood about the ways these programs function. My STEM education in general has been patchy and my comp-sci knowledge is practically nonexistent, so features like Fusion’s network graphs (with their weird wobbling clumps of data) and the word cloud’s “spiral” and “scale” settings were intriguing but difficult to fathom, even after reading the background information which both programs provide (although I did learn that when the word cloud creator mentioned using “sprites,” he was referring to a type of two-dimensional graphic and not the programming pixies I imagined, more’s the pity). 
I really enjoyed looking into this collection of metadata and trying to think more deeply about the best ways to interpret such a large compilation of previous interpretations, but I think I need to gain a better understanding of how exactly both complex programs like Fusion and simple ones like the word cloud creator perform their interpretations.

Exercise 6

2 min read

The first odd thing I noticed while creating my visualizations of the data was that, in addition to a general upward trend in novels published annually, the publication date bar chart showed a handful of spikes in certain years (1700, 1718, 1741, etc.). What could account for this? Why, for example, do the data include a single novel published in 1740 and 15 published the following year? Are the data unrepresentative, or was there some external cause for relatively high volumes of novels to be published in certain years?

I also found the narrative form pie chart interesting, but what I really wanted to see was any observable connection between narrative form and publication date. What narrative trends might we see developing over the course of the eighteenth century? Unfortunately, I found the network graph linking publication date and narrative form to be pretty difficult to parse. It was a confusing jumble of overlapping nodes, linkages, and labels; it would probably be easier to glean something significant from a simple list of the novels' narrative forms, ordered by publication date. Google might invest in making this particular style of visualization easier to read.

I generated a word cloud of the nouns in the novels' titles and was struck by how many of the most commonly appearing words were indicators of title or rank: "Mr," "Mrs," "Gentleman," "Lady," "Esq," "Prince." I'd be interested to see how frequently these kinds of words appear in the titles of nineteenth-century British novels--or in those of eighteenth-century novels published elsewhere. Was colonial American fiction, for instance, as evidently preoccupied with social status?

Exercise 6:

When looking at the map of the Title Places, I’m not surprised by the large concentration of novels that reference Europe or North America in their titles. Of the non-Western locations on the map, most are in Asia, as opposed to Africa. This is in line with my historical understanding of Europe at the time; Asia was at the forefront of European consciousness in a way that Africa would not be until the late 1800s. Simply put, there was a lot more contact between Asia and Europe at the time than between Africa and Europe. What I found surprising about the map was the lack of titles referencing places in South America. The only dot in all of South America is “Peruvian Tales…”, which is a translation anyway (not an original prose narrative from the West). I don’t really have theories on why this is, because Europe was still a huge presence in South America during the late 1700s.

When I looked at the graph of the narrative forms filter with publication date and count, I found interesting results with regard to first person and third person narratives. We have predictable results for epistolary novels and novels in general, but first person and third person both seem to be oscillating on top of each other from the 1740s to the 1780s. I couldn’t figure out how to overlay the graphs on top of each other in order to really compare the two, but I looked at a few outliers in order to gain some data points to analyze. 1751 is the biggest year for first person narratives, and is an incredibly small year for third person narratives. Conversely, 1753 is a large year for third person narratives, and a roughly median year for first person narratives. Given that I trust the data, I’d believe that perhaps because of the small number of writers and novels generally, the oscillating frequencies indicate a really fickle and transitional period in the novel (makes sense), in which a particular style is in fashion for certain years. It’s more likely, however, that our small sample size (21% of data) is responsible for the large oscillations, and that with more data the trends would smooth out.
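The overlay that Fusion Tables would not produce can be approximated by tabulating both forms side by side for each year. The following is a minimal sketch, assuming each record reduces to a (year, narrative form) pair; the sample records and the form labels are invented for illustration and are not counts from the dataset.

```python
from collections import defaultdict

# Invented (year, narrative form) records, for illustration only.
records = [
    (1751, "First-person"), (1751, "First-person"), (1751, "Third-person"),
    (1753, "Third-person"), (1753, "Third-person"), (1753, "First-person"),
]

def counts_by_year_and_form(rows):
    """Build a nested table: table[year][form] -> number of works."""
    table = defaultdict(lambda: defaultdict(int))
    for year, form in rows:
        table[year][form] += 1
    return table

table = counts_by_year_and_form(records)
forms = ["First-person", "Third-person"]

# Print one line per year with both forms next to each other.
print("year", *forms)
for year in sorted(table):
    print(year, *(table[year][form] for form in forms))
```

Lining the two counts up per year makes the outlier years easy to spot, and the same table could be fed to any plotting tool that draws two series on one axis.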

I think this is one of the coolest sets of data we’ve looked at so far and there’s a lot we can do with it. It’s not infeasible to expand and enhance the data set: with every two or three novels we add to the set, we gain 1% of the total data! The main thing I would do would be to convert a lot of the non-numerical data to numerical data. For example, in narrative form, there are only 7 or 8 different entries. I would make each of those entries its own column, and make them binaries. I would also turn the volume column into a purely numerical variable (1 would be 1 volume, 2 would signify 2, and so on). I’m sure I could turn stuff like Pub location into binaries as well. Significantly, this would allow me to run a lot of regression analysis on the data with very little computational difficulty. For example, I could try to predict publication year based on all the other numerical variables, and see if I can find correlations. I could also try to predict the types of paratext in the novel based on the other features. Given more time and effort, I would use more complex machine learning techniques to interpret the textual data, but I think there’s a lot of potential in this dataset for simple numerical analysis that could lead to significant results, without too much effort.
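The conversion described above (one binary column per narrative form, plus a numeric volume count) is usually called one-hot encoding. Here is a rough sketch of it; the `Volumes` column name, the sample rows, and the `one_hot` helper are hypothetical and only stand in for the real table.

```python
# Invented rows; "Volumes" is an assumed column name and the values are
# made up. Only "NarrativeForm" is a column name mentioned for the dataset.
rows = [
    {"NarrativeForm": "Epistolary", "Volumes": "2"},
    {"NarrativeForm": "Third-person", "Volumes": "1"},
    {"NarrativeForm": "First-person", "Volumes": "3"},
]

def one_hot(rows, column):
    """Replace `column` with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

encoded = one_hot(rows, "NarrativeForm")

# Turn the volume count from text into an integer.
for row in encoded:
    row["Volumes"] = int(row["Volumes"])

print(encoded[0])
```

Once every column is numeric, the rows can be handed to a regression routine; which one to use, and whether the results would be meaningful on a sample this small, is a separate question.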

Metadata for Evelina and Friends

3 min read

The previous computational exercise introduced several interesting questions about the evolution of literary works over time, especially with respect to various narrative forms and titles of works. I was interested in using the Early Novels Database source (focusing specifically on metrics: NarrativeForm and TitleNouns) to re-evaluate these previous hypotheses.

The visualizations that I was most interested in creating were: 1) narrative form with respect to date of publication and 2) nouns in titles and adjectives in titles with respect to date of publication. Google Fusion Tables, while able to categorize all of the works into discrete subcategories, was generally less useful for tracking patterns and changes over time.

An example of my attempt to visualize different narrative forms over time:

Because Fusion Tables' visualizations are optimized for categorical data, I shifted my approach and attempted, instead, to map the various PubLocations of the works. Despite having over 850 rows of metadata, the mapping tool created fewer than 10 pins on the physical map. My primary concern with this mode of visualization is that it lacks any indication of frequency. For example, the city of London has one pin on the map, but after a closer examination of the metadata in the database, over 740 works have PubLocation set to London. This means that our visualized map has eliminated the ability to understand which publication locations were predominant over others. To a viewer who is blind to the actual data, Dublin (in which 90 works were published) appears just as significant a publishing location as London (in which 700+ works were published). Further, the more minute bugs that exist in computational tools like Fusion Tables are still present, such as the mapping of "Oxford" and "Bath" to the US. I quickly modified these fields in my copy of the database to "Oxford, UK" and "Bath, UK," which corrected the problem in the physical map. Still, these imperfections in computational translations cannot always be detected qualitatively by users.
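Both issues raised above, the missing frequencies and the ambiguous place names, can be handled before the data ever reaches a map. The sketch below assumes the PubLocation values are available as a plain list; the sample list and the `location_counts` helper are invented, while the "Oxford, UK" and "Bath, UK" replacements follow the fix described in the post.

```python
from collections import Counter

# Invented sample of PubLocation values, not the real 850+ rows.
pub_locations = ["London", "London", "Dublin", "Oxford", "London", "Bath"]

# Names that a mapping tool may place in the wrong country, with the
# disambiguated forms used in the post.
DISAMBIGUATE = {"Oxford": "Oxford, UK", "Bath": "Bath, UK"}

def location_counts(locations):
    """Normalise ambiguous names, then count works per location."""
    cleaned = [DISAMBIGUATE.get(loc.strip(), loc.strip()) for loc in locations]
    return Counter(cleaned)

counts = location_counts(pub_locations)
for place, n in counts.most_common():
    print(f"{place}: {n}")
```

A count per location like this could then drive pin size or colour, so that one pin for London no longer looks equivalent to one pin for Dublin.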

Next, in hopes of visualizing the data that I was originally interested in exploring, I created a word cloud of different nouns in the titles of works. I found that the most frequently occurring nouns were: volume, story, history, adventure, letter, edition, life, series, and novel. Many of these words appeared to serve the functional purpose of providing additional detail on the form of the novel, rather than the content of the work itself. A quick review of the database seemed to confirm this hypothesis, with many titles of works including details such as "A Novel," "Year [Publication Date]," and "A Series of Letters." In order to get a better sense of the content that was produced during this period, I parsed through and cleaned my copy of the metadata in Fusion Tables, filtering out words that denote form. After doing this, the most frequently found nouns in titles were: world, love, manner, death, sea, war, friend, sex, spy.
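The cleaning step described above amounts to a stop-word filter applied before counting. The sketch below is one possible version, not the procedure actually used in Fusion Tables: the `FORM_WORDS` set reuses the frequent nouns listed in the post on the assumption that all of them were treated as form words, and the sample list of title nouns is invented.

```python
from collections import Counter

# Assumed stop list: the frequent nouns named in the post, treated here
# as words that denote form rather than content.
FORM_WORDS = {
    "volume", "story", "history", "adventure", "letter",
    "edition", "life", "series", "novel",
}

# Invented sample of title nouns, for illustration only.
title_nouns = ["history", "love", "volume", "world", "love", "novel", "war"]

def content_noun_counts(nouns, stop=FORM_WORDS):
    """Count nouns case-insensitively, skipping any word in `stop`."""
    return Counter(n.lower() for n in nouns if n.lower() not in stop)

counts = content_noun_counts(title_nouns)
print(counts.most_common(3))
```

Keeping the stop list as an explicit set makes the editorial choice visible: anyone who disagrees that "life" or "adventure" is a form word can change one line and rerun the count.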

This exercise has demonstrated that the most appropriate data visualization and analysis tool depends on the actual form of the values in the data set (e.g. sentence, category, boolean), and that a poorly matched tool may even conceal important details, to the detriment of the end user.