The previous computational exercise raised several interesting questions about how literary works evolve over time, especially with respect to narrative forms and the titles of works. I wanted to use the Early Novels Database (focusing specifically on two fields, NarrativeForm and TitleNouns) to re-evaluate those earlier hypotheses.
The visualizations that I was most interested in creating were: 1) narrative form with respect to date of publication and 2) nouns in titles and adjectives in titles with respect to date of publication. Google Fusion Tables, while able to categorize all of the works into discrete subcategories, was generally less useful for tracking patterns and changes over time.
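One way to get at change over time without relying on a charting tool is to bucket works by decade and count each narrative form per bucket. The following is a minimal sketch, not the method used in this exercise: the records are toy values invented for illustration, and `PubDate` is an assumed column name (only NarrativeForm is named above).

```python
from collections import Counter

# Toy records for illustration only; these are NOT rows from the database.
# "PubDate" is an assumed column name; "NarrativeForm" is the field named above.
records = [
    {"PubDate": 1719, "NarrativeForm": "First-person"},
    {"PubDate": 1722, "NarrativeForm": "First-person"},
    {"PubDate": 1741, "NarrativeForm": "Epistolary"},
    {"PubDate": 1748, "NarrativeForm": "Epistolary"},
    {"PubDate": 1749, "NarrativeForm": "Third-person"},
]


def forms_by_decade(rows):
    """Count how often each narrative form appears in each decade."""
    counts = Counter()
    for row in rows:
        decade = row["PubDate"] // 10 * 10  # e.g. 1748 -> 1740
        counts[(decade, row["NarrativeForm"])] += 1
    return counts


decade_counts = forms_by_decade(records)
for (decade, form), n in sorted(decade_counts.items()):
    print(f"{decade}s  {form:<14} {n}")
```

A table of (decade, form) counts like this could then be exported to a tool that handles time series better than a purely categorical chart.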
An example of my attempt to visualize different narrative forms over time:
Because Fusion Tables' visualizations are optimized for categorical data, I shifted my approach and instead tried to map the PubLocations of the works. Despite having over 850 rows of metadata, the mapping tool created fewer than 10 pins on the map. My primary concern with this mode of visualization is that it gives no indication of frequency. For example, the city of London has one pin on the map, but a closer look at the metadata shows that over 740 works have PubLocation set to London. The map has therefore eliminated any way of seeing which publication locations predominated. To a viewer who cannot see the underlying data, Dublin (where 90 works were published) appears just as significant a publishing location as London (where over 740 works were published). Smaller bugs in computational tools like Fusion Tables are also still present, such as the mapping of "Oxford" and "Bath" to the US. I quickly changed these fields in my copy of the database to "Oxford, UK" and "Bath, UK," which corrected the problem on the map. Still, users cannot always detect this kind of error in a computational translation simply by looking at the result.
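The frequency information that the pin map hides can be recovered with a simple tally. Below is a rough sketch under stated assumptions: the sample list is made up for illustration (it does not reproduce the real counts), and `DISAMBIGUATE` is a hypothetical lookup table that mirrors the manual "Oxford, UK" and "Bath, UK" fix described above.

```python
from collections import Counter

# Hypothetical cleanup table: ambiguous place names mapped to qualified ones,
# mirroring the manual edits made in the copy of the database.
DISAMBIGUATE = {"Oxford": "Oxford, UK", "Bath": "Bath, UK"}


def location_counts(pub_locations):
    """Tally works per publication location after normalizing names."""
    cleaned = (
        DISAMBIGUATE.get(loc.strip(), loc.strip()) for loc in pub_locations
    )
    return Counter(cleaned)


# Made-up sample values, NOT the real PubLocation column.
sample = ["London", "London", "Dublin", "Bath", " London", "Oxford"]
counts = location_counts(sample)

# most_common() lists locations from most to least frequent.
for place, n in counts.most_common():
    print(f"{place}: {n}")
```

Counts like these could be used to size or label the pins, so that a location with hundreds of works no longer looks equivalent to one with a handful.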
Next, in hopes of visualizing the data that I had originally set out to explore, I created a word cloud of the nouns in the titles of works. I found that the most frequently occurring nouns were: volume, story, history, adventure, letter, edition, life, series, and novel. Many of these words appeared to serve the functional purpose of describing the form of the novel rather than the content of the work itself. A quick review of the database seemed to confirm this hypothesis, with many titles including details such as "A Novel," "Year [Publication Date]," and "A Series of Letters." To get a better sense of the content produced during this period, I went through and cleaned my copy of the metadata in Fusion Tables, filtering out words that denote form. After doing this, the most frequently found nouns in titles were: world, love, manner, death, sea, war, friend, sex, spy.
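The filtering step can also be done programmatically with a stop list. This is only a sketch: the cleaning described above was done by hand in Fusion Tables, the exact words removed are not recorded, and `FORM_WORDS` is an assumed stop list built from the first word cloud's top nouns. The sample titles are invented, and representing TitleNouns as a list of words per work is also an assumption.

```python
from collections import Counter

# Assumed stop list of form-denoting nouns, taken from the first word cloud.
FORM_WORDS = {
    "volume", "story", "history", "adventure", "letter",
    "edition", "life", "series", "novel",
}


def content_noun_counts(title_nouns, stop_words=FORM_WORDS):
    """Count title nouns, skipping words that describe form, not content."""
    counts = Counter()
    for nouns in title_nouns:
        for noun in nouns:
            word = noun.lower()
            if word not in stop_words:
                counts[word] += 1
    return counts


# Invented examples; each inner list stands for one work's TitleNouns.
sample_titles = [
    ["History", "World", "Volume"],
    ["Love", "Letter", "Series"],
    ["World", "War", "Novel"],
]
noun_counts = content_noun_counts(sample_titles)
print(noun_counts.most_common())
```

Keeping the stop list in one place makes the judgment call about what counts as a "form" word explicit and easy to revise.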
This exercise has demonstrated that the most appropriate data visualization and analysis tool depends on the form of the values in the data set (e.g. sentence, category, boolean), and that a poorly matched tool may even conceal important details, to the detriment of the end user.