From the Eighteenth-Century Collections Online database, I used Laurence Sterne's The life and opinions of Tristram Shandy, gentleman. ... London, MDCCLX. -67 [1771?]. 139 pp. Vol. Volume 1 of 9 (9 vols. available) Literature and Language. I attempted word searches specifically on the title and cover pages, believing that the larger amount of spacing on those first pages would let the site's keyword search detect the words I was looking for more easily. The first page was composed almost entirely of a portrait of the author and his name in large font (LAURENCE STERNE. A.M.). However, when I searched for any part of his name, there were no matches whatsoever. I then turned to "Image 8," the title page. The page itself had the title (THE LIFE AND OPINIONS OF TRISTRAM SHANDY, GENT.), followed by a brief description in the lower quadrant of the page that was not in English, and then the year and volume at the bottom. Nearly everything on that page, with the exception of the character's name and the word "Opinions," was detected by the keyword finder. I assumed the misses were caused by the cursive writing that overlaid part of the title and, possibly, by the penmanship and spacing on the part of the author and publisher. My assumption was later supported when I spaced out the word chapter ("CHAP" vs. "C H A P") and found that the keyword reader picked up the first spelling for some chapters and the other spelling for the majority of the rest. Finally, I attempted to match the large-font "Sir" on the page labeled "Image 10," but all I matched were blank areas on other pages and the non-English words on the title page.
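The "CHAP" vs. "C H A P" result suggests that a search which ignores spacing between letters would catch both spellings at once. Below is a minimal sketch of that idea in Python, not a description of how the database's own search works. The function names `spaced_pattern` and `find_spaced` and the sample string are my own illustrations, and the sketch assumes the OCR text has already been exported as plain text.

```python
import re

def spaced_pattern(word):
    """Build a regex matching `word` with optional whitespace between its letters."""
    letters = [re.escape(ch) for ch in word if not ch.isspace()]
    return re.compile(r"\s*".join(letters), re.IGNORECASE)

def find_spaced(word, text):
    """Return each substring of `text` that spells `word`, ignoring internal spacing."""
    return [match.group(0) for match in spaced_pattern(word).finditer(text)]

# Hypothetical sample standing in for exported OCR text.
sample = "C H A P. I. then CHAP. II."
print(find_spaced("chap", sample))  # ['C H A P', 'CHAP']
```

One caveat with this approach: because it ignores spaces, it can also match across two neighboring words (for example, the end of one word and the start of the next), so any hits would still need to be checked against the page image.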
Using the keyword search tools in both the free version of Adobe Acrobat and Google Chrome proved fruitless, as I could not match visible words, letters, or numbers on the pages. When running the pages through Adobe Pro's Optical Character Recognition (OCR), I made sure to run three types of pages: one with solely printed text, one with text and handwritten notes, and one with exclusively handwritten notes. When running pages with text and written notes through OCR, this was my result:
I was overwhelmed with text boxes to the point where I could not edit words without another text box covering my writing. For pages with solely handwritten notes, I found few to no text boxes. Even on the pages with exclusively printed text, I noticed increased spacing in certain areas that were previously unreadable, which Adobe's spellchecking system flagged as corrections. When I tried to reduce the number of words broken across two lines, to increase the chances of detection in plain text and Microsoft Word files, I could not change the letter count in a line without the word spilling over to the line below it. The OCR also read certain overlapping letters as a single letter, as shown when the backspace key removed both letters at once. Finally, some parts of the PDF pages of the novel, including chapter headings, the enlarged first letter of each chapter, and even an entire page, were simply left as images by the OCR. Not only could certain words not be found in the text, but the handwritten notes, which could provide a source of thought or critique concerning the ideas presented in the novel if OCR transcribed them, were left largely unrecognized. Though Adobe Pro's spellcheck was successful in identifying some common English words in the text, it might also help to run a program that cross-references the text with words common during that period, as this is much easier than keeping a systematic database of penmanship for individual writers.
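The cross-referencing idea could look something like the following. This is only a rough sketch under stated assumptions: the function name `flag_unknown`, the tiny `period_words` set, and the misread "Triftram" are all invented for illustration, and a real check would need a much larger list of period spellings drawn from a trusted source.

```python
import re

def flag_unknown(text, period_words):
    """Return the words in `text` that do not appear in `period_words`."""
    known = {word.lower() for word in period_words}
    # Runs of letters only; digits, punctuation, and underscores split tokens.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [token for token in tokens if token.lower() not in known]

# Hypothetical word list; a real one would hold thousands of period spellings.
period_words = {"the", "life", "and", "opinions", "of", "gentleman"}

# Hypothetical OCR output with one misread word ("Triftram").
ocr_text = "The Life and Opinions of Triftram Shandy, Gentleman"
print(flag_unknown(ocr_text, period_words))  # ['Triftram', 'Shandy']
```

Note that the sketch flags "Shandy" along with the misread "Triftram," because proper names are not in the list. Flagged words are therefore candidates for a second look at the page image, not confirmed OCR errors.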
My concern about the use of digital facsimiles and keyword searching of copied texts centers on the negative impact on research. When research depends on finding a specific word, topic, or individual, the inability of such programs to find words in documents could cause the researcher to miss a crucial source, or to spend additional time reading every source in full because they cannot search through them. Poor transcriptions of facsimiles would also limit the sources a reader could cross-reference in a database, significantly impairing their understanding of how their research topic affected individuals who may not have had access to printing or did not want to risk sharing their opinions openly. Therefore, image-to-text conversion and the use of OCR on facsimiles need to be improved, or researchers and students risk confusing the words in literary texts and overlooking critical pieces of history.
#Exercise4 #DigitalSurrogates #TextPreparation #Penmanship #Legibility #Spacing