OCR-ing Tristram Shandy

Cleaning the OCR’d text of my Tristram Shandy section wasn’t too difficult once I used find and replace. I learned that because databases like ECCO use OCR’d texts to power their search engines, a search won’t be accurate if your word has an s at the beginning or in the middle, where eighteenth-century printers set the long s (ſ) that OCR tends to misread. You can try searching for the garbled form of the word instead: to find mentions of worship, I’d have to try “worfhip” or “worlhip” (what my OCR’d text garbled it into). If the OCR program analyzed the entire sentence rather than just single characters, machine learning could help predict what the right word should be. For example, it could make sure pronouns are spelled correctly, because grammar dictates where they have to go; it could flag words in the OCR’d text that aren’t in the dictionary and aren’t capitalized (so less likely to be proper nouns); and it could use the placement of an “s” in a word to guess what word it should be.
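
To make the dictionary idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how ECCO or any real OCR program works: the function names `search_variants` and `suggest_corrections` are made up for this post, the word list is a tiny stand-in for a real dictionary, and I only treat “f” and “l” as misreadings of the long s because those are the two I saw.

```python
from itertools import product

# Letters the long s was misread as in my text (an assumption, not a full list).
CONFUSABLE = ("f", "l")


def search_variants(word):
    """Return garbled spellings of a modern word to try in a search box."""
    # The long s was used at the start or middle of a word, not at the end.
    positions = [i for i, ch in enumerate(word[:-1]) if ch == "s"]
    if not positions:
        return []
    variants = []
    for combo in product(CONFUSABLE, repeat=len(positions)):
        chars = list(word)
        for pos, sub in zip(positions, combo):
            chars[pos] = sub
        variants.append("".join(chars))
    return variants


def suggest_corrections(token, dictionary):
    """Guess the intended word(s) for a single OCR'd token."""
    # Leave known words and capitalized tokens (likely proper nouns) alone.
    if token in dictionary or token[:1].isupper():
        return [token]
    # Try turning each non-final "f" or "l" back into an "s",
    # and keep only the guesses that are real dictionary words.
    positions = [i for i, ch in enumerate(token[:-1]) if ch in CONFUSABLE]
    candidates = []
    for mask in product((False, True), repeat=len(positions)):
        chars = list(token)
        for pos, swap in zip(positions, mask):
            if swap:
                chars[pos] = "s"
        guess = "".join(chars)
        if guess in dictionary and guess not in candidates:
            candidates.append(guess)
    return candidates


if __name__ == "__main__":
    words = {"worship", "himself", "life"}  # stand-in for a real dictionary
    print(search_variants("worship"))
    for token in ["worfhip", "worlhip", "himfelf", "life", "Toby"]:
        print(token, "->", suggest_corrections(token, words))
```

This only looks at one word at a time, so it can’t do the sentence-level guessing I described above, and it tries every combination of swaps, which would get slow on very long words. A token that comes back with an empty list is one a person would still have to check by hand.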

Cleaned, OCR’d texts seem extremely helpful to me for making historically significant texts more accessible. Even digital facsimiles are less helpful, because they take forever to read (and a lot of paper, if you print them). Still, to me, divorcing a text from the layout that its original readers engaged with significantly changes it. I’m a visual learner, so I remember the location of a word on a page when I read, which leads me to think that layout matters: it shifts how we group different parts of the text and how we digest what we’re reading, especially with newspapers and paratexts.

And that’s why it’s hard for us to contextualize the proliferation of the novel and how it affected people: we don’t experience the difficulties of obtaining novels anymore. I think it’s important to consider how books used to be read: slowly, carefully, while turning many expensive pages. Perhaps people memorized while reading so they could recount it orally later, or read out loud. Authors couldn’t easily edit content, either, the way we can edit a Word document. Reading a book on a webpage, as we do with Project Gutenberg texts, makes it seem like a sea of words that goes on continuously, and we can get lost in it.