
emop Workflow Design Description

1. emop Workflow

This section describes the current OCR process workflow at TAMU, based on the work completed for the Early Modern OCR Project (emop) [1]. As seen in Figure 1, the emop workflow is a simple pipeline of software components that turns page images into their text and XML equivalents. The workflow is embodied in the emop controller, a Python framework that interacts with the emop DB, a Network Attached Storage (NAS) system, and software components written in several different languages.

[1] http://emop.tamu.edu/

Figure 1: The emop controller, manager of the emop workflow

1.1. emop Controller

As pictured in Figure 1, the emop controller works as follows:

1. An authorized user uses the emop Dashboard web application to select documents from the collection to be OCRd. This opens a dialogue box in which the user selects the OCR engine and, where applicable, the training set to use while OCRing. The Dashboard then marks the pages of the selected documents as Not Started in the emop DB's JOB_QUEUE table.

2. The Dashboard also serves as the point of contact between the emop controller and the emop DB, via an API. The emop controller queries the Dashboard for information about the pages of all selected documents (image file location, groundtruth info, current job status). The Dashboard returns this information from the emop DB as a JSON response, which the emop controller writes as a set of input files to a temporary location on the NAS. The scheduler splits the pages into jobs of an equal number of pages for each available processor (128 dedicated processors for the IDHMC queue and a variable number of processors for the background queue) on the Brazos High Performance Computing Cluster (HPCC or Brazos). These jobs are then assigned to a processor queue, where the emop controller is called for each page. Finally, the status of every assigned page is updated to Processing in a JSON-formatted response file, which is written to the emop DB when all pages assigned to a processor are finished.

   For emop, parallelization was done at the page level. Each available processor ran the emop controller on a single page at a time, to completion where possible, using the 128 processors available to us at all times on the IDHMC queue and many more unused processors on the background queues.

3. The TIF page images are OCRd with Tesseract, using the training specified by the user in the Dashboard at job submission. Text and hocr files are produced and saved on the NAS with a path like <emophome>/<batch#>/<emop#>/<files>. hocr is Tesseract's XML-like output format. It is actually HTML with extra attributes added as semicolon-separated values in the @title attribute of the associated block tags. Tesseract partitions the hocr output into nested page, area, paragraph, line, and word blocks. Each block [2] contains bounding box (bbox) coordinates in its @title attribute. The job status for each page is then updated to Pending Postprocessing in the JSON response file.

   Tesseract is written in C++. The current released version of Tesseract is 3.02; emop uses version 3.03 to take advantage of that version's ability to add confidence scores to each word in the hocr output. emop's hocr files are renamed to have an .xml extension.

   For emop, all training includes a unicharambigs file, which is used to convert special characters (ƒ, ) and ligatures (ſt, œ) to their modern or multi-character equivalents (s, r, st, and oe respectively). We made this decision to improve the searchability of the texts.

4. The emop de-noising algorithm analyzes the hocr output in an attempt to remove noise words: page noise and images that Tesseract identifies as words. The algorithm looks at bbox coordinates to identify words whose position and size indicate that they are not part of the page's text block. The de-noising algorithm is run on every page that is OCRd and takes the page's hocr file as input. It produces one new file and updates another.

   The new output file is an XML file with an _IDHMC suffix added to the page number used as the filename. This file, like the _IDHMC text file created in step 6, has had all noise words removed. The updated file is the original input hocr file (with an .xml extension), with additional values added to the @title string:

   - pred: a value of 0 or 1, indicating that the word is likely valid or noise, respectively, based on the noiseconf value and a default cutoff of 50%.
   - noiseconf: a measure of confidence that the word is noise. The current default causes any word with a noiseconf value greater than 50% to be removed from the *_IDHMC files [3].

   The new file is written to the NAS in the same folder as the Tesseract output for that document. The page's job status is updated to Postprocessing in the JSON response file. The de-noising algorithm also produces an overall measure of noise for the page, which is written to the JSON response file. The de-noising algorithm, created by the PSI Lab at TAMU, is written in Python and requires the beautifulsoup, numpy, and scipy modules.

[2] See Appendix A.1 for a sample hocr file.
[3] See Appendix A.2 for a sample de-noised hocr file.
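The cutoff described in step 4 can be illustrated with a minimal Python sketch. It is not the PSI Lab implementation: the helper names `parse_title` and `is_noise` are hypothetical, the sample @title string is invented, and the layout of the pred and noiseconf fields (and the use of a 0 to 1 scale for noiseconf) is an assumption made for illustration.

```python
def parse_title(title):
    """Split an hocr @title string into a dict: property name -> list of value strings."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


def is_noise(title, cutoff=0.5):
    """Return True when the word's noiseconf value is greater than the cutoff."""
    props = parse_title(title)
    return float(props["noiseconf"][0]) > cutoff


# Hypothetical @title string. The bbox and x_wconf fields follow the hocr
# convention; the pred/noiseconf layout and values are assumed, not taken
# from real emop output.
sample = "bbox 100 20 180 60; x_wconf 79; pred 0; noiseconf 0.12"

print(parse_title(sample)["bbox"])
print(is_noise(sample))
```

Under these assumptions, a word would be kept in the *_IDHMC files when `is_noise` returns False and dropped otherwise.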

5. The multiple column and skew detection (MultiColSkew) algorithm uses a page's bounding boxes to analyze its geometry and to identify when multiple columns are present in the page image, and where they are located. It also identifies and measures the amount of skew present. These values are then written to each page's JSON response file. MultiColSkew, created by the PSI Lab, is written in Python and requires the numpy module.

6. A final algorithm creates a new text version of the output with the noise words removed, i.e. based on the newly created *_IDHMC.xml files. That file also has an _IDHMC suffix appended to its name. The new file is written to the NAS in the same folder as the Tesseract output for that document.

7. The page evaluator is the first step in the page correction process. It evaluates the text produced by Tesseract (after de-noising) for each page to determine whether it fits the profile expected of a normal page of text. After some cleaning of the text (removing leading punctuation), it looks at things like the number of words (tokens) on the page, the average length of those words, the occurrence of continuous strings of repeated characters, the length of each word compared to the page average, the interspersion of alphabetic characters, numeric characters, and punctuation within a word, and how many words can be found in a dictionary. The page evaluator creates a score for Estimated Correctability (ECORR) and Estimated Page Quality (the ECORR divided by the number of words on the page). These values are then added to each page's JSON response. The page evaluator, created by SEASR at the University of Illinois, was written in Scala and then converted to Java for the emop controller.

8. Pages are then passed to the page corrector to undergo correction based on early modern (EM) dictionaries and a DB of Google 3-grams collected from EM documents. The dictionaries include alternate and abbreviated spellings, with special characters and ligatures converted to modern, multi-character equivalents. There are multiple English-language dictionaries, as well as a French and a Latin dictionary. The page corrector takes as input a de-noised hocr file containing the de-noising confidence measures. In short, the page corrector:

   1. starts with the first three words on the page,

   2. looks up each word in the dictionaries for a match,
   3. makes character substitutions for each word, looking for other possible dictionary matches, then
   4. uses all possible matches for each word to look for matches in the Google 3-gram DB. Matching 3-grams are weighted based on the number of uses in the original texts. Words that matched in the dictionary without substitution are given more weight. All of this is used to determine the correct matching 3-gram.
   5. The corrector then gets the next 3-word window, consisting of two words from the previous window and the next word in reading order.
   6. Repeat from 2 until done.

   When the page corrector is complete for each page, it creates an ALTO [4] XML file and a text file containing all corrections. The ALTO XML file also contains word confidence measures in a @WC attribute. Any word that is changed by the corrector contains one or more <ALTERNATIVE> sub-tags, the last of which is the original version of the word from the hocr input. The page corrector, created by SEASR at the University of Illinois, was written in Scala and then converted to Java for the emop controller.

9. Pages that have groundtruth equivalents available are then scored using Juxta CL, using one of three character distance measurement algorithms (Levenshtein, Jaro-Winkler, and Juxta). The juxta score is then written to the JSON response file for each page. Each page's job status is updated to Done. The JSON input file from step 2 above contains a flag indicating whether groundtruth is available for each page, as well as the file path information for any groundtruth files. Juxta CL, created by Performant Software and based on JuxtaCommons, is written in Java.

When a processor is finished with every page in its job queue, the Dashboard writes all associated JSON response files to the emop DB. Document-level and page-level results are viewable via the Dashboard.

[4] See Appendix A.3 for a sample ALTO XML file.

1.2. Inputs/Outputs

At the lowest level of abstraction, the input for the overall emop workflow is page images. In the case of emop, all of our input documents were low-quality, small (avg ~40KB for ECCO and ~140KB for EEBO) TIF files. Every document was broken up into individual page TIFs (one page per image for ECCO and two pages per image for EEBO) by the provider. Tesseract is capable of handling a single TIF document with multiple pages (there are a handful of those in our collection as well).

A config.ini file residing in the emop home directory on Brazos is used to control the workflow by passing parameters to each component of the workflow.

$EMOP_HOME=/home/mchristy/emop/emop-controller-test

emop Dashboard [git]
Input: User selection of documents to be OCRd and training to be used.
Output: A set of temporary JSON files containing information obtained from a query of the emop DB for each page associated with the user-selected documents.
Location: $EMOP_HOME/payload/input
Configuration:
    [dashboard]
    api_version=1
    url_base=http://emop-dashboard.tamu.edu
Requirements: Ruby on Rails, Juxta web service

cluster scheduler [git]
Input: JobID(s)
Output: None
Configuration:
    [scheduler]
    max_jobs=128
    queue=idhmc
    name=emop-controller
    min_job_runtime=300
    max_job_runtime=259200
    avg_page_runtime=480
    logdir=/fdata/scratch/mchristy/emop-controller/logs
    mem_per_cpu=4000
    cpus_per_task=1
    set_walltime=false
    extra_args=["--constraint","core32"]
Requirements: Slurm (other schedulers are possible)
Dashboard interaction: None.

emop controller [git]
Input: TIF page images.

Output: An hocr file (renamed with an .xml extension) and a text file for each page, written to the IDHMC NAS. The filename is the page number.
Location: /data/shared/text-xml/idhmc-ocr/<batch#>/<emop#>/
Configuration:
    [controller]
    payload_input_path=/fdata/scratch/mchristy/emop-controller/payload/input
    payload_output_path=/fdata/scratch/mchristy/emop-controller/payload/output
    ocr_root=/data/shared/text-xml/idhmc-ocr
    input_path_prefix=/dh
    output_path_prefix=/dh
    log_level=info
    scheduler=slurm
    skip_existing=true
Requirements: All of the following code packages.
Dashboard interaction: File paths of the output files for each page, written to the ocr_text_path and ocr_xml_path fields of the page_results table. Job status for each page, written to the job_status field of the job_queue table.

Tesseract (v3.03) [git]
Input: Page images, training, DAWG files (dictionaries), unicharambigs.
Output: hocr / text files.
Configuration: None
Requirements: Leptonica
Dashboard interaction: None

De-noising [git]
Input: hocr files (<page#>.xml) with word confidence levels included in the x_wconf field of the @title attribute.
Output: The original hocr file (<page#>.xml) is updated to include a noiseconf measure for each word, and a pred field to indicate whether the word falls above (1) or below (0) the default cutoff of 50%. A new XML file (<page#>_idhmc.xml) is created that has all words with a pred value of 1 removed. A new text file (<page#>_idhmc.txt) is created from the associated XML file.
Location: /data/shared/text-xml/idhmc-ocr/<batch#>/<emop#>/
Configuration: None
Requirements: beautifulsoup4, numpy, scipy
Dashboard interaction: An overall noise measure for the page, written to the noisiness_idx field of the page_results table.

MultiColumnSkew [git]
Input: De-noised <page#>_idhmc.xml page file.
Output: None

Configuration:
    [multi-column-skew]
    enabled=true
Requirements: numpy
Dashboard interaction: Column coordinates and skew measure, written to the multicolpoints and skew_idx fields of the postproc_pages table.

page evaluator [git]
Input: The original hocr file (<page#>.xml) with the noiseconf and pred fields added by the de-noising algorithm.
Output: None.
Configuration:
    [page-evaluator]
    java_args=["-xms128m","-xmx128m"]
Requirements:
Dashboard interaction: Estimated correctability and page quality scores, written to the pp_ecorr and pp_pg_quality fields of the postproc_pages table.

page corrector [git]
Input: The original hocr file (<page#>.xml) with the noiseconf and pred fields added by the de-noising algorithm.
Output: Creates two new files on the NAS: <page#>_alto.xml and <page#>_alto.txt. All words identified by the de-noiser as noise (pred=1) are removed. The noiseconf value from the input XML is transferred to the ALTO XML output as the @emop:dnc attribute. A new @WC attribute is added to record the page corrector's confidence that its contents are correct (a value from 0 to 100). One or more <ALTERNATIVE> sub-tags are included for every word that is corrected. The last <ALTERNATIVE> tag is the original word form from the input XML file. If save = True in config.ini, a statistics file is created for each page and saved to the NAS.
Location: /data/shared/text-xml/idhmc-ocr/<batch#>/<emop#>/
Configuration:
    [page-corrector]
    java_args=["-xms2048m","-xmx2048m"]
    alt_arg=2
    max_transforms=20
    noise_cutoff=0.5
    ctx_min_match=
    ctx_min_vol=
    dump=false
    save=false
    timeout=300
Requirements:

Dashboard interaction: File paths of the output files for each page, written to the corr_ocr_text_path and corr_ocr_xml_path fields of the page_results table. In addition, a string of corrector statistics is written to the pp_health field of the postproc_pages table. The string is a semicolon-separated list containing numbers for:
- total words
- ignored words (no attempt to process)
- correct words (processed and determined to be correct)
- corrected words
- unchanged words (processed and determined to be incorrect, but no correction available)

juxta cl [git]
Input: Corrected text file (<page#>_alto.txt) and its associated groundtruth page file. The availability of, and a file path to, any groundtruth files are contained in the JSON file created by the emop Dashboard and stored in $EMOP_HOME/payload/input.
Output: None
Configuration:
    [juxta-cl]
    jx_algorithm=levenshtein
Requirements:
Dashboard interaction: The character-level distance between the two pages is stored as a score between 0 and 1, written to the juxta_change_index field of the page_results table.

1.3. System Configuration

1.3.1. Brazos Cluster

As a stakeholder in the Brazos cluster, the IDHMC has full-time, uninterruptible access to 128 processors via the idhmc queue. We also have access to unused processors, as they are available, via the background queue. However, background queue jobs can be interrupted at any time by higher-priority queues. The Brazos login server includes access to the IDHMC NAS for IO file storage.

1.3.1.1. Configuration Files

Upon logging in to the Brazos login server, I cd into the emop controller directory and then load all required modules along with the emop module, which loads all software needed by the emop controller. In this same directory are several configuration files:
- config.ini: contains parameters to control the flow of the emop controller and its various components.
- emop.properties: contains location and login info for the Google 3-gram DB.
- emop.slrm: is used by the scheduler to create job queues and call the emop controller.

1.3.2. emop DB

The emop DB is quite large and resides on a dedicated database server accessible via the Brazos cluster. To minimize the potential for DB access to become a bottleneck in the workflow, database reading and writing is handled by the emop Dashboard for blocks of pages. When a Dashboard user submits a batch, the emop DB is queried for all relevant data on the submitted pages. The batch is then split up into several queues to be assigned to available processors. While a job is processing, all output data is written to a JSON file. When a job has completed, its corresponding JSON files are sent back to the Dashboard, where they are processed and written to the emop DB in a block.

All interaction with the emop DB is via the emop Dashboard, through four available subcommands:
- query: Reads from the Dashboard. Informational only; does not impact the OCR workflow.
- submit: Reserves pages from the Dashboard (read+write). The pages returned by the Dashboard are modified to reflect that they have been reserved for processing on the cluster. This subcommand writes the returned data to a JSON file and submits that as a job to the cluster scheduler.
- run: Runs the emop workflow on a compute node, using JSON data as input and writing JSON data as output. The individual pieces (Tesseract, denoise, etc.) also write their own files.
- upload: Typically executed after the "run" subcommand completes. This sends the job's JSON data back to the Dashboard to update it on the final status of the OCR page(s). Data can also be uploaded on demand via this subcommand.

1.3.3. NAS

The IDHMC NAS is accessible from several of the IDHMC's servers as well as from the Brazos cluster. It serves as the IDHMC's and emop's primary network storage device. It contains 42TB of disk space, about 25TB of which are currently (11/9/15) free. The NAS contains all of emop's page images, groundtruth files, and the result files of the entire emop workflow.

1.4. Github

All of the code described above is available as open source under an Apache v2.0 licence at the emop Github page: https://github.com/Early-Modern-OCR.

1.5. Dashboard

1.5.1. API

The emop Dashboard also has an admin interface that provides an API into the emop DB via Ruby on Rails: http://emop-dashboard.tamu.edu/admin/dashboard. Documentation for using the API is available at http://emop-dashboard.tamu.edu/apidoc. This is available to authenticated users only.
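As a rough illustration of how the config.ini parameters listed in Section 1.2 might be consumed, the sketch below reads part of the [scheduler] section with Python's standard configparser module. It is not taken from the emop controller source: the function name `load_scheduler_settings` is hypothetical, the configuration text is an inline excerpt rather than the real file, and treating extra_args as a JSON list is an assumption based on how the value is written.

```python
import configparser
import json

# Inline excerpt of the [scheduler] section from Section 1.2 (not the full file).
CONFIG_TEXT = """
[scheduler]
max_jobs=128
queue=idhmc
name=emop-controller
avg_page_runtime=480
set_walltime=false
extra_args=["--constraint","core32"]
"""


def load_scheduler_settings(text):
    """Parse the [scheduler] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["scheduler"]
    return {
        "max_jobs": section.getint("max_jobs"),
        "queue": section.get("queue"),
        "avg_page_runtime": section.getint("avg_page_runtime"),
        "set_walltime": section.getboolean("set_walltime"),
        # Assumption: the bracketed list is valid JSON and can be decoded as such.
        "extra_args": json.loads(section.get("extra_args")),
    }


settings = load_scheduler_settings(CONFIG_TEXT)
print(settings["queue"], settings["max_jobs"], settings["extra_args"])
```

Keeping each component's parameters in its own section, as config.ini does, means a component only needs to read the section that carries its name.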

Appendix A A.1 Original hocr Output (emop work_id 32, page1) <? xmlversion = "1.0" encoding = "UTF-8"?> <!DOCTYPEhtmlPUBLIC"-//W3C//DTDXHTML1.0Transitional//EN" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"> <html lang = "en" xml:lang = "en" xmlns = "http://www.w3.org/1999/xhtml"> <head> <title> </title> <meta content = "text/html;charset=utf-8" http-equiv = "Content-Type" /> <meta content = "tesseract3.03" name = "ocr-system" /> <meta content = "ocr_pageocr_careaocr_parocr_lineocrx_word" name = "ocr-capabilities" /> </head> <body> <divclass="ocr_page"id="page_1" title = 'image"/dh/data/eebo/e0031/40133/00001.000.001.tif";bbox0021791842;ppageno 0;noisiness0.1386'> <div class = "ocr_carea" id = "block_1_1" title = "bbox121102179310"> <p class = "ocr_par" dir = "ltr" id = "par_1" title = "bbox1211222179310"> <spanclass="ocr_line"id="line_1" title = "bbox1560222179127;baseline-0.0016155089-33" >< span class = "ocrx_word" dir = "ltr" id = "word_1" lang = "SC8b-R8-D2b" title = "bbox156024162594;x_wconf79" > A </ span> <spanclass="ocr_line"id="line_2" title = "bbox12111321947215;baseline0.014945652-10" >< span class = "ocrx_word" dir = "ltr" id = "word_3" lang = "SC8b-R8-D2b" title = "bbox12111321471209;x_wconf73" >< em > Serious < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_4"lang="sc8b-r8-d2b" title = "bbox15101321947215;x_wconf72" >< em > Exho. 
rtation < /em></ span> <spanclass="ocr_line"id="line_3" title = "bbox14382532109310;baseline0.0074515648-3" >< span class = "ocrx_word" dir = "ltr" id = "word_5" lang = "SC8b-R8-D2b" title = "bbox14382531552308;x_wconf84" >< em > T ()</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_6"lang="sc8b-r8-d2b" title = "bbox15992601645309;x_wconf89" > A </ span> <spanclass="ocrx_word"dir="ltr"id="word_7"lang="sc8b-r8-d2b" title = "bbox16662581720310;x_wconf92" > N </ span> <div class = "ocr_carea" id = "block_2_2" title = "bbox12253342013599"> <p class = "ocr_par" dir = "ltr" id = "par_2" title = "bbox12253342013548"> <spanclass="ocr_line"id="line_4" title = "bbox12253342013548;baseline-0.0038071066-60" >< span class = "ocrx_word" dir = "ltr" id = "word_11" lang = "SC8b-R8-D2b" title = "bbox12253341563533;x_wconf79" >< em > floly < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_12"lang="sc8b-r8-d2b" title = "bbox16353431930492;x_wconf69" >< em > Life.</ em ></ span> <div class = "ocr_carea" id = "block_3_3" title = "bbox11425502003716"> <p class = "ocr_par" dir = "ltr" id = "par_3" title = "bbox11426252003716"> <spanclass="ocr_line"id="line_5" title = "bbox11426252003716;baseline0.023228804-19" >< span class = "ocrx_word" dir = "ltr" id = "word_14" lang = "SC8b-R8-D2b"

title = "bbox11426291222698;x_wconf21" >< em >. A < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_15"lang="sc8b-r8-d2b" title = "bbox12416251394698;x_wconf83" >< em > Plea < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_16"lang="sc8b-r8-d2b" title = "bbox14406281679701;x_wconf74" >< em > forthe < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_17"lang="sc8b-r8-d2b" title = "bbox17206362003716;x_wconf80" >< em > absolute < /em></ span> <div class = "ocr_carea" id = "block_4_4" title = "bbox120072419391019"> <p class = "ocr_par" dir = "ltr" id = "par_4" title = "bbox120072419391019"> <spanclass="ocr_line"id="line_6" title = "bbox12227241918807;baseline0.018678161-22" >< span class = "ocrx_word" dir = "ltr" id = "word_18" lang = "SC8b-R8-D2b" title = "bbox12227241469807;x_wconf55" >< em > fla ï c (: efficy < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_19"lang="sc8b-r8-d2b" title = "bbox15187281610786;x_wconf25" >< em > dof < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_20"lang="sc8b-r8-d2b" title = "bbox16617361918797;x_wconf82" >< em > Inhcrcnt < /em></ span> <spanclass="ocr_line"id="line_7" title = "bbox12778071874871;baseline0.0050251256-16" >< span class = "ocrx_word" dir = "ltr" id = "word_21" lang = "SC8b-R8-D2b" title = "bbox12778081643871;x_wconf40" >< em >- Rjghteousness < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_22"lang="sc8b-r8-d2b" title = "bbox16768071722859;x_wconf67" >< em > ilf < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_23"lang="sc8b-r8-d2b" title = "bbox17488171874865;x_wconf69" > those </ span> <spanclass="ocr_line"id="line_8" title = "bbox13358721823926;baseline0.020491803-11" >< span class = "ocrx_word" dir = "ltr" id = "word_24" lang = "SC8b-R8-D2b" title = "bbox13358721426917;x_wconf87" > that </ span> <spanclass="ocrx_word"dir="ltr"id="word_25"lang="sc8b-r8-d2b" title = "bbox14458731554926;x_wconf77" > hope </ span> <spanclass="ocrx_word"dir="ltr"id="word_26"lang="sc8b-r8-d2b" title = 
"bbox15708871613918;x_wconf87" > to </ span> <spanclass="ocrx_word"dir="ltr"id="word_27"lang="sc8b-r8-d2b" title = "bbox16318731678920;x_wconf77" >< em > be < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_28"lang="sc8b-r8-d2b" title = "bbox16948771823924;x_wconf78" >< em > saved.</ em ></ span> </span> <spanclass="ocr_line"id="line_9" title = "bbox14069281454958;baseline0.083333333-4" >< span class = "ocrx_word" dir = "ltr" id = "word_29" lang = "SC8b-R8-D2b" title = "bbox14069281454958;x_wconf61" >< em > Z -</ em ></ span> <spanclass="ocr_line"id="line_10" title = "bbox120092819391019;baseline0.01082544-19" >< span class = "ocrx_word" dir = "ltr" id = "word_30" lang = "SC8b-R8-D2b" title = "bbox120095812621014;x_wconf89" >< em > By < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_31"lang="sc8b-r8-d2b" title = "bbox128792816621019;x_wconf48" >< em > YT 'o.jlwqdsworth-j</em></span> <spanclass="ocrx_word"dir="ltr"id="word_32"lang="sc8b-r8-d2b" title = "bbox168396318781009;x_wconf74" >< em > preacher < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_33"lang="sc8b-r8-d2b" title = "bbox189397919391011;x_wconf89" > to </ span> <div class = "ocr_carea" id = "block_5_5" title = "bbox1291101318781128"> <p class = "ocr_par" dir = "ltr" id = "par_5" title = "bbox1291101318781128">

<spanclass="ocr_line"id="line_11" title = "bbox1291101318781074;baseline0.0085178876-19" >< span class = "ocrx_word" dir = "ltr" id = "word_34" lang = "SC8b-R8-D2b" title = "bbox1291101413591059;x_wconf86" >< em > che < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_35"lang="sc8b-r8-d2b" title = "bbox1385101315501057;x_wconf38" >< em > Glyn - reb < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_36"lang="sc8b-r8-d2b" title = "bbox1565102916041058;x_wconf83" > at </ span> <spanclass="ocrx_word"dir="ltr"id="word_37"lang="sc8b-r8-d2b" title = "bbox1622101818781074;x_wconf67" >< em > Newington -</ em ></ span> <spanclass="ocr_line"id="line_12" title = "bbox1412107117481128;baseline0.020833333-20" >< span class = "ocrx_word" dir = "ltr" id = "word_38" lang = "SC8b-R8-D2b" title = "bbox1412107315291110;x_wconf81" >< em > Butts < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_39"lang="sc8b-r8-d2b" title = "bbox1542107115821113;x_wconf80" >< em > in < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_40"lang="sc8b-r8-d2b" title = "bbox1597107617481128;x_wconf73" >< em > Surrey.</ em ></ span> <div class = "ocr_carea" id = "block_6_6" title = "bbox1165115319111170"> <p class = "ocr_par" dir = "ltr" id = "par_6" title = "bbox1165115319111170"> <spanclass="ocr_line"id="line_13" title = "bbox1165115319111170;baseline0672" > </ span> <div class = "ocr_carea" id = "block_7_7" title = "bbox1917116620031176"> <p class = "ocr_par" dir = "ltr" id = "par_7" title = "bbox1917116620031176"> <spanclass="ocr_line"id="line_14" title = "bbox1917116620031176;baseline0.023255814-2" > </ span> <div class = "ocr_carea" id = "block_8_8" title = "bbox1155120020031410"> <p class = "ocr_par" dir = "ltr" id = "par_8" title = "bbox1155120020031410"> <spanclass="ocr_line"id="line_15" title = "bbox1432120017051260;baseline-0.0036630037-13" >< span class = "ocrx_word" dir = "ltr" id = "word_43" lang = "SC8b-R8-D2b" title = "bbox1432120015431247;x_wconf81" >< em > Heb.</ em ></ span> 
<spanclass="ocrx_word"dir="ltr"id="word_44"lang="sc8b-r8-d2b" title = "bbox1562122115771247;x_wconf76" > r </ span> <spanclass="ocrx_word"dir="ltr"id="word_45"lang="sc8b-r8-d2b" title = "bbox1590121816261245;x_wconf88" >< em > 2. < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_46"lang="sc8b-r8-d2b" title = "bbox1640122116541246;x_wconf82" > 1 </ span> <spanclass="ocrx_word"dir="ltr"id="word_47"lang="sc8b-r8-d2b" title = "bbox1665122417051260;x_wconf65" >< em > 4. < /em></ span> <spanclass="ocr_line"id="line_16" title = "bbox1155125920031322;baseline0.014150943-27" >< span class = "ocrx_word" dir = "ltr" id = "word_48" lang = "SC8b-R8-D2b" title = "bbox1155125912851298;x_wconf74" >< em > Follow < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_49"lang="sc8b-r8-d2b" title = "bbox1293127014001311;x_wconf78" >< em > peace < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_50"lang="sc8b-r8-d2b" title = "bbox1413126015101300;x_wconf79" >< em > with < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_51"lang="sc8b-r8-d2b" title = "bbox1524127815461299;x_wconf77" >< em > a < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_52"lang="sc8b-r8-d2b" title = "bbox1562126515761300;x_wconf84" >< em > l < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_53"lang="sc8b-r8-d2b" title = "bbox1591127716721301;x_wconf77" >< em > men < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_54"lang="sc8b-r8-d2b" title = "bbox1685129416981312;x_wconf83" > '</span>

<spanclass="ocrx_word"dir="ltr"id="word_55"lang="sc8b-r8-d2b" title = "bbox1729126418081305;x_wconf75" >< em > and < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_56"lang="sc8b-r8-d2b" title = "bbox1833126420031322;x_wconf61" >< em > holincss,</ em ></ span> <spanclass="ocr_line"id="line_17" title = "bbox1213131420021376;baseline0.020278834-26" >< span class = "ocrx_word" dir = "ltr" id = "word_57" lang = "SC8b-R8-D2b" title = "bbox1213131413761352;x_wconf65" >< em > vvithout < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_58"lang="sc8b-r8-d2b" title = "bbox1405131515331356;x_wconf69" >< em > which < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_59"lang="sc8b-r8-d2b" title = "bbox1557133316001359;x_wconf78" >< em > no < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_60"lang="sc8b-r8-d2b" title = "bbox1621132018231375;x_wconf68" ><em> m.-mstqall </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_61" lang = "SC8b-R8-D2b" title = "bbox1844132419081376;x_wconf70" ><em> stc </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_62" lang = "SC8b-R8-D2b" title = "bbox1937132520021366;x_wconf72" ><em> tht </em></span> <span class = "ocr_line" id = "line_18" title = "bbox1212137013281410;baseline0.0086206897-2" ><span class = "ocrx_word" dir = "ltr" id = "word_63" lang = "SC8b-R8-D2b" title = "bbox1212137013281410;x_wconf82" ><em> Lord. 
</em></span> <div class = "ocr_carea" id = "block_9_9" title = "bbox1153146020001466" > <p class = "ocr_par" dir = "ltr" id = "par_9" title = "bbox1153146020001466" > <span class = "ocr_line" id = "line_19" title = "bbox1153146020001466;baseline0376" > <div class = "ocr_carea" id = "block_10_10" title = "bbox1149148621791716" > <p class = "ocr_par" dir = "ltr" id = "par_10" title = "bbox1149148621791675" > <span class = "ocr_line" id = "line_20" title = "bbox1422148621791566;baseline0.0052840159-24" ><span class = "ocrx_word" dir = "ltr" id = "word_65" lang = "SC8b-R8-D2b" title = "bbox1422150414601542;x_wconf87" ><em> L </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_66" lang = "SC8b-R8-D2b" title = "bbox1475150915061546;x_wconf89" ><em> o </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_67" lang = "SC8b-R8-D2b" title = "bbox1518150615681545;x_wconf89" ><em> N </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_68" lang = "SC8b-R8-D2b" title = "bbox1582150716171544;x_wconf89" > D <span class = "ocrx_word" dir = "ltr" id = "word_69" lang = "SC8b-R8-D2b" title = "bbox1633150916651544;x_wconf83" ><em> o </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_70" lang = "SC8b-R8-D2b" title = "bbox1682151017431556;x_wconf87" ><em> N' </em></span> <span class = "ocr_line" id = "line_21" title = "bbox1149155421791615;baseline0.0077669903-15" ><span class = "ocrx_word" dir = "ltr" id = "word_74" lang = "SC8b-R8-D2b" title = "bbox1149155413151600;x_wconf77" ><em> Psïnted </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_75" lang = "SC8b-R8-D2b" title = "bbox1330155513841612;x_wconf86" ><em> by </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_76" lang = "SC8b-R8-D2b" title = "bbox1402156414511601;x_wconf81" ><em> R-. </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_77" lang = "SC8b-R8-D2b" title = "bbox1465156315021601;x_wconf79" ><em> I. 
</em></span> <span class = "ocrx_word" dir = "ltr" id = "word_78" lang = "SC8b-R8-D2b" title = "bbox1525155815901604;x_wconf83" ><em> for </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_79" lang = "SC8b-R8-D2b" title = "bbox1608156417651603;x_wconf73" ><em> Andrew </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_80" lang = "SC8b-R8-D2b"

title = "bbox1784156319491614;x_wconf75" ><em> Kembc' </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_81" lang = "SC8b-R8-D2b" title = "bbox1962157620031608;x_wconf83" ><em> at </em></span> <span class = "ocr_line" id = "line_22" title = "bbox1163161421791675;baseline0.018700787-25" ><span class = "ocrx_word" dir = "ltr" id = "word_83" lang = "SC8b-R8-D2b" title = "bbox1163161414521672;x_wconf74" ><em> sr.ma,-gare:s </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_84" lang = "SC8b-R8-D2b" title = "bbox1468161515611660;x_wconf79" > Hill <span class = "ocrx_word" dir = "ltr" id = "word_85" lang = "SC8b-R8-D2b" title = "bbox1577161816171660;x_wconf82" ><em> iu </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_86" lang = "SC8b-R8-D2b" title = "bbox1635161919871675;x_wconf73" ><em> Scm-hwark;And </em></span> <div class = "ocr_carea" id = "block_11_11" title = "bbox1195166819531758" > <p class = "ocr_par" dir = "ltr" id = "par_11" title = "bbox1195166819531758" > <span class = "ocr_line" id = "line_23" title = "bbox1195166819531719;baseline0.0092348285-15" ><span class = "ocrx_word" dir = "ltr" id = "word_88" lang = "SC8b-R8-D2b" title = "bbox1195168212461705;x_wconf79" ><em> are </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_89" lang = "SC8b-R8-D2b" title = "bbox1258168212931704;x_wconf85" > to <span class = "ocrx_word" dir = "ltr" id = "word_90" lang = "SC8b-R8-D2b" title = "bbox1303166813601704;x_wconf83" ><em> bee </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_91" lang = "SC8b-R8-D2b" title = "bbox1375166814421705;x_wconf81" ><em> fold </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_92" lang = "SC8b-R8-D2b" title = "bbox1455167315501707;x_wconf87" ><em> under </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_93" lang = "SC8b-R8-D2b" title = "bbox1565167516101708;x_wconf83" ><em> St. 
</em></span> <span class = "ocrx_word" dir = "ltr" id = "word_94" lang = "SC8b-R8-D2b" title = "bbox1621167517481719;x_wconf64" ><em>,m.:rga. </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_95" lang = "SC8b-R8-D2b" title = "bbox1758168318031709;x_wconf71" ><em> ers </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_96" lang = "SC8b-R8-D2b" title = "bbox1820167619531714;x_wconf70" ><em> Church </em></span> <span class = "ocr_line" id = "line_24" title = "bbox1287171818481758;baseline0.016042781-9" ><span class = "ocrx_word" dir = "ltr" id = "word_97" lang = "SC8b-R8-D2b" title = "bbox1287172713311750;x_wconf72" ><em> on </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_98" lang = "SC8b-R8-D2b" title = "bbox1346171816161755;x_wconf76" ><em> New-Filhstreet </em></span> <span class = "ocrx_word" dir = "ltr" id = "word_99" lang = "SC8b-R8-D2b" title = "bbox1628172117111755;x_wconf81" > Hill. <span class = "ocrx_word" dir = "ltr" id = "word_100" lang = "SC8b-R8-D2b" title = "bbox1744172918481758;x_wconf76" ><em> 166.-). </em></span> <div class = "ocr_carea" id = "block_12_12" title = "bbox0021791842" > <p class = "ocr_par" dir = "ltr" id = "par_12" title = "bbox0021791842" > <span class = "ocr_line" id = "line_25" title = "bbox0021791842;baseline00" > </body> </html>
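The word entries in the listing above carry their geometry and recognition confidence in the title attribute (the bbox and x_wconf properties). The following is a minimal sketch of how such a title string could be read, not the project's own code. It assumes the standard hOCR layout, in which properties are separated by semicolons and values by spaces; the extracted listing above has lost those spaces, so the sample string is hypothetical and the function names parse_hocr_title and word_box_and_confidence are illustrative only.

```python
def parse_hocr_title(title):
    """Split an hOCR title attribute into a {property: [values]} dict.

    Assumes standard hOCR spacing, e.g. "bbox 10 20 110 60; x_wconf 83".
    """
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            # First token is the property name, the rest are its values.
            props[tokens[0]] = tokens[1:]
    return props


def word_box_and_confidence(title):
    """Return ((x0, y0, x1, y1), confidence) for an ocrx_word title.

    Assumes x_wconf is an integer, as it appears in the listing above.
    """
    props = parse_hocr_title(title)
    bbox = tuple(int(v) for v in props["bbox"])
    conf = int(props["x_wconf"][0])
    return bbox, conf


# Hypothetical title string, not copied from the listing.
sample = "bbox 10 20 110 60; x_wconf 83"
print(word_box_and_confidence(sample))
```

Keeping the generic property parser separate from the word-specific helper means additional properties in a title can be read from the same dictionary without changing the parser.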

A.2 De-noised hOCR Output (eMOP work_id 32, page 1)
<? xmlversion = "1.0" encoding = "UTF-8"?> <!DOCTYPEhtmlPUBLIC"-//W3C//DTDXHTML1.0Transitional//EN" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"> <html lang = "en" xml:lang = "en" xmlns = "http://www.w3.org/1999/xhtml"> <head> <title> </title> <meta content = "text/html;charset=utf-8" http-equiv = "Content-Type" /> <meta content = "tesseract3.03" name = "ocr-system" /> <meta content = "ocr_pageocr_careaocr_parocr_lineocrx_word" name = "ocr-capabilities" /> </head> <body> <divclass="ocr_page"id="page_1" title = 'image"/dh/data/eebo/e0031/40133/00001.000.001.tif";bbox0021791842;ppageno 0;noisiness0.1386'> <div class = "ocr_carea" id = "block_1_1" title = "bbox121102179310"> <p class = "ocr_par" dir = "ltr" id = "par_1" title = "bbox1211222179310"> <spanclass="ocr_line"id="line_1" title = "bbox1560222179127;baseline-0.0016155089-33" >< span class = "ocrx_word" dir = "ltr" id = "word_1" lang = "SC8b-R8-D2b" title = "bbox156024162594;x_wconf79;pred1;noiseconf0.0096" > A </ span> <spanclass="ocrx_word"id="word_2"lang="sc8b-r8-d2b" title = "bbox2121222179127;x_wconf0;pred0;noiseconf0.9933" ></ span> <spanclass="ocr_line"id="line_2" title = "bbox12111321947215;baseline0.014945652-10" >< span class = "ocrx_word" dir = "ltr" id = "word_3" lang = "SC8b-R8-D2b" title = "bbox12111321471209;x_wconf73;pred1;noiseconf0.0048" >< em > Serious < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_4"lang="sc8b-r8-d2b" title = "bbox15101321947215;x_wconf72;pred1;noiseconf0.0048" >< em > Exho. 
rtation < /em></ span> <spanclass="ocr_line"id="line_3" title = "bbox14382532109310;baseline0.0074515648-3" >< span class = "ocrx_word" dir = "ltr" id = "word_5" lang = "SC8b-R8-D2b" title = "bbox14382531552308;x_wconf84;pred1;noiseconf0.0024" >< em > T ()</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_6"lang="sc8b-r8-d2b" title = "bbox15992601645309;x_wconf89;pred1;noiseconf0.0048" > A </ span> <spanclass="ocrx_word"dir="ltr"id="word_7"lang="sc8b-r8-d2b" title = "bbox16662581720310;x_wconf92;pred1;noiseconf0.0083" > N </ span> <spanclass="ocrx_word"dir="ltr"id="word_8"lang="sc8b-r8-d2b" title = "bbox18343021841308;x_wconf93;pred0;noiseconf0.9953" >.</ span> <spanclass="ocrx_word"dir="ltr"id="word_9"lang="sc8b-r8-d2b" title = "bbox19722751983282;x_wconf69;pred0;noiseconf0.9839" > '</span> <spanclass="ocrx_word"dir="ltr"id="word_10"lang="sc8b-r8-d2b" title = "bbox21062912109293;x_wconf95;pred0;noiseconf0.9500" >-</ span>

<div class = "ocr_carea" id = "block_2_2" title = "bbox12253342013599"> <p class = "ocr_par" dir = "ltr" id = "par_2" title = "bbox12253342013548"> <spanclass="ocr_line"id="line_4" title = "bbox12253342013548;baseline-0.0038071066-60" >< span class = "ocrx_word" dir = "ltr" id = "word_11" lang = "SC8b-R8-D2b" title = "bbox12253341563533;x_wconf79;pred1;noiseconf0.0242" >< em > floly < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_12"lang="sc8b-r8-d2b" title = "bbox16353431930492;x_wconf69;pred1;noiseconf0.0965" >< em > Life.</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_13"lang="sc8b-r8-d2b" title = "bbox20083972013403;x_wconf41;pred0;noiseconf0.9894" >< em >-</ em ></ span> <div class = "ocr_carea" id = "block_3_3" title = "bbox11425502003716"> <p class = "ocr_par" dir = "ltr" id = "par_3" title = "bbox11426252003716"> <spanclass="ocr_line"id="line_5" title = "bbox11426252003716;baseline0.023228804-19" >< span class = "ocrx_word" dir = "ltr" id = "word_14" lang = "SC8b-R8-D2b" title = "bbox11426291222698;x_wconf21;pred1;noiseconf0.0883" >< em >. 
A < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_15"lang="sc8b-r8-d2b" title = "bbox12416251394698;x_wconf83;pred1;noiseconf0.0041" >< em > Plea < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_16"lang="sc8b-r8-d2b" title = "bbox14406281679701;x_wconf74;pred1;noiseconf0.0030" >< em > forthe < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_17"lang="sc8b-r8-d2b" title = "bbox17206362003716;x_wconf80;pred1;noiseconf0.0028" >< em > absolute < /em></ span> <div class = "ocr_carea" id = "block_4_4" title = "bbox120072419391019"> <p class = "ocr_par" dir = "ltr" id = "par_4" title = "bbox120072419391019"> <spanclass="ocr_line"id="line_6" title = "bbox12227241918807;baseline0.018678161-22" >< span class = "ocrx_word" dir = "ltr" id = "word_18" lang = "SC8b-R8-D2b" title = "bbox12227241469807;x_wconf55;pred1;noiseconf0.1122" >< em > fla ï c (: efficy < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_19"lang="sc8b-r8-d2b" title = "bbox15187281610786;x_wconf25;pred1;noiseconf0.0346" >< em > dof < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_20"lang="sc8b-r8-d2b" title = "bbox16617361918797;x_wconf82;pred1;noiseconf0.0011" >< em > Inhcrcnt < /em></ span> <spanclass="ocr_line"id="line_7" title = "bbox12778071874871;baseline0.0050251256-16" >< span class = "ocrx_word" dir = "ltr" id = "word_21" lang = "SC8b-R8-D2b" title = "bbox12778081643871;x_wconf40;pred1;noiseconf0.0420" >< em >- Rjghteousness < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_22"lang="sc8b-r8-d2b" title = "bbox16768071722859;x_wconf67;pred1;noiseconf0.0112" >< em > ilf < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_23"lang="sc8b-r8-d2b" title = "bbox17488171874865;x_wconf69;pred1;noiseconf0.0036" > those </ span> <spanclass="ocr_line"id="line_8" title = "bbox13358721823926;baseline0.020491803-11" >< span

class = "ocrx_word" dir = "ltr" id = "word_24" lang = "SC8b-R8-D2b" title = "bbox13358721426917;x_wconf87;pred1;noiseconf0.0021" > that </ span> <spanclass="ocrx_word"dir="ltr"id="word_25"lang="sc8b-r8-d2b" title = "bbox14458731554926;x_wconf77;pred1;noiseconf0.0022" >hope</span> <spanclass="ocrx_word" dir="ltr" id="word_26" lang="sc8b-r8-d2b" title = "bbox15708871613918; x_wconf87; pred1; noiseconf0.0028" > to </ span> <spanclass="ocrx_word" dir="ltr" id="word_27" lang="sc8b-r8-d2b" title = "bbox16318731678920; x_wconf77; pred1; noiseconf0.0039" >< em > be < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_28" lang="sc8b-r8-d2b" title = "bbox16948771823924; x_wconf78; pred1; noiseconf0.0020" >< em > saved.</ em ></ span> <spanclass="ocr_line" id="line_9" title = "bbox14069281454958; baseline0.083333333-4" >< span class = "ocrx_word" dir = "ltr" id = "word_29" lang = "SC8b-R8-D2b" title = "bbox14069281454958; x_wconf61; pred1; noiseconf0.0121" >< em > Z -</ em ></ span> <spanclass="ocr_line" id="line_10" title = "bbox120092819391019; baseline0.01082544-19" >< span class = "ocrx_word" dir = "ltr" id = "word_30" lang = "SC8b-R8-D2b" title = "bbox120095812621014; x_wconf89; pred1; noiseconf0.0076" >< em > By < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_31" lang="sc8b-r8-d2b" title = "bbox128792816621019; x_wconf48; pred1; noiseconf0.4293" >< em > YT 'o.jlwqdsworth-j</em></span> <spanclass="ocrx_word" dir="ltr" id="word_32" lang="sc8b-r8-d2b" title = "bbox168396318781009; x_wconf74; pred1; noiseconf0.0018" >< em > preacher < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_33" lang="sc8b-r8-d2b" title = "bbox189397919391011; x_wconf89; pred1; noiseconf0.0044" > to </ span> <div class = "ocr_carea" id = "block_5_5" title = "bbox1291101318781128"> <p class = "ocr_par" dir = "ltr" id = "par_5" title = "bbox1291101318781128"> <spanclass="ocr_line" id="line_11" title = "bbox1291101318781074; baseline0.0085178876-19" >< span 
class = "ocrx_word" dir = "ltr" id = "word_34" lang = "SC8b-R8-D2b" title = "bbox1291101413591059; x_wconf86; pred1; noiseconf0.0028" >< em > che < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_35" lang="sc8b-r8-d2b" title = "bbox1385101315501057; x_wconf38; pred1; noiseconf0.0395" >< em > Glyn - reb < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_36" lang="sc8b-r8-d2b"

title = "bbox1565102916041058;x_wconf83;pred1;noiseconf0.0030" > at </ span> <spanclass="ocrx_word" dir="ltr" id="word_37" lang="sc8b-r8-d2b" title = "bbox1622101818781074; x_wconf67; pred1; noiseconf0.0030" >< em > Newington -</ em ></ span> </span> <spanclass="ocr_line" id="line_12" title = "bbox1412107117481128; baseline0.020833333-20" >< span class = "ocrx_word" dir = "ltr" id = "word_38" lang = "SC8b-R8-D2b" title = "bbox1412107315291110; x_wconf81; pred1; noiseconf0.0017" >< em > Butts < /em></ span> <spanclass="ocrx_word" dir="ltr" id="word_39" lang="sc8b-r8-d2b" title = "bbox1542107115821113; x_wconf80; pred1; noiseconf0.0032" >< em > in < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_40"lang="sc8b-r8-d2b" title = "bbox1597107617481128;x_wconf73;pred1;noiseconf0.0022" >< em > Surrey.</ em ></ span> <div class = "ocr_carea" id = "block_6_6" title = "bbox1165115319111170"> <p class = "ocr_par" dir = "ltr" id = "par_6" title = "bbox1165115319111170"> <spanclass="ocr_line"id="line_13" title = "bbox1165115319111170;baseline0672" >< span class = "ocrx_word" dir = "ltr" id = "word_41" lang = "SC8b-R8-D2b" title = "bbox1165115319111170;x_wconf95;pred0;noiseconf0.9500" >< em > < /em></ span> <div class = "ocr_carea" id = "block_7_7" title = "bbox1917116620031176"> <p class = "ocr_par" dir = "ltr" id = "par_7" title = "bbox1917116620031176"> <spanclass="ocr_line"id="line_14" title = "bbox1917116620031176;baseline0.023255814-2" >< span class = "ocrx_word" dir = "ltr" id = "word_42" lang = "SC8b-R8-D2b" title = "bbox1917116620031176;x_wconf73;pred0;noiseconf0.9910" >< em >-.--</ em ></ span> <div class = "ocr_carea" id = "block_8_8" title = "bbox1155120020031410"> <p class = "ocr_par" dir = "ltr" id = "par_8" title = "bbox1155120020031410"> <spanclass="ocr_line"id="line_15" title = "bbox1432120017051260;baseline-0.0036630037-13" >< span class = "ocrx_word" dir = "ltr" id = "word_43" lang = "SC8b-R8-D2b" title = 
"bbox1432120015431247;x_wconf81;pred1;noiseconf0.0015" >< em > Heb.</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_44"lang="sc8b-r8-d2b" title = "bbox1562122115771247;x_wconf76;pred1;noiseconf0.0129" > r </ span> <spanclass="ocrx_word"dir="ltr"id="word_45"lang="sc8b-r8-d2b" title = "bbox1590121816261245;x_wconf88;pred1;noiseconf0.0028" >< em > 2. < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_46"lang="sc8b-r8-d2b" title = "bbox1640122116541246;x_wconf82;pred1;noiseconf0.0115" > 1 </ span> <spanclass="ocrx_word"dir="ltr"id="word_47"lang="sc8b-r8-d2b" title = "bbox1665122417051260;x_wconf65;pred1;noiseconf0.0091" >< em > 4. < /em></ span>

<spanclass="ocr_line"id="line_16" title = "bbox1155125920031322;baseline0.014150943-27" >< span class = "ocrx_word" dir = "ltr" id = "word_48" lang = "SC8b-R8-D2b" title = "bbox1155125912851298;x_wconf74;pred1;noiseconf0.0052" >< em > Follow < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_49"lang="sc8b-r8-d2b" title = "bbox1293127014001311;x_wconf78;pred1;noiseconf0.0021" >< em > peace < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_50"lang="sc8b-r8-d2b" title = "bbox1413126015101300;x_wconf79;pred1;noiseconf0.0018" >< em > with < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_51"lang="sc8b-r8-d2b" title = "bbox1524127815461299;x_wconf77;pred1;noiseconf0.0067" >< em > a < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_52"lang="sc8b-r8-d2b" title = "bbox1562126515761300;x_wconf84;pred1;noiseconf0.0186" >< em > l < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_53"lang="sc8b-r8-d2b" title = "bbox1591127716721301;x_wconf77;pred1;noiseconf0.0038" >< em > men < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_54"lang="sc8b-r8-d2b" title = "bbox1685129416981312;x_wconf83;pred1;noiseconf0.0218" > '</span> <spanclass="ocrx_word"dir="ltr"id="word_55"lang="sc8b-r8-d2b" title = "bbox1729126418081305;x_wconf75;pred1;noiseconf0.0031" >< em > and < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_56"lang="sc8b-r8-d2b" title = "bbox1833126420031322;x_wconf61;pred1;noiseconf0.0126" >< em > holincss,</ em ></ span> <spanclass="ocr_line"id="line_17" title = "bbox1213131420021376;baseline0.020278834-26" >< span class = "ocrx_word" dir = "ltr" id = "word_57" lang = "SC8b-R8-D2b" title = "bbox1213131413761352;x_wconf65;pred1;noiseconf0.0054" >< em > vvithout < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_58"lang="sc8b-r8-d2b" title = "bbox1405131515331356;x_wconf69;pred1;noiseconf0.0031" >< em > which < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_59"lang="sc8b-r8-d2b" title = "bbox1557133316001359;x_wconf78;pred1;noiseconf0.0034" 
>< em > no < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_60"lang="sc8b-r8-d2b" title = "bbox1621132018231375;x_wconf68;pred1;noiseconf0.0034" >< em > m.- mstqall < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_61"lang="sc8b-r8-d2b" title = "bbox1844132419081376;x_wconf70;pred1;noiseconf0.0073" >< em > stc < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_62"lang="sc8b-r8-d2b" title = "bbox1937132520021366;x_wconf72;pred1;noiseconf0.0070" >< em > tht < /em></ span> <spanclass="ocr_line"id="line_18" title = "bbox1212137013281410;baseline0.0086206897-2" >< span class = "ocrx_word" dir = "ltr" id = "word_63" lang = "SC8b-R8-D2b" title = "bbox1212137013281410;x_wconf82;pred1;noiseconf0.0036" >< em > Lord.</ em ></ span> <div class = "ocr_carea" id = "block_9_9" title = "bbox1153146020001466"> <p class = "ocr_par" dir = "ltr" id = "par_9" title = "bbox1153146020001466"> <spanclass="ocr_line"id="line_19" title = "bbox1153146020001466;baseline0376" >< span class = "ocrx_word" dir = "ltr" id = "word_64" lang = "SC8b-R8-D2b"

title = "bbox1153146020001466;x_wconf95;pred0;noiseconf0.9500" >< em > < /em></ span> <div class = "ocr_carea" id = "block_10_10" title = "bbox1149148621791716"> <p class = "ocr_par" dir = "ltr" id = "par_10" title = "bbox1149148621791675"> <spanclass="ocr_line"id="line_20" title = "bbox1422148621791566;baseline0.0052840159-24" >< span class = "ocrx_word" dir = "ltr" id = "word_65" lang = "SC8b-R8-D2b" title = "bbox1422150414601542;x_wconf87;pred1;noiseconf0.0023" >< em > L < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_66"lang="sc8b-r8-d2b" title = "bbox1475150915061546;x_wconf89;pred1;noiseconf0.0027" >< em > o < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_67"lang="sc8b-r8-d2b" title = "bbox1518150615681545;x_wconf89;pred1;noiseconf0.0017" >< em > N < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_68"lang="sc8b-r8-d2b" title = "bbox1582150716171544;x_wconf89;pred1;noiseconf0.0023" > D </ span> <spanclass="ocrx_word"dir="ltr"id="word_69"lang="sc8b-r8-d2b" title = "bbox1633150916651544;x_wconf83;pred1;noiseconf0.0029" >< em > o < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_70"lang="sc8b-r8-d2b" title = "bbox1682151017431556;x_wconf87;pred1;noiseconf0.0018" >< em > N '</em></span> <spanclass="ocrx_word"dir="ltr"id="word_71"lang="sc8b-r8-d2b" title = "bbox1872155718751563;x_wconf43;pred0;noiseconf0.9630" >.</ span> <spanclass="ocrx_word"dir="ltr"id="word_72"lang="sc8b-r8-d2b" title = "bbox2128151121351520;x_wconf84;pred0;noiseconf0.9521" >-</ span> <spanclass="ocrx_word"id="word_73"lang="sc8b-r8-d2b" title = "bbox2158148621791566;x_wconf0;pred0;noiseconf0.9996" ></ span> </span> <spanclass="ocr_line"id="line_21" title = "bbox1149155421791615;baseline0.0077669903-15" >< span class = "ocrx_word" dir = "ltr" id = "word_74" lang = "SC8b-R8-D2b" title = "bbox1149155413151600;x_wconf77;pred1;noiseconf0.0023" >< em > Ps ï nted < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_75"lang="sc8b-r8-d2b" title = 
"bbox1330155513841612;x_wconf86;pred1;noiseconf0.0034" >< em > by < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_76"lang="sc8b-r8-d2b" title = "bbox1402156414511601;x_wconf81;pred1;noiseconf0.0021" >< em > R -.</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_77"lang="sc8b-r8-d2b" title = "bbox1465156315021601;x_wconf79;pred1;noiseconf0.0030" >< em > I.</ em ></ span> <spanclass="ocrx_word"dir="ltr"id="word_78"lang="sc8b-r8-d2b" title = "bbox1525155815901604;x_wconf83;pred1;noiseconf0.0016" >< em > for < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_79"lang="sc8b-r8-d2b" title = "bbox1608156417651603;x_wconf73;pred1;noiseconf0.0020" >< em > Andrew < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_80"lang="sc8b-r8-d2b" title = "bbox1784156319491614;x_wconf75;pred1;noiseconf0.0028" >< em > Kembc '</em></span> <spanclass="ocrx_word"dir="ltr"id="word_81"lang="sc8b-r8-d2b" title = "bbox1962157620031608;x_wconf83;pred1;noiseconf0.0082" >< em > at < /em></ span> <spanclass="ocrx_word"dir="ltr"id="word_82"lang="sc8b-r8-d2b" title = "bbox2163156921791615;x_wconf54;pred0;noiseconf0.9278"