Greta Franzini and Marco Büchler 25 January 2017


LATIN TEXT REUSE DETECTION AT SCALE AN AUTOMATIC TEXT REUSE INVESTIGATION INTO THE FIRST CHRISTIAN HISTORY OF ROME Greta Franzini and Marco Büchler 25 January 2017

TABLE OF CONTENTS
1. Introduction
2. Research questions
3. Challenges
4. Methodology
5. Results
6. Research value and output

INTRODUCTION

PAULUS OROSIUS AND HIS HISTORIES
Paulus Orosius [ca. AD 375-418]
- Roman historian and a Christian from Spain;
- Student of St Augustine [AD 354-430].
Historiae adversus Paganos = Histories against the Pagans*
- First Christian history of Rome;
- Complementary to St Augustine's De civitate Dei contra Paganos;
- Defense against pagan accusations that Rome's decline was caused by the advent of Christianity;
- Heavily reuses both pagan and Christian authors to reject pagan claims.
*Paganism = pantheism, polytheism, non-Christian.
*Christianity = monotheism. Declared a permitted religion by Constantine the Great in 313 (Edict of Milan); declared the official religion of the Empire by Theodosius I in 380 (Edict of Thessalonica).

PRIMARY SOURCES
1. [ed.] [tr.] Arnaud-Lindet, M. P., Orose: Histoires contre les païens, 3 vols, Collection des Universités de France, Paris: Les Belles Lettres, 1990-1991.
2. [ed.] Zangemeister, K., Pauli Orosii historiarum adversum paganos libri VII; accedit eiusdem, Liber apologeticus, Corpus Scriptorum Ecclesiasticorum Latinorum 5, Vienna, 1882. Internet Archive: https://goo.gl/snjjhy
3. [ed.] Migne, J. P., Pauli Orosii Hispanorum Chronologorum Opera Omnia, Patrologia Latina Cursus Completus 31, Paris, 1846. Internet Archive: https://goo.gl/awrp8i
4. [ed.] Zangemeister, K., Pauli Orosii historiarum adversum paganos libri VII, Bibliotheca scriptorum Graecorum et Romanorum Teubneriana, Leipzig: Teubner, 1889. Internet Archive: https://archive.org/details/pavliorosiihist01orosgoog Attalus.org: http://www.attalus.org/latin/orosius.html

RESEARCH QUESTIONS

RESEARCH QUESTIONS: BRIDGING CLOSE & DISTANT READING
- Close Reading: How does Orosius reuse text in order to build his defense?
- Close + Distant Reading: Can we quantify and categorise Orosius' reuse diversity (taxonomy)?
- Distant Reading: How does a large corpus affect automatic text reuse detection and its performance?

CHALLENGES

CHALLENGES: DIACHRONIC CORPUS

CHALLENGES: REUSE DIVERSITY
Orosius:
- reuses anything from two words to entire sentences or even paragraphs;
- quotes word-for-word (i.e. verbatim), near-verbatim or (very) loosely;
- doesn't always cite the original author;
- occasionally misattributes words because he cites from memory;
- reuses text that doesn't survive.
Example:
Nec tibi cura canum fuerit postrema (Georg. 3.404 - poetry) [= Nor be your dogs last cared for]
non est tamen canum cura postrema (Oros. 1.1. - prose) [= Dogs are not to be cared for last]
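The Vergil/Orosius pair above can be used to illustrate why loose reuse is hard to detect. The following is a minimal sketch, not TRACER's method: it simply counts the surface word forms the two passages share and computes a Jaccard similarity. The function names are hypothetical and chosen for illustration only.

```python
import re

def tokens(text):
    """Lower-case a passage and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def shared_words(a, b):
    """Return the set of word forms that occur in both passages."""
    return set(tokens(a)) & set(tokens(b))

def jaccard(a, b):
    """Jaccard similarity: shared word forms divided by all distinct word forms."""
    ta, tb = set(tokens(a)), set(tokens(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

vergil = "Nec tibi cura canum fuerit postrema"   # Georg. 3.404
orosius = "non est tamen canum cura postrema"    # Orosius, as quoted above

print(sorted(shared_words(vergil, orosius)))     # ['canum', 'cura', 'postrema']
print(round(jaccard(vergil, orosius), 2))        # 0.33
```

Only three of the six words survive unchanged, and their order differs, so an exact-match search would miss this reuse entirely.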

METHODOLOGY

METHODOLOGY: TRACER VS. COMMENTARIES
- How do the computed results compare to existing scholarship?
- Has TRACER identified reuses that existing scholarship hasn't?
- Has existing scholarship identified reuses that TRACER hasn't?
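The three questions above amount to a set comparison between computed and documented reuse pairs. A minimal sketch, assuming each reuse is represented as a (target passage, source passage) tuple; the passage labels below are invented placeholders, not project data, and `compare_reuses` is a hypothetical helper.

```python
def compare_reuses(tracer_pairs, scholarship_pairs):
    """Split reuse pairs into those found by both, only computed, or only documented."""
    computed, documented = set(tracer_pairs), set(scholarship_pairs)
    return {
        "both": computed & documented,
        "tracer_only": computed - documented,
        "scholarship_only": documented - computed,
    }

# Placeholder identifiers for illustration only.
tracer = [("target-A", "source-1"), ("target-B", "source-2")]
commentaries = [("target-B", "source-2"), ("target-C", "source-3")]

result = compare_reuses(tracer, commentaries)
print(result["both"])              # {('target-B', 'source-2')}
print(result["tracer_only"])       # {('target-A', 'source-1')}
print(result["scholarship_only"])  # {('target-C', 'source-3')}
```

In practice the two sides rarely reference passages at the same granularity, so some alignment of citations would be needed before a comparison like this is meaningful.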

RESULTS

RESULTS: TRACER ON ENTIRE CORPUS
- Window size: 10 words;
- Feature density: 0.8;
- Highest reuse overlap: 4 words;
- Computation: ca. 48 hours.
Figure 1: Orosius' general reuse pattern, across the entire corpus. (Chart; x-axis: scored overlap with a moving window of 10, from 1 to 10; y-axis: ratio of the scored overlap in %, from 0 to 50.)
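The kind of distribution shown in Figure 1 can be approximated with a simple moving-window comparison. This is a rough sketch under stated assumptions, not TRACER's implementation: overlap is scored as the number of distinct word forms two windows share, feature density and synonym replacement are ignored, and all function names are hypothetical.

```python
from collections import Counter

def windows(words, size=10):
    """Cut a word list into overlapping windows of `size` words (step of one word)."""
    if len(words) <= size:
        return [words]
    return [words[i:i + size] for i in range(len(words) - size + 1)]

def best_overlap(window, source_windows):
    """Highest number of distinct shared words between one window and any source window."""
    w = set(window)
    return max(len(w & set(s)) for s in source_windows)

def overlap_distribution(target_words, source_words, size=10):
    """Percentage of target windows per best-overlap score."""
    src = windows(source_words, size)
    scores = Counter(best_overlap(w, src) for w in windows(target_words, size))
    total = sum(scores.values())
    return {score: 100 * n / total for score, n in sorted(scores.items())}

target = "non est tamen canum cura postrema".split()
source = "nec tibi cura canum fuerit postrema".split()
print(overlap_distribution(target, source, size=3))
```

Comparing every target window with every source window is quadratic, which is one reason a corpus of over a million tokens takes many hours to process without indexing.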

RESULTS: OROSIUS' REUSE OF TACITUS' HISTORIAE
Reuses documented in primary sources (precision): 15 (= 12+3?)
Reuses identified by TRACER (recall)*: 55
- verbatim;
- near-verbatim: true and false (spelling conventions);
- no similarity: why? Synonym replacement? PoS? Feature density?
Figure 2: Orosius 1.10, Tacitus 5.3.
*Detection parameters: moving window of 15 words; 0.8 feat. density; synonym replacement. Comparing 51,417 words (T) against 74,929 words (OR). Computation: ca. 1 hour.
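Differences in spelling conventions between editions can make identical words look different to a matcher. A minimal sketch of one common normalisation for Latin, collapsing u/v and i/j and lower-casing; this is an assumption about the kind of variation meant above, and `normalise` is a hypothetical helper, not part of TRACER.

```python
def normalise(word):
    """Lower-case a Latin word and collapse the u/v and i/j spelling variants."""
    return word.lower().replace("v", "u").replace("j", "i")

def same_word(a, b):
    """True when two spellings agree after normalisation."""
    return normalise(a) == normalise(b)

print(same_word("virtus", "uirtus"))      # True
print(same_word("Juppiter", "Iuppiter"))  # True
print(same_word("cura", "canum"))         # False
```

Normalisation of this kind trades precision for recall: it merges editorial variants, but it can also conflate forms that an editor deliberately kept distinct.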

RESEARCH VALUE AND OUTPUT

Research contribution
- Better understanding of Orosius' reuse behaviour.
- Detection strategy refinement; max extraction with min algorithms.
- Better understanding of degree of influence of (noisy) text on computed results.
- Refinement of existing linguistic resources towards Gold Standard for Latin lemmatisation + PoS-tagging.
Research data-sets
- Reuse pairs manually and computationally identified in Orosius.
- The cleanest.txt corpus.
- PoS-tagged+lemmatised corpus.

CONTACT
Visit us: http://www.etrap.eu
contact@etrap.eu
"Stealing from one is plagiarism, stealing from many is research" (Wilson Mizner, 1876-1933)

ACKNOWLEDGMENTS
The authors wish to thank Marco Passarotti and Paolo Ruffolo (CIRCSE) for their continuous support in parsing the corpus with LemLat.

LITERATURE

Büchler, M. TRACER: Text Reuse Detection Machine. At: http://www.etrap.eu/research/tracer/
Jänicke, S., Franzini, G., Faisal, C., Scheuermann, G. (2015) On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges. A State-of-the-Art (STAR) Report, in: (Proceedings) EuroVis 2015: The EG/VGTC Conference on Visualization. Cagliari, May 2015, 25-29. DOI: 10.2312/eurovisstar.20151113
LemLat 3.0: Morphological Analyser and Lemmatiser for Latin. CNR-ILS, UCSC-CIRCSE, Italy. At: http://www.lemlat3.eu/
Minozzi, S. (2009) The Latin WordNet Project, Innsbrucker Beiträge zur Sprachwissenschaft, vol. 137, pp. 707-716. Institut für Sprachen und Literaturen der Universität Innsbruck, Innsbruck.
Passarotti, M. (2004) Development and perspectives of the Latin morphological analyser LEMLAT, in A. Bozzi, L. Cignoni and J. L. Lebrave (eds.) Digital Technology and Philological Disciplines, Linguistica Computazionale, XX-XXI, pp. 397-414.
Budassi, M., Passarotti, M. (2016) Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon, in Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016), Berlin, Germany, The Association for Computational Linguistics, pp. 90-94.

MORE INFORMATION

CORPUS STATISTICS

Author [date] | Work (type) | Tokens | Types | TTR
Caesar [100-44 BC] | De Bello Gallico (prose) | 51,723 | 11,100 | 4.65
Vergil [70-19 BC] | Aeneid (epic poem) | 63,715 | 16,799 | 3.79
Vergil [70-19 BC] | Georgics (epic poem) | 14,175 | 6,974 | 2.03
Livy [59 BC-17 AD] | Ab urbe condita (prose) | 507,120 | 50,774 | 9.98
Lucan [39-65 AD] | De Bello Civili sive Pharsalia (epic poem) | 51,033 | 14,780 | 3.45
Tacitus [56-117 AD] | Historiae (prose) | 51,417 | 15,347 | 3.35
Suetonius [69-ca. 130 AD] | De Vitis Caesarum (biography) | 71,040 | 21,565 | 3.29
Florus [74-ca. 130 AD] | Epitome de T. Livio Bellorum Omnium Annorum DCC Libri Duo (prose) | 26,750 | 9,181 | 2.91
*Justin [3rd century] | Historiarum Philippicarum T. Pompeii Trogi Libri XLIV in Epitomen Redacti (prose) | 61,256 | 15,134 | 4.04
Eutropius [n.d.-ca. 399 AD] | Breviarium ab Urbe Condita (prose) | 18,873 | 5,575 | 3.38
Augustine [354-430 AD] | De civitate Dei contra Paganos (prose) | 274,720 | 35,430 | 7.75
Orosius [385-420 AD] | Historia adversum Paganos (prose) | 74,929 | 19,748 | 3.79

Total tokens (words to be processed): 1,266,751

Table 1: Token-type ratio across the corpus (January 2017).
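The TTR column in Table 1 appears to be the number of tokens divided by the number of types, cut off after two decimals; this is inferred from the figures themselves rather than stated on the slide. A minimal sketch of that calculation, with a hypothetical helper name:

```python
def token_type_ratio(tokens, types):
    """Tokens per type, truncated (not rounded) to two decimal places."""
    return (tokens * 100 // types) / 100

corpus = {
    "Caesar, De Bello Gallico": (51_723, 11_100),
    "Livy, Ab urbe condita": (507_120, 50_774),
    "Orosius, Historia adversum Paganos": (74_929, 19_748),
}

for work, (tokens, types) in corpus.items():
    print(f"{work}: {token_type_ratio(tokens, types):.2f}")
# Caesar, De Bello Gallico: 4.65
# Livy, Ab urbe condita: 9.98
# Orosius, Historia adversum Paganos: 3.79
```

A higher value here means each distinct word form recurs more often, which is expected for the longest works such as Livy and Augustine.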

LICENCE
The theme this presentation is based on is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Changes to the theme are the work of etrap.