Computational Linguistics

Computational Linguistics: Coreference Resolution
Gosse Bouma
Information Science, University of Groningen
LOT 2009

Overview
1. Coreference
2. Applications
3. Coreference Resolution
   - Pronouns
   - Definite NPs
   - Follow-up Questions

Coreference
- Noun phrases A and B corefer if they refer to the same entity.
- If A and B corefer, and the interpretation of B depends on A, then A is the antecedent and B the anaphor.
- Anaphors can be pronouns, definite NPs, and proper names.

Example (pronoun and definite NP):
  Xavier Malisse heeft zich geplaatst voor de halve finales. Hij versloeg de Spanjaard Ramirez. In de halve finale treft de Belg een onbekende tegenstander.
  "Xavier Malisse goes to the semi-finals. He beat the Spaniard Ramirez. In the semi-finals the Belgian meets an unknown opponent."

Example (proper name):
  Steve Stevaert dreigt met een regeringscrisis. Stevaert ergert zich aan de manier waarop de verschillende ministeries het dossier naar elkaar toeschuiven.
  "Steve Stevaert threatens a government crisis. Stevaert is annoyed by the way the ministries pass the issue to each other."

Relation Extraction
Extract instances of a given relation from the corpus, e.g. "X is a symptom of Y".

  Een van de symptomen van een Vitamine A deficiëntie is een slecht reuk en/of smaakvermogen
  "One of the symptoms of a vitamin A deficiency is a poor sense of smell and/or taste."

  Blauwtong is een virusziekte die voornamelijk voorkomt bij schapen. Een van de symptomen van de ziekte is de blauwe tong die besmette dieren kunnen krijgen.
  "Bluetongue is a viral disease that occurs mainly in sheep. One of the symptoms of the disease is the blue tongue that infected animals can get."

Question Answering
Question: Wie sneuvelde bij Heiligerlee?
  "Who died at Heiligerlee?"
Answer: Dat leidde tot de slag bij Heiligerlee, waarbij zijn broer Adolf sneuvelde [AD 2003]
  "This led to the battle of Heiligerlee, where his brother Adolf died."
Answer: Adolf trok vervolgens in de troepenmacht van zijn andere broer Lodewijk mee naar het noorden, waar hij sneuvelde bij Heiligerlee. [wikipedia]
  "Adolf then moved north with the army of his other brother Louis, where he died at Heiligerlee."

Follow-Up Questions
Q: Who is the murderer of John Lennon?
A: 10 may - 1955 Mark David Chapman, murderer of John Lennon
Q: How often was he hit? (he -> John Lennon)
A: Lennon was hit four times and died at 11.15 pm.
Q: Where was he murdered? (he -> John Lennon)
A: John Lennon On All Music Guide

Follow-Up Questions and Anaphora
- Personal and possessive pronouns: When was Napoleon born? Which title was introduced by him? Who were his parents?
- Impersonal pronouns: What is the KNMI? When was it founded?
- Deictic pronouns: What is an ecological footprint? When was this introduced?

Follow-Up Questions and Anaphora
- Definite NPs: Since when is Cuba ruled by Fidel Castro? When was the flag of the country designed?
- Deictic NPs: Who led the Russian Empire during the Russian-Turkish War of 1787-1792? Who won this war?

Resolving Coreference
- Position: number of sentences or NPs between antecedent and anaphor
- Lexical, morphology: pronoun, proper noun, definiteness, number
- Syntax: SUBJ, OBJ, PREDC, APP, ...
- String matching
- Semantic: gender, category of proper names, synonyms, hypernyms, ...

Resolving Pronouns (Bouma and Bouma)
Experimental framework: Mur (2008) for PNs and full NPs
- Rule-based, string-matching resolution of PNs
- Manually weighted linear model (cf. Shallom & Lappin, 1997) for definite full NPs
- Determination of anaphoricity of all NPs is rule-based
- Runs on top of the Alpino parser

Experimental framework - pronoun module
Maximum Entropy ranking model for pronoun resolution:

$$P(\mathit{ant} \mid \mathit{pron}) = \frac{1}{Z} \exp\Big( \sum_{f \in \mathit{Feats}} w_f \, f(\mathit{pron}, \mathit{ant}) \Big)$$

- Maxent models trained with TADM (Malouf, 2003), on the last mention of each compatible referent in the last 10 sentences.
- Several Gaussian priors.
- During application we pick the most likely candidate, provided it is better than a set threshold.
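The ranking step can be illustrated with a minimal sketch. It assumes that Z normalizes over the candidate antecedents of one pronoun and that "better than a set threshold" means the winning probability exceeds the threshold; neither detail is stated on the slide. The function name, feature names and weights below are hypothetical, not the TADM-trained model.

```python
import math

def rank_candidates(feature_vectors, weights, threshold=0.5):
    """Return (index, probability) of the best candidate, or None.

    feature_vectors: one dict {feature_name: value} per candidate antecedent.
    weights: dict {feature_name: w_f}. All names here are illustrative.
    """
    if not feature_vectors:
        return None
    # Linear score sum_f w_f * f(pron, ant) for each candidate.
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for feats in feature_vectors]
    # Softmax over the candidate set (assumed meaning of 1/Z); subtracting
    # the maximum score keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    # Leave the pronoun unresolved when the best candidate is too weak.
    if probs[best] < threshold:
        return None
    return best, probs[best]

# Hypothetical weights and candidates, for illustration only.
toy_weights = {"subject": 1.0, "same_sentence": -0.5}
toy_candidates = [{"subject": 1.0}, {"same_sentence": 1.0}, {}]
print(rank_candidates(toy_candidates, toy_weights, threshold=0.5))
```

Returning None below the threshold trades recall for precision, since the pronoun is then left unresolved rather than linked to a weak candidate.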

Experimental framework - pronoun module
14 features, capturing:
- Grammatical function (GF) of the candidate NP
- Form of the candidate
- Ontological status of the candidate
- Distance between pronoun and candidate
- Frequency of mention of the candidate referent

Evaluation: syntactic features
10-fold cross-validation on the KNACK training corpus (Hoste & De Pauw, 2006)

Model        Opts     Prc Pron   MUC F   MUC P   MUC R
nearest      -        31.0       42.7    49.4    37.6
sub & sent   .5       61.1       50.3    59.8    43.4
syntax       1, .5    61.3       51.8    60.1    45.5

Cf. Hendrickx et al. (2008): 51.3 MUC F on this corpus.
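As a quick consistency check on the table, the sketch below assumes that MUC F is the balanced harmonic mean of MUC P and MUC R. The slide does not state the formula, but the reported figures are consistent with it. The helper name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (MUC P, MUC R) pairs as reported in the evaluation table.
reported = {
    "nearest": (49.4, 37.6),
    "sub & sent": (59.8, 43.4),
    "syntax": (60.1, 45.5),
}
for model, (p, r) in reported.items():
    print(model, round(f_score(p, r), 1))
# prints 42.7, 50.3 and 51.8, matching the MUC F column
```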

Plausibility
  Gosse_g sneed een stuk parfait_t af en legde het mes_m weg. Het_? smaakte hem_g heerlijk!
  "Gosse cut off a piece of parfait and put the knife away. It tasted delicious to him!"

Frequency-based predicate-argument association: <parfait, su, smaak> vs. <mes, su, smaak>
- Dagan et al. 1995: small improvement
- Kehler et al. 2004: no improvement
- Yang et al. 2005: some improvement, especially with the web as corpus

Predicate-argument frequencies
- Predicate-argument co-occurrence information from automatically parsed TwNC & Wikipedia (more than 525 million words).
- 37 million subject-verb pairs, 18 million object-verb pairs.
- Association between predicate and argument is measured with pointwise mutual information (PMI):

$$\mathrm{PMI}(\mathit{pred}, \mathit{arg}) = \log \frac{P(\mathit{pred}, \mathit{arg})}{P(\mathit{pred}) \, P(\mathit{arg})}$$

- After frequency filtering, about 2 million types (sub & obj1).
- The value of the association feature for a candidate antecedent is the largest PMI of the coreferents.
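A minimal sketch of how such a PMI table and the association feature might be computed from (predicate, argument) pairs. It assumes maximum-likelihood probability estimates and a simple count threshold for the frequency filter; the function names and the toy counts are hypothetical and are not the TwNC/Wikipedia data.

```python
import math
from collections import Counter

def pmi_table(pairs, min_count=1):
    """Map (pred, arg) -> PMI, estimated from a list of observed pairs."""
    pair_counts = Counter(pairs)
    pred_counts = Counter(pred for pred, _ in pairs)
    arg_counts = Counter(arg for _, arg in pairs)
    total = len(pairs)
    table = {}
    for (pred, arg), n in pair_counts.items():
        if n < min_count:  # assumed form of the frequency filter
            continue
        p_joint = n / total
        p_pred = pred_counts[pred] / total
        p_arg = arg_counts[arg] / total
        table[(pred, arg)] = math.log(p_joint / (p_pred * p_arg))
    return table

def association(pred, coreferent_heads, table):
    """Largest PMI over the mentions (head nouns) of one candidate referent."""
    scores = [table[(pred, head)] for head in coreferent_heads
              if (pred, head) in table]
    return max(scores) if scores else None

# Toy subject-verb pairs (invented counts, for illustration only).
toy_pairs = ([("smaak", "parfait")] * 3 + [("smaak", "ijs")] * 2
             + [("snijd", "mes")] * 4 + [("smaak", "mes")] * 1)
toy_table = pmi_table(toy_pairs)
print(toy_table[("smaak", "parfait")] > toy_table[("smaak", "mes")])
```

With these toy counts, "parfait" is a more plausible subject of "smaak" than "mes", which is the preference the plausibility feature is meant to express.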

Evaluation: MI

Model        Opts     Prc Pron   MUC F   MUC P   MUC R
nearest      -        31.0       42.7    49.4    37.6
sub & sent   .5       61.1       50.3    59.8    43.4
syntax       1, .5    61.3       51.8    60.1    45.5
MI           1, .5    61.2       51.8    60.1    45.6

Overcoming sparseness with similar words?
Kehler et al. (2004) & Yang et al. (2005) identify data sparseness as a problem.

  Gosse_g sneed een stuk parfait_t af en legde het mes_m weg. Het_? smaakte hem_g heerlijk!
  "Gosse cut off a piece of parfait and put the knife away. It tasted delicious to him!"

  <parfait, su, smaak>
  <ijs, su, smaak>
  <cake, su, smaak>
  <gebak, su, smaak>
  <taart, su, smaak>

Overcoming sparseness with similar words?
Word similarity can be calculated with pred-arg frequencies, too (Bouma & vd Plas, 2005).
Our approach:
- For each noun, form a vector of PMIs with predicates.
- Use DICE to calculate the similarity between vectors.
- Pick the 15 most similar words to form a cluster.
- The association feature is now a maximum of maxima: the largest PMI over the words in the cluster, taken over the coreferents as before.
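The clustering steps above can be sketched as follows. The slide does not say which Dice variant is used; this sketch assumes a weighted Dice coefficient over positive PMI values, which is only one common choice. All names and the toy vectors are hypothetical.

```python
def dice(vec_a, vec_b):
    """Weighted Dice similarity between two {predicate: positive PMI} vectors."""
    shared = set(vec_a) & set(vec_b)
    numerator = 2 * sum(min(vec_a[k], vec_b[k]) for k in shared)
    denominator = sum(vec_a.values()) + sum(vec_b.values())
    return numerator / denominator if denominator > 0 else 0.0

def cluster(word, vectors, size=15):
    """The word itself plus its `size` most similar words (similarity > 0)."""
    scored = [(dice(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [word] + [other for score, other in scored[:size] if score > 0]

def cluster_association(pred, word, vectors, size=15):
    """Largest PMI with `pred` over all words in the cluster of `word`."""
    scores = [vectors[w][pred] for w in cluster(word, vectors, size)
              if pred in vectors[w]]
    return max(scores) if scores else None

# Toy PMI vectors (invented values, for illustration only).
toy_vectors = {
    "parfait": {"eet": 1.2, "serveer": 0.8},
    "ijs": {"eet": 1.0, "smaak": 2.0},
    "mes": {"snijd": 1.5},
}
print(cluster("parfait", toy_vectors))
print(cluster_association("smaak", "parfait", toy_vectors))
```

In the toy data "parfait" was never seen as a subject of "smaak", but its neighbour "ijs" was, so the cluster supplies an association score where the word alone has none.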

Evaluation: MI with similar words

Model        Opts     Prc Pron   MUC F   MUC P   MUC R
nearest      -        31.0       42.7    49.4    37.6
sub & sent   .5       61.1       50.3    59.8    43.4
syntax       1, .5    61.3       51.8    60.1    45.5
MI           1, .5    61.2       51.8    60.1    45.6
MI Sim       1, .5    59.7       51.4    59.6    45.2

Model inspection

Feature               Weight
subject                0.1036
direct object         -0.0066
indirect object        0.0062
oblique complement    -0.0058
head noun              0.0862
sentence              -0.4694
human                  0.0002
pronoun                0.0353
definite              -0.0693
same paragraph         0.1083
preceding sentence     0.0111
mentions               0.0762
mi(su)                 0.0575
mi(obj1)               0.0174

Discussion
- Small improvement over baseline with syntactic information.
- No noticeable effect of PMI as association information.
- More similar words?
- Another way of calculating a combined association score?
- Learn MI from other data (i.e. coref corpus = Flemish, MI corpus = Dutch)

Resolving Definite NP anaphors
A definite NP often mentions the semantic class of the antecedent:

  Todd Martin was the opponent of the quiet Ivanisevic in December 1995. The American, who defeated the local hero Boris Becker a day earlier, was beaten by the 26-year old Croatian during the finals of the Grand Slam Cup in 1995.

Relevant knowledge can be found in apposition relations.

Acquiring ISA relations
Corpus is searched exhaustively for <Concept, app, Instance> triples:
- museum: Hermitage, Madame Tussaud, National Gallery, ... (1,945)
- bondscoach (national team coach): Guus Hiddink, Jorge Valdano, Louis van Gaal, ... (14K)
- Argentinian: newspaper La Nación, biologist Lilian Ramos, supermarket Disco, ... (1,861)

3.2M appositions extracted for 660K named entities.
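A minimal sketch of how apposition-derived ISA pairs might be used to link a definite NP to a named-entity antecedent. It assumes the appositions have already been extracted from parsed text as (concept, instance) pairs and that candidates are tried from most recent to least recent. The function names and the example pairs are hypothetical.

```python
from collections import defaultdict

def build_isa(appositions):
    """Map each named entity to the set of concepts it appeared in apposition with."""
    isa = defaultdict(set)
    for concept, instance in appositions:
        isa[instance].add(concept.lower())
    return isa

def resolve_definite_np(head_noun, candidates, isa):
    """Return the first candidate (most recent first) known to be a `head_noun`."""
    for name in candidates:
        if head_noun.lower() in isa.get(name, ()):
            return name
    return None  # no candidate with a matching semantic class

# Invented apposition pairs in the spirit of the tennis example.
toy_isa = build_isa([
    ("American", "Todd Martin"),
    ("Croatian", "Ivanisevic"),
    ("bondscoach", "Guus Hiddink"),
])
# Candidates preceding "The American", ordered most recent first.
print(resolve_definite_np("American", ["Ivanisevic", "Todd Martin"], toy_isa))
```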

Using Anaphora Resolution in IE
Increase in number of extracted facts:

Relation     original   anaphora
age          17,038     20,119
born_date     1,941      2,034
born_loc        753        891
died_age        847        885
died_date       892      1,061
died_how      1,470      1,886
died_loc        642        646

Using Anaphora Resolution in IE
Accuracy on 400 random coref. facts:

                                   # facts
new facts (corr.)                  168
new facts (incorr.)                128
increase in frequency (corr.)       91
increase in frequency (incorr.)      6

Anaphora Resolution in Questions
- Antecedent has to be a named entity
- From the first question or the answer to the first question

Q: What is the capital of Russia?
A: Moscow
Q: How many inhabitants does it have?
A: 8 million
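The strategy on this slide can be sketched as a simple rewrite of the follow-up question. The sketch assumes the named entities of the first question and of its answer are supplied by an external recognizer, and that the answer is tried before the question; the slide does not specify the order. The pronoun lists are English simplifications and every name is hypothetical.

```python
import re

PRONOUNS = {"he", "him", "she", "it", "they", "them"}
POSSESSIVES = {"his", "its", "their"}  # "her" is left out because it is ambiguous

def resolve_follow_up(follow_up, question_entities, answer_entities,
                      prefer_answer=True):
    """Replace pronouns in a follow-up question by a named-entity antecedent."""
    pools = ([answer_entities, question_entities] if prefer_answer
             else [question_entities, answer_entities])
    antecedent = next((pool[0] for pool in pools if pool), None)
    if antecedent is None:
        return follow_up  # no named entity available: anaphor stays unresolved

    def substitute(match):
        word = match.group(0).lower()
        if word in PRONOUNS:
            return antecedent
        if word in POSSESSIVES:
            return antecedent + "'s"
        return match.group(0)

    return re.sub(r"[A-Za-z]+", substitute, follow_up)

print(resolve_follow_up("How many inhabitants does it have?",
                        question_entities=["Russia"],
                        answer_entities=["Moscow"]))
```

Because only named entities are considered, a follow-up whose antecedent is a common noun cannot be resolved by this kind of strategy.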

Anaphora Resolution Results

Questions             200
Qs with anaphor        56   100%
Correct antecedent     29    52%
Wrong antecedent       15    27%
Missed                 12    21%

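Problematic Cases
Antecedent is not a named entity:
  Wat is mede? Hoe heet het in India?
  "What is mead? What is it called in India?"

Locative and temporal anaphora:
  Hoe groot is Pitcairn? Welke talen worden er gesproken?
  "How large is Pitcairn? Which languages are spoken there?"

  Wanneer werd Contra-Aquincum gesticht? 294. Welke keizer was destijds aan de macht?
  "When was Contra-Aquincum founded? 294. Which emperor was in power at that time?"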

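Problematic Cases
Bridging?
  In welke gemeente ligt Helvoirt? Hoe heet het jaarlijkse evenement rond Hemelvaartsdag?
  "In which municipality is Helvoirt located? What is the annual event around Ascension Day called?"

  Wanneer werd de Efteling geopend? Welke nieuwe attractie werd geopend in 1993?
  "When was the Efteling opened? Which new attraction was opened in 1993?"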