Volume 5, Issue 7, July 2015    ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

Analysis of Anaphora, Cataphora and Exaphora for Demonstrative Pronoun of Hindi

Pardeep Singh*, Kamlesh Dutta
CSED, National Institute of Technology Hamirpur, Himachal Pradesh, India

Abstract
Co-reference resolution is an issue of paramount importance that must be addressed for foolproof translation and for other applications that rely on co-reference, especially when the target language is a free-word-order language. A number of algorithms have been devised for co-reference resolution. The accuracy of a co-reference resolution (anaphora + cataphora + exaphora) algorithm depends on the percentage of each of these types of reference in the corpus. To determine the viability of such algorithms, we conducted a study on a Hindi corpus to find the percentage of sentences containing anaphora and cataphora. These percentages pave the way to evaluating the accuracy of co-reference resolution algorithms that resolve anaphora, cataphora or exaphora, and they will guide researchers in judging how relevant each phoric type is. In this paper we consider one of the features proposed by Botley, which has been used to resolve anaphora in Hindi. This tag has been analysed empirically and its values tested on a corpus. We analysed 165 news items of Ranchi Express from the plain-text EMILLE corpus, using a tag set proposed by different authors. Three values of this tag are considered. The study covers twelve files of Ranchi Express containing 1515 monologue sentences, and eight dialogue files of the EMILLE corpus containing 877 sentences. The study shows that hundred percent accuracy in co-reference resolution is not attainable unless anaphora, cataphora and exaphora are all addressed.
Keywords: Coreference resolution, anaphora resolution, cataphora, exaphora, annotation.

I. INTRODUCTION
Anaphora is a referring expression that refers to some entity that has previously been introduced into the discourse. The process of binding (mapping) the referring expression to the correct antecedent in the given discourse is called anaphora resolution. According to Halliday and Hasan [14], anaphora is the cohesion (presupposition) that points back to some previous item. Hirst [12] defines anaphora more formally as a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and thereby determine the identity of the entity. Here, the reference is the anaphor and the entity to which it refers is the antecedent.

Example 1 (the indices after they give its possible referents; * marks an excluded reading):
S1: The mangoes(i) were given to the children(j) because they(*i/j) were hungry.
S2: The mangoes(i) were given to the children(j) because they(i/*j) were ripe.
S3: The mangoes(i) were given to the children(j) because they(*i/j) were there.

To whom does each occurrence of they refer in Example 1? Given the hearer's set of beliefs, in S1 the pronoun they refers to the children, because being hungry is a property that matches children: only the children can feel hunger. This shared property binds the pronoun they and the noun children together as a pair of referring expression and referent. In the same way, the referent and referring-expression pairs in S2 and S3 are resolved using semantic or syntactic knowledge of the sentences and words. The problem arises when this resolution must be done by a machine.

Example 2: John had to go to a meeting so he decided to have a shave.
Here he refers to John, and John comes before he: he is the anaphor and John is its antecedent.
In a free-word-order language, a number of meaningful sentences can be formed by scrambling the words of one sentence. The scrambled sentences convey the same meaning, but the referring expression may change its position relative to its referent. A referring expression that appears to the right of (after) its referent is called anaphora; one that appears to the left of (before) its referent is called cataphora. If the referent is absent from the discourse or only implicit, the reference is called exaphora.

II. COREFERENCE RESOLUTION
Coreference occurs when multiple expressions in a sentence or document refer to the same thing; in linguistic jargon, they have the same referent. For example, in the sentence:

© 2015, IJARCSSE All Rights Reserved    Page 108

Radha said she would help me.

she and Radha most likely refer to the same person or group, in which case they are co-referent. Similarly, in "I saw Raj yesterday. He was fishing by the lake.", Raj and he are most likely co-referent. The following is a sample text in Hindi, Article 1 of the Universal Declaration of Human Rights (United Nations):

Example 3:
अनुच्छेद 1. सभी मनुष्यों(i) को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है। उन्हें(i) बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें(i) भाईचारे के भाव से बर्ताव करना चाहिये।

Gloss (word-to-word): Article 1. All human-beings(i) to dignity and rights' matter in from-birth freedom and equality acquired is. Them(i) to reason and conscience's endowment acquired is and always them(i) to brotherhood's spirit with behaviour to do should.

Translation (grammatical): Article 1. All human beings(i) are born free and equal in dignity and rights. They(i) are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

In Example 3, unheṃ (anaphor) refers to sabhī manuṣyoṃ (antecedent). The translation likewise has one anaphor, They, whose antecedent is All human beings. The antecedent in this example consists of three words, which makes it harder to recover.

III. LITERATURE SURVEY
Most of the earlier work in co-reference resolution has been done for English and other European languages. Reference [1] has done extensive work towards a computational solution to anaphora and co-reference resolution in English [2]. Hobbs' algorithm [13] (see also Mitkov [4]) is an effective algorithm for anaphora resolution; it uses syntactic rather than semantic information and depends only on a syntactic parser plus a morphological gender and number checker. For this reason it is often used as a baseline when evaluating new pronominal anaphora resolution algorithms. The work in [3] is very comprehensive in the field of anaphora resolution.
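Hobbs' algorithm proper searches a parse tree for candidate antecedents; the sketch below shows only the second ingredient named above, a gender and number agreement filter over candidates. The `Mention` record, its feature codes ("m"/"f"/"n", "sg"/"pl") and the nearest-first ordering are illustrative assumptions, not part of the method as specified in [13].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A mention with the morphological features an agreement checker inspects.

    Field names and value codes are hypothetical, chosen for illustration.
    """
    text: str
    position: int            # token index in the discourse
    gender: Optional[str]    # None when the form does not mark gender
    number: Optional[str]    # None when the form does not mark number


def compatible(a: Optional[str], b: Optional[str]) -> bool:
    """Unknown (None) values are treated as compatible with any value."""
    return a is None or b is None or a == b


def agreeing_candidates(pronoun: Mention, candidates: List[Mention]) -> List[Mention]:
    """Keep candidates that precede the pronoun and agree with it in
    gender and number, nearest first."""
    kept = [
        c for c in candidates
        if c.position < pronoun.position
        and compatible(c.gender, pronoun.gender)
        and compatible(c.number, pronoun.number)
    ]
    # Recency ordering is an assumption standing in for the parse-tree search order.
    return sorted(kept, key=lambda c: pronoun.position - c.position)


# Example 2: "John had to go to a meeting so he decided to have a shave."
john = Mention("John", 0, "m", "sg")
meeting = Mention("meeting", 6, "n", "sg")
he = Mention("he", 8, "m", "sg")
print([c.text for c in agreeing_candidates(he, [john, meeting])])  # ['John']
```

Passing `gender=None` lets the same filter accept Hindi pronouns such as वह, which do not mark gender; both candidates then survive and only number and recency separate them.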
JavaRAP [5] and Mitkov's [4] anaphora resolution system are among the good anaphora resolution systems. A number of annotation schemes are available for different tasks. Tag sets have been defined by different authors [7], [9]-[11] for English and other European languages, and modified for languages such as Turkish, German and the Dravidian languages, in order to create annotated corpora. Six features have been proposed to annotate demonstrative pronouns in English [10]. The author considers the recoverability of the antecedent, direction of reference, phoric type, syntactic function and antecedent type to annotate three genres. These corpora are the American Printing House for the Blind (APHB) Corpus, the Associated Press (AP) Corpus and the Hansard Corpus [8]. Later, a few tags were suggested and the annotation scheme was adapted for Hindi [10], [12]. A machine learning approach was proposed for the classification of indirect anaphora, adding to the previous work one more tag, which captures the semantic category [9]. The authors proposed that, beyond some syntactic constraints, semantic collocation patterns are also a significant feature for indirect anaphora in Hindi [6]. Adopting the lexically grounded approach of the Penn Discourse Treebank (PDTB) [10], a small corpus was annotated and a preliminary analysis of its discourse connectives presented. Another study analysed the features used for anaphora resolution [13].

IV. METHODOLOGY
A. Feature set selection
We used the EMILLE corpus. In this corpus each occurrence of a demonstrative pronoun is coded so that it can be extracted. A pronoun is marked as direct or indirect, but the mark does not specify what actually distinguishes a direct anaphor from an indirect one. The corpus is annotated for anaphora using a scheme based on [8] and customized for Hindi by [9]. In this study we consider one feature (Table I).
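The EMILLE annotation used in this study is assigned by hand, but the three values of Direction of Reference (A, C and 0) follow directly from the positional definitions in Section I. The sketch below states that rule, assuming each mention is identified by a token index and that a non-recoverable antecedent is passed as `None`; the function and its inputs are illustrative and are not part of the annotation scheme. The examples are sentences used in this paper.

```python
from typing import Optional


def direction_of_reference(anaphor_pos: int, antecedent_pos: Optional[int]) -> str:
    """Return 'A' (anaphoric), 'C' (cataphoric) or '0' (exophoric).

    Positions are token indices in the discourse; None means the antecedent
    cannot be recovered from the text.
    """
    if antecedent_pos is None:
        return "0"
    if antecedent_pos < anaphor_pos:
        return "A"  # antecedent precedes the anaphor
    if antecedent_pos > anaphor_pos:
        return "C"  # antecedent follows the anaphor
    raise ValueError("anaphor and antecedent cannot occupy the same position")


# "John had to go to a meeting so he ...": John = token 0, he = token 8
print(direction_of_reference(8, 0))     # A
# "After he received his orders, the soldier left ...": he = 1, soldier = 6
print(direction_of_reference(1, 6))     # C
# "Did the gardener water those plants?" read deictically: no antecedent
print(direction_of_reference(4, None))  # 0
```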
The feature Direction of Reference takes one of three values: A, C and 0 (zero) for anaphora, cataphora and exaphora respectively.

TABLE I. FEATURE USED FOR ANALYSIS
Feature                  Value 1                                    Value 2         Value 3          Value 4   Value 5
Direction of reference   0 (not applicable, exophoric or deictic)   A (anaphoric)   C (cataphoric)   None      None

B. Direction of Reference Anaphoric
A reference is directly anaphoric if, for a demonstrative pronoun, the features take values such as DARMN (Table II). For example:

Our offense is designed to shoot lay-ups. If we can't carry on with this offense, we find ourselves sitting on the bench.

TABLE II. VALUES OF FEATURES FOR ANAPHORIC REFERENCE
Feature                  Value                  Tag
Recoverability           directly recoverable   D
Direction of reference   anaphoric              A
Phoric type              referential            R
Syntactic function       modifier               M
Antecedent type          nominal                N

C. Direction of Reference Cataphoric
The direction of reference is cataphoric if the antecedent is placed to the right of the anaphor.
1. If you want them, there are cookies in the kitchen. (Them is an instance of cataphora because it refers to cookies, which have not been mentioned in the discourse before that point.)
2. After he received his orders, the soldier left the barracks. (He is also a cataphoric reference, to the soldier, who is mentioned later in the discourse.)

TABLE III. VALUES OF FEATURES FOR CATAPHORIC REFERENCE
Feature                  Value                  Tag
Recoverability           directly recoverable   D
Direction of reference   cataphoric             C
Phoric type              referential            R
Syntactic function       head of noun phrase    H
Antecedent type          propositional          P

D. Non-recoverable Antecedent
This feature has the value zero if the antecedent is not recoverable: the antecedent may be absent from the discourse, or its reference may be implicit. In "Did the gardener water those plants?", it is quite possible that those refers back to the preceding text, to some earlier mention of those particular plants in the discussion. But it is also possible that it refers to the environment in which the dialogue is taking place, the "context of situation", as it is called, where the plants in question are present and can be pointed to if necessary.

V. RESULT AND DISCUSSION
The pre-tagged EMILLE corpus was used for the study. It was annotated according to Botley's annotation scheme. The values of this tag were extracted and collected under anaphoric, cataphoric and exaphoric headings.

TABLE IV. NUMBER OF INSTANCES OF DIRECTION OF REFERENCE IN CORPUS
Corpus     Anaphoric   Cataphoric   Exaphoric
Plain      1297        169          49
Dialogue   759         42           70

Table IV reveals that the majority of co-references in Hindi are anaphoric in both genres (dialogue and monologue).
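Section V describes extracting the Direction of Reference values from the pre-tagged corpus and collecting them under three headings. A minimal sketch of that counting step follows, assuming each pronoun's annotation is available as a five-letter string in the row order of Tables II and III, so that the direction code is its second letter. Reading the EMILLE files is not shown, and the sample tags are made up for illustration (the `?` stands in for a recoverability code this paper does not list).

```python
from collections import Counter
from typing import Dict, Iterable

# Table IV headings, keyed by the Direction of Reference code of Table I.
HEADINGS = {"A": "Anaphoric", "C": "Cataphoric", "0": "Exaphoric"}


def direction_code(tag: str) -> str:
    """Return the direction code, assumed to be the second letter of a
    five-letter tag (e.g. 'DARMN' -> 'A')."""
    if len(tag) != 5:
        raise ValueError(f"expected a five-letter tag, got {tag!r}")
    return tag[1]


def tally_directions(tags: Iterable[str]) -> Dict[str, int]:
    """Count tags under the three headings; codes other than A, C, 0 are not counted."""
    counts = Counter(direction_code(t) for t in tags)
    return {heading: counts[code] for code, heading in HEADINGS.items()}


# Hypothetical tags for four demonstrative pronouns.
sample = ["DARMN", "DARMN", "DCRHP", "?0RMN"]
print(tally_directions(sample))  # {'Anaphoric': 2, 'Cataphoric': 1, 'Exaphoric': 1}
```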
In plain text, the anaphoric, cataphoric and exaphoric references number 1297, 169 and 49 respectively. The dialogue corpus shows the same dominance of anaphora, with 759, 42 and 70 anaphoric, cataphoric and exaphoric references respectively.

Fig. 1 Number of instances of Direction of reference in Corpus

Figure 1 shows Table IV graphically. Anaphoric and cataphoric references are fewer in dialogue than in monologue, whereas exaphoric references are more frequent in dialogue (70 versus 49). In both genres, however, anaphoric references form the clear majority.
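The percentages in Table V can be recomputed from the counts in Table IV. The sketch below divides each count by its corpus total and rounds each share independently to one decimal place; the rounding rule is an assumption, since the paper does not state how Table V was rounded.

```python
from typing import Dict

# Counts from Table IV.
TABLE_IV = {
    "Plain":    {"Anaphoric": 1297, "Cataphoric": 169, "Exaphoric": 49},
    "Dialogue": {"Anaphoric": 759,  "Cataphoric": 42,  "Exaphoric": 70},
}


def percentages(row: Dict[str, int], digits: int = 1) -> Dict[str, float]:
    """Share of each direction in one corpus, each rounded independently."""
    total = sum(row.values())
    return {k: round(100 * v / total, digits) for k, v in row.items()}


for corpus, row in TABLE_IV.items():
    print(corpus, sum(row.values()), percentages(row))
```

The plain corpus totals 1515 instances, matching the 1515 monologue sentences reported in the abstract, and the dialogue corpus totals 871. Independent rounding gives 85.6, 11.2 and 3.2 for plain text and 87.1, 4.8 and 8.0 for dialogue, so three entries of Table V (11.1, 3.3 and 8.1) differ from this computation by 0.1.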

TABLE V. PERCENTAGE OF INSTANCES OF DIRECTION OF REFERENCE IN CORPUS
Corpus            Anaphoric (%)   Cataphoric (%)   Exaphoric (%)
Plain corpus      85.6            11.1             3.3
Dialogue corpus   87.1            4.8              8.1

Table V indicates that the majority of co-references involving demonstrative pronouns in Hindi are anaphoric in both genres (dialogue and monologue). The antecedent of a demonstrative pronoun can also follow it (cataphora), but such references account for only about 11% and 5% of instances in plain text and dialogue respectively. Exaphoric references are comparatively few, but they are difficult to recover.

Fig. 2 Percentage of instances of Direction of reference in Corpus

Figure 2 shows the percentage of pronouns for each value (A, C and 0) of the feature Direction of Reference: the x-axis gives the values of the feature and the y-axis the percentage of pronouns in plain and dialogue text. Figure 2 shows that approximately 86% and 87% of references are anaphoric in plain text and dialogue respectively, so most references are anaphoric. Approximately 11% and 5% of references are cataphoric in plain text and dialogue respectively.

VI. CONCLUSION
The majority of references, about 86%, are anaphoric. In monologue, cataphoric and exaphoric references account for 11% and 3% respectively, whereas in the dialogue corpus anaphoric, cataphoric and exaphoric references account for 87%, 5% and 8% respectively. On the basis of its percentage in monologue, exaphora therefore has the lowest priority. No algorithm devised for only anaphora or cataphora can attain hundred percent accuracy: a co-reference resolution algorithm must address all types of reference, i.e. anaphoric, cataphoric and exaphoric. Exaphoric references may be left out to a certain extent, since their share is small in both corpora. Anaphora and cataphora must be addressed to achieve substantial accuracy.
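The conclusion's claim that an algorithm handling only some reference types cannot reach hundred percent accuracy can be made concrete with the Table IV counts. The sketch below computes a ceiling under two simplifying assumptions of ours: accuracy is measured per instance, and the resolver gets every handled type right and every other type wrong.

```python
from typing import Dict, Iterable

# Counts from Table IV.
TABLE_IV = {
    "Plain":    {"Anaphoric": 1297, "Cataphoric": 169, "Exaphoric": 49},
    "Dialogue": {"Anaphoric": 759,  "Cataphoric": 42,  "Exaphoric": 70},
}


def coverage_ceiling(row: Dict[str, int], handled: Iterable[str]) -> float:
    """Highest percentage of instances a resolver could get right if it
    resolved every handled type perfectly and every other type wrongly."""
    total = sum(row.values())
    return 100 * sum(row[d] for d in set(handled)) / total


for corpus, row in TABLE_IV.items():
    only_a = coverage_ceiling(row, ["Anaphoric"])
    a_and_c = coverage_ceiling(row, ["Anaphoric", "Cataphoric"])
    print(f"{corpus}: anaphora only {only_a:.1f}%, anaphora + cataphora {a_and_c:.1f}%")
```

Under these assumptions an anaphora-plus-cataphora resolver tops out near 96.8% on plain text but near 92.0% on dialogue, because exaphora is more frequent in dialogue.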
REFERENCES
[1] R. Mitkov, "An integrated model for anaphora resolution," Proceedings of the 15th Conference on Computational Linguistics, Volume 2, Kyoto, Japan: Association for Computational Linguistics, 1994.
[2] R. Prasad and M. Strube, "Discourse salience and pronoun resolution in Hindi," Penn Working Papers in Linguistics, vol. 6, no. 3, University of Pennsylvania, pp. 189-208, 2000.
[3] S. P. Botley, "Indirect anaphora: Testing the limits of corpus-based linguistics," International Journal of Corpus Linguistics, vol. 1, no. 3, pp. 73-112, 2006.
[4] S. Botley and A. McEnery, "Demonstratives in English: a corpus-based study," Journal of English Linguistics, vol. 29, pp. 7-33, 2001.
[5] K. Dutta, S. Kaushik and N. Prakash, "Machine learning approach for the classification of demonstrative pronouns for indirect anaphora in Hindi news items," The Prague Bulletin of Mathematical Linguistics, no. 95, pp. 33-50, 2011, doi: 10.2478/v10108-011-0003-4.
[6] R. Prasad, E. Miltsakaki, A. Joshi and B. Webber, "Annotation and data mining of the Penn Discourse TreeBank," ACL Workshop on Discourse Annotation, 2000.
[7] S. Hammami, L. H. Belguith and A. B. Hamadou, "Arabic anaphora resolution: corpora annotation with coreferential links," The International Arab Journal of Information Technology (IAJIT), vol. 6, no. 5, pp. 480-488, 2009.
[8] S. Sinha, "A corpus-based account of anaphor resolution in Hindi," Master's thesis, University of Lancaster, UK, 2002.
[9] S. Lappin and H. Leass, "An algorithm for pronominal anaphora resolution," Computational Linguistics, vol. 20, no. 4, pp. 535-561, 1994.
[10] P. Singh and K. Dutta, "Analysis and comparison of antecedent type of demonstrative pronoun in context of coreference resolution: a corpus based study of Hindi for monologue and dialogue," Sixth IEEE International Conference on Computational Intelligence and Communication Networks (CICN 2014), pp. 536-540, 14-16 Nov. 2014, doi: 10.1109/CICN.2014.122.
[11] R. Mitkov, R. Evans and C. Orasan, "A new, fully automatic version of Mitkov's knowledge-poor pronoun resolution method," Lecture Notes in Computer Science, vol. 2276, pp. 168-186, 2002.
[12] G. Hirst, Anaphora in Natural Language Understanding, Berlin: Springer-Verlag, 1981.
[13] J. R. Hobbs, "Pronoun resolution," Research Report 76-1, Department of Computer Sciences, City College, City University of New York, August 1976.
[14] M. Halliday and R. Hasan, Cohesion in English, Longman English Language Series 9, Longman, 1976.