Anaphora Resolution in Hindi Language


International Journal of Information and Computation Technology. ISSN 0974-2239, Volume 3, Number 7 (2013), pp. 609-616. International Research Publications House, http://www.irphouse.com/ijict.htm

Priya Lakhmani¹ and Smita Singh²
¹ Department of Computer Science, Banasthali University, C-62, Sarojini Marg, C-Scheme, Jaipur, India.

Abstract

This paper reports on anaphora resolution for Hindi. Anaphora resolution is a key problem in natural language processing and has received a significant amount of attention in the literature. The primary focus of this work is the resolution of pronominal anaphora, that is, the binding of pronouns to their intended noun phrases in the discourse. Although a significant amount of work has been done for English and other European languages, work on Hindi lags far behind. The paper is divided into four parts. The first reviews work done on anaphora resolution in Hindi. The second covers issues related to the syntactic and semantic structure of Hindi and the influence of case on pronouns. The third defines the constraint sources that form the basis of the anaphora resolution task. Finally, we perform a manual experiment on different kinds of data sets; the results show a final accuracy of approximately 71%.

Keywords: Anaphora, Pronominal Resolution, Case Marker, Natural Language Processing.

1. Introduction

Pronominal resolution, or anaphora resolution, is the problem of determining the noun phrase (NP) to which a pronoun in a document refers. The word or phrase that points back is called the anaphor. The entity to which an anaphor refers, or for which it stands, is its antecedent. In simple terms, anaphora resolution is the process of determining the antecedent of an anaphor. Consider the following sentence:

Ram ne Shyam ko uski pustak di. ("Ram gave Shyam his book.")

Here uski is a pronoun that refers to the noun Shyam. A human reader can work this out quickly, but the underlying process is still unclear, especially for more complex sentences:

S1: bacchon ne kele khaye kyunki ve bhukhe the. ("The children ate the bananas because they were hungry.")
S2: bacchon ne kele khaye kyunki ve pake hue the. ("The children ate the bananas because they were ripe.")

In S1, ve refers to bacchon, whereas in S2, ve refers to kele. This is an example of pronominal resolution. Resolving pronouns to their intended referents is an important problem in natural language processing and a difficult task for an anaphora resolution system. Consequently, anaphora resolution remains a challenge and an active area of research. The most common type of anaphora is pronominal anaphora, whose major classes are the first-, second- and third-person pronouns.

Classification of anaphora and pronouns in Hindi: Hindi is a free word order language, and its pronouns exhibit a great deal of ambiguity. Pronouns in the first, second and third person do not convey any information about gender, so Hindi makes no distinction between he and she. veh is used for both genders, and the gender is determined by the verb form. With respect to number marking, some forms, such as usko (him) and usne (he), are unambiguously singular, whereas others, such as unhone (he, honorific / they) and unko (him, honorific / them), can be either singular or plural. Table 1 summarizes the comparison of pronominal anaphora for the third-person paradigm in English and Hindi.

Table 1: Pronominal features in English and Hindi for the third-person paradigm.

Pronominal anaphora in English: He; He, She (honorific); His, her, its; Him, her; This; That; They; These; Them; Their; Himself; Herself; Itself; Themselves.
Pronominal anaphora in Hindi: veh; ve; inhon-ne; us; usko; yeh; veh; ve; unko / unse; unka/unki/unke; apne; swayam; khud; -aap; apne/apni/apnee.

2. Related Work

Studies of anaphora resolution in Hindi and other Indian languages are presented by Bharati et al. [1]. The authors designed methods to handle anaphora and implemented them in a prototype natural language interface (NLI) for Hindi. The parse structure of a sentence is produced by the Panini parser developed at IIT Kanpur. Sobha and Patnaik [2] gave a rule-based approach for the resolution of anaphora in Hindi and Malayalam. Anaphora resolution using a rule-based approach, corpus-based studies, and centering theory is presented in [3]-[4]. Prasad's thesis [3] is based on the principle that grammatical function is important for discourse salience in Hindi.

Dutta et al. [5] presented a modified Hobbs algorithm for Hindi. The algorithm takes into account free word order and grammatical role in Hindi pronoun resolution. The authors concluded that in Hindi the roles of subject and object are significant for reflexive and possessive pronouns. Dutta et al. [6] also highlighted the importance of anaphora resolution for machine translation by evaluating three existing machine translation systems: AnglaHindi by IIT Kanpur, Matra2 by CDAC Mumbai, and the Google translation system. That study demonstrates that anaphora resolution is essential for a successful NLP application.

The work in [7], [8] applied machine learning algorithms and probabilistic neural network models to demonstrative pronouns in Hindi, using the Emille Corpus. It demonstrates that classifying demonstrative pronouns as direct or indirect anaphora is essential for successful anaphora resolution.

3. Issues and Challenges

Resolving anaphora in Hindi is a complex task.
Several issues need to be considered when performing anaphora resolution. They are described below.

Encoding in standard form: A large amount of information in Hindi is available on the web in electronic form, but it is encoded in different fonts, which makes it difficult to encode documents in a standard form. Unicode might be a solution to this standardization problem.

Requirement of Unicode-based tools for Hindi: The problem with Unicode-based fonts is that Unicode-based tools may not support Hindi. This lack of standardization limits the use of these documents in developing corpora. As a result, no single corpus or language processing tool has been developed and made freely available for research. The tools that are available are either not up to the mark or limited to a specific domain.

Pleonastic it: Translating pleonastic it from English to Hindi is difficult. For example, the sentence "It is raining heavily today" translates into Hindi as aaj tej baarish ho rahi hai. Although it would normally be translated as yeh or veh, in this example it has no counterpart. It is therefore inappropriate to carry this type of it over from the English source text into the Hindi target text. Frequent occurrences of this type of it can cause problems in machine translation [6].

Cases and their influence: Hindi pronouns are not differentiated by gender; it is the verb that distinguishes masculine from feminine. Knowledge of the verb is therefore essential for correct pronoun resolution. Case also plays a very important role in correctly translating source text in a foreign language into Hindi: the case marker is added separately, and the pronoun changes form accordingly. Agreement inflection is marked for person, number and gender.

Table 2: Role of the verb phrase in gender disambiguation for the pronoun veh (he/she).

English sentence | Hindi sentence | Observation
--- | --- | ---
He is happy | veh khush hai | No gender differentiation
She is happy | veh khush hai | No gender differentiation
He was happy | veh khush tha | Gender is obtained from tha (male), thii (female)
She was happy | veh khush thii | Gender is obtained from tha (male), thii (female)

4. Constraint Sources

We conducted an anaphora resolution experiment for which the constraint sources described in this section form the baseline. Originally only semantic constraint sources were going to be used. However, syntactic constraint sources are included because they comprise some of the most effective techniques relative to how difficult they are to implement.
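The proposer-and-filter design of these constraint sources can be illustrated with a short Python sketch. It is not the authors' implementation (their experiment was performed manually): the `Candidate` and `Pronoun` classes, the `decay` value of 0.5, and the hand-assigned gender, number and animacy features are all assumptions made for illustration. Recency proposes every earlier noun phrase with a score that starts at 1.0 and decays exponentially, the agreement and animacy checks then remove candidates, and the highest-scoring survivor is chosen. The helper `gender_from_verb` encodes only the tha/thii cue from Table 2.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A noun phrase with hand-assigned features (hypothetical annotation)."""
    text: str
    gender: str        # "m" or "f"
    number: str        # "sg" or "pl"
    animate: bool
    score: float = 0.0


@dataclass
class Pronoun:
    """A pronoun plus the features its antecedent must have (None = no constraint)."""
    text: str
    gender: Optional[str] = None
    number: Optional[str] = None
    animate: Optional[bool] = None


def gender_from_verb(aux: str) -> Optional[str]:
    """Gender cue from the auxiliary, following Table 2: tha (male), thii (female)."""
    return {"tha": "m", "thii": "f"}.get(aux)


def propose_by_recency(nps_in_text_order: List[Candidate], decay: float = 0.5) -> List[Candidate]:
    """Walk backwards from the pronoun; the nearest NP scores 1.0, then exp decay."""
    blackboard = []
    for rank, np in enumerate(reversed(nps_in_text_order)):
        np.score = math.exp(-decay * rank)  # decay rate is an assumed value
        blackboard.append(np)
    return blackboard


def apply_filters(blackboard: List[Candidate], pronoun: Pronoun) -> List[Candidate]:
    """Drop candidates that violate gender, number or animacy requirements."""
    kept = []
    for c in blackboard:
        if pronoun.gender is not None and c.gender != pronoun.gender:
            continue
        if pronoun.number is not None and c.number != pronoun.number:
            continue
        if pronoun.animate is not None and c.animate != pronoun.animate:
            continue
        kept.append(c)
    return kept


def resolve(nps_in_text_order: List[Candidate], pronoun: Pronoun) -> Optional[str]:
    """Return the text of the highest-scoring surviving candidate, if any."""
    kept = apply_filters(propose_by_recency(nps_in_text_order), pronoun)
    return max(kept, key=lambda c: c.score).text if kept else None


def noun_phrases() -> List[Candidate]:
    """NPs preceding 've' in S1/S2, in text order (features are assumptions)."""
    return [Candidate("bacchon", "m", "pl", True),
            Candidate("kele", "m", "pl", False)]


baseline = resolve(noun_phrases(), Pronoun("ve"))                          # recency only
s1 = resolve(noun_phrases(), Pronoun("ve", number="pl", animate=True))    # ... bhukhe the
s2 = resolve(noun_phrases(), Pronoun("ve", number="pl", animate=False))   # ... pake hue the
```

On sentences S1 and S2 from the Introduction, recency alone picks kele in both cases because it is the nearest noun phrase. Adding an animacy requirement (assumed here to be animate for bhukhe and inanimate for pake hue) selects bacchon for S1 and kele for S2, which is consistent with the large contribution of animistic knowledge reported in the experiments.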
The modules available for use are:

Recency: A proposal source. Recency moves backwards through the text and adds noun phrases to the blackboard as candidates. The confidence score assigned on proposal is a float value that starts at one and decreases exponentially towards zero as the proposer approaches the beginning of the analyzed text.

Gender Agreement: Gender Agreement compares the gender of candidate co-referents with the gender required by the pronoun being resolved. Any candidate that does not match the required gender is removed from further consideration.

Number Agreement: Number Agreement extracts the part of speech of each candidate and checks the part-of-speech label for plurality. If a candidate is plural but the pronoun being resolved does not indicate a plural co-referent, the candidate is removed from consideration. Likewise, singular candidates are removed if the pronoun requires a plural co-referent. This constraint relies on accurate part-of-speech tagging in the preprocessor.

Animistic Knowledge: Animistic Knowledge filters candidates according to whether they represent living beings. Inanimate candidates are removed from consideration when the pronoun being resolved must refer to an animate co-referent, and animate candidates are removed for pronouns that must refer to an inanimate co-referent.

5. Experiment and Result

We performed two experiments on different types of data. The first experiment used text from children's stories. It represents a baseline, since the stories are written in a straightforward narrative style with very low sentence-structure complexity. We took short stories in Hindi from indif.com (http://indif.com/kids/hindi_stories/short_stories.aspx), a popular site for short Hindi stories, and applied the anaphora resolution approach to them manually. The second experiment was conducted on Hindi news articles from IBN Khabar (http://khabar.ibnlive.in.com/tag/tag_topic/all/dowry-case.html), which present an entirely different challenge from the narrative story style. The accuracy observed in the two experiments is shown in Table 3.

Table 3: Results of Experiment 1 (short stories) and Experiment 2 (news articles).
Constraints | Exp. 1: correctly resolved | Exp. 1: anaphora to resolve | Exp. 1: accuracy | Exp. 2: correctly resolved | Exp. 2: anaphora to resolve | Exp. 2: accuracy
--- | --- | --- | --- | --- | --- | ---
1. Recency | 33 | 77 | 42.86% | 26 | 52 | 50.00%
2. Recency, Number agreement | 37 | 77 | 48.05% | 28 | 52 | 53.85%
3. Recency, Number agreement, Gender agreement | 37 | 77 | 48.05% | 29 | 52 | 55.77%
4. Recency, Number agreement, Gender agreement, Animistic knowledge | 55 | 77 | 71.43% | 37 | 52 | 71.15%

In Experiment 1, Number Agreement increased accuracy by about 5.2 percentage points, Gender Agreement made no contribution, and Animistic Knowledge increased accuracy by about 23.4 percentage points. In Experiment 2, recency alone yields 50% accuracy, which supports treating recency as a baseline criterion for anaphora resolution in Hindi. Number agreement and gender agreement each give a small improvement. Animistic knowledge contributes significantly, raising overall accuracy to about 71%.

6. Conclusion

This paper presents a brief description of anaphora resolution in Hindi. Hindi has free word order, which makes pronoun resolution more complicated than in English. An experiment resolving anaphora was performed manually on different data sets, with several constraint sources forming its baseline. The experiment was conducted to determine the contribution of different constraint sources to pronoun resolution on different styles of written text. In future we will try to pair more constraint sources with the writing styles for which they contribute the most to the accuracy of the pronoun resolution system.

References

[1] A. Bharati, Y. Krishna Bhargava and R. Sangal (1993), Reference and ellipsis in an Indian languages interface to database, Computer Science and Informatics, IIT Hyderabad, 23, 3, pp. 60.
[2] L. Sobha and B.N. Patnaik (2002), Vasisth: An anaphora resolution system for Malayalam and Hindi, Symposium on Translation Support Systems.
[3] R. Prasad (2003), Constraints on the generation of referring expressions: with special reference to Hindi, Ph.D. thesis, University of Pennsylvania.
[4] R. Prasad and M (2000), Discourse salience and pronoun resolution in Hindi, in Williams, A. & Kaiser, E. (eds.), Penn Working Papers in Linguistics: Current Work in Linguistics, 6, 3, pp. 189-208.
[5] K. Dutta, N. Prakash and S. Kaushik (2008), Resolving Pronominal Anaphora in Hindi using Hobbs algorithm, Web Journal of Formal Computation and Cognitive Linguistics, 10.
[6] K. Dutta, N. Prakash and S. Kaushik (2009), Application of Pronominal Divergence and Anaphora Resolution in English-Hindi Machine Translation, Research journal "POLIBITS" Computer Science and Computer Engineering with Applications, 39, pp. 55-58.
[7] K. Dutta, N. Prakash and S. Kaushik (2010), Probabilistic Neural Network Approach to the Classification of Demonstrative Pronouns for Indirect Anaphora in Hindi, Expert Systems with Applications: An International Journal, 37, 8, pp. 5607-5613.
[8] K. Dutta, S. Kaushik and N. Prakash (2011), Machine Learning Approach for the Classification of Demonstrative Pronouns for Indirect Anaphora in Hindi News Items, Prague Bulletin of Mathematical Linguistics, Versita, 95, pp. 33-50.
