Zero Anaphora Resolution in Chinese with Shallow Parsing

Journal of Chinese Language and Computing (vol. no.)(issue no.):(page range)

Ching-Long Yeh and Yi-Chun Chen
Department of Computer Science and Engineering
Tatung University
40 Chungshan N. Rd. 3rd. Section, Taipei 104, Taiwan
chingyeh@cse.ttu.edu.tw, d8806005@mail.ttu.edu.tw

Revised and accepted on 25 May, 2004

Abstract

Most traditional approaches to anaphora resolution are based on the integration of complex linguistic information and domain knowledge. However, the construction of a domain knowledge base is very labor-intensive and time-consuming. In this paper, we work on the output of a part-of-speech tagger and use shallow parsing instead of complex parsing to resolve zero anaphors in written Chinese. We employ centering theory and constraint rules to identify the antecedents of zero anaphors as they appear in the preceding utterances. We focus on the cases of zero anaphors that occur in the topic or subject, and object positions of utterances. The experimental result shows that the precision rate of zero anaphora detection and the recall rate of zero anaphora resolution with the method are 81% and 70% respectively.

Keywords

Anaphora Resolution, Zero Anaphora Detection, Antecedent Identification, Shallow Parsing, Centering Theory.

1. Introduction

In natural languages, expressions that can be deduced contextually by the reader are frequently omitted in texts. This is especially the case in Chinese, where a kind of anaphoric expression is frequently eliminated. Such an omitted expression is termed a zero anaphor (ZA) hereafter, due to its prominence in discourse (Li and Thompson 1981). The omission may

cause considerable problems in natural language processing systems. For example, in a machine translation system, a Chinese text cannot be translated properly into a target language without first identifying the meaning of the omitted expressions. In information extraction, events related to subjects omitted in texts cannot be extracted effectively. In this paper, we aim at the resolution of zero anaphora in Chinese text.

One approach to anaphora resolution employs knowledge sources or factors, for example gender and number agreement, c-command constraints and semantic information, to discount unlikely candidates until a minimal set of plausible candidates is obtained (Grosz et al. 1995; Lappin and Leass 1994; Okumura and Tamura 1996; Walker et al. 1998; Yeh and Chen 2001). Anaphoric relations between anaphors and their antecedents are identified based on the integration of linguistic and domain knowledge. However, it is very labor-intensive and time-consuming to construct a grammatical and domain knowledge base.

Another approach employs statistical models or AI techniques, such as machine learning, to compute the most likely candidate (Aone and Bennett 1995; Connolly et al. 1994; Ge et al. 1998; Seki et al. 2002). This approach avoids the above problems. However, it relies heavily on the availability of sufficiently large text corpora that are tagged, in particular, with referential information (Stuckardt 2002).

A more recent approach is the search for inexpensive, fast and reliable procedures of anaphora resolution (Baldwin 1997; Ferrández et al. 1998; Kennedy and Boguraev 1996; Mitkov 1998). It relies on reliable and cheaper NLP tools such as part-of-speech (POS) taggers and shallow parsers. In this paper, we adopt this approach.

The task of ZA resolution can be divided into two phases: first detecting the occurrences of zero anaphors in text, and then finding their antecedents in the discourse. A POS tagger and a subsequent shallow parser are used to accomplish the task of the first phase.
We then employ the centering theory (Grosz et al. 1995) to develop a rule base as the basis for determining the antecedents of the zero anaphors found in the first phase. We have carried out an experiment using a number of news articles as the test data. The result shows that the precision rate of zero anaphora detection is 80% and, within the detected zero anaphors, 70% can be resolved correctly.

In the following sections we first briefly describe the nature of zero anaphora in Chinese. In Section 3 we describe in detail the shallow parsing method. In Section 4 we describe the ZA resolution method. In Section 5, we show the experiments and result. Finally our conclusions are summarized, and future works are suggested.

2. Zero Anaphora in Chinese

As mentioned in Section 1, zero anaphors are generally noun phrases that are understood from the context and do not need to be specified. For example in (1), the topic of the utterance (1a) is Zhangsan, which is eliminated in the second utterance.

(1) a. Zhangsan^i jinghuang de wang wai pao
       Zhangsan frightened CSC towards outside run
       Zhangsan was frightened and ran outside.

    b. φ_1^i zhuangdao yi ge ren^j
       (he) bump-to a person
       (He) bumped into a person.
    c. ta_2^i kanqing le na ren^j de zhangxiang
       he see-clear ASPECT that person GEN appearance
       He saw clearly that person's appearance.
    d. φ_3^i renchu na ren^j shi shei
       (he) recognise that person is who
       (He) recognized who that man is.

Here φ_a^b denotes a zero anaphor, where the subscript a is the index of the zero anaphor itself and the superscript b is the index of the referent. A single φ without any script represents an intrasentential zero anaphor. A superscript attached to an NP represents the index of the referent.

In addition to zero anaphors, anaphors can take pronominal and nominal forms, as exemplified by "He" and "that person" in (1c) and (1d), respectively (Chen 1987). According to Li and Thompson (1981), zero anaphors can be classified as intrasentential or intersentential. In the intrasentential case, the antecedent exists in the same sentence, or the zero anaphor can be understood and does not need to be expressed, such as the φ in (2). In the intersentential case, the antecedent and the anaphors are located in different sentences, such as φ_1^i and φ_3^i in (1b) and (1d).

(2) Zhangsan canjia bisai φ yingde yi tai diannao
    Zhangsan enter competition (he) win a CL computer
    Zhangsan entered a competition and (he) won a computer.

Depending upon the distance between the sentences containing the antecedent and the anaphor, the intersentential case can be further divided into two types: immediate and long distance. The former is where the sentence containing the antecedent is immediately followed by the one containing the anaphor, such as φ^j in (3b) and φ^k in (3d). In the long distance type, the sentences containing the antecedent and the anaphor are not in immediately succeeding order, such as φ_1^i in (3e).

(3) a. pangxie^i you si dui buzu^j
       crab have four-pair walking-foot
       A crab has four pairs of feet.

    b. φ^j sucheng tuier
       (they) common-called "tuier"
       (They) are commonly called "tuier."
    c. youyu mei tiao tuier de guanjie^k zhineng xiang xia wanqu
       since every "tuier" ASSOC joint only can towards down bend
       Since every "tuier"'s joint can only bend downwards,
    d. φ^k buneng xiang qianhou wanqu
       (it) not can towards forward-backward bend
       (it) can't bend backward or forwards.
    e. φ_1^i paxing shi
       (it) crawl ASPECT
       (When) (it) crawls,
    f. φ_2^i bixu xian yong yi bian buzu de zhijian zhua di
       (it) must first use one-side walking-foot ASSOC fingertip grasp-on ground
       (it) must use the tips of feet on one side to grasp the ground.
    g. φ_3^i zai yong ling yi bian de buzu zhishen qilai
       (it) then use another one-side ASSOC walking-foot straight-rise upwards
       (It) then uses the feet on the other side to move upwards.
    h. φ_4^i ba shenti tui guoqu
       (it) BA (it) body push get-through
       (It) pushes (its) body towards one side.

3. Sentence Parsing

Full parsing is used to provide an analysis of the sentence structure that is as detailed as possible and to build a complete parse tree for the sentence, while shallow parsing is limited to parsing smaller constituents such as noun phrases or verb phrases (Abney 1996; Li and Roth 2001). In this section, we show some examples of full parsing and describe our method of shallow parsing in Chinese.

3.1 Full Parsing

Many traditional approaches to parsing natural language sentences aim to recover complete, exact parses based on the integration of complex syntactic and semantic information. They

search through the entire space of parses defined by the grammar and then seek the globally best parse by referring to some heuristic rules or manual correction. For example, the sentence (4) taken from Sinica Treebank (Sinica Treebank 2002) is annotated as below.

(4) ta zhongyu zhaodao yi fen gongzuo le
    he finally find a CL job ASPECT
    He finally found a job.

    S(agent:NP(Head:Nhaa:he) time:dd:finally Head:VC2:find goal:np(quantifier:DM:a Head:Nac:job) particle:ta:le)

The sentence structure in Sinica Treebank is represented by employing the head-driven principle, that is, each sentence or phrase has a head leading it. A phrase consists of a head, arguments and adjuncts. One can use the concept of head to figure out the relationship among the phrases in a sentence. In example (4), the head of the NP (noun phrase), "he", is the agent of the verb, "find". Although the head-driven principle may prevent ambiguity in syntactic analysis (Chen et al. 1999), choosing the head of a phrase automatically may cause errors.

Another example (5) is extracted from the Penn Chinese Treebank (The Penn Chinese Treebank Project 2000).

(5) Zhangsan gaosu Lisi Wangwu lai le
    Zhangsan tell Lisi Wangwu come ASPECT
    Zhangsan told Lisi that Wangwu has come.

    (IP (NP-PN-SBJ (NR Zhangsan)) (VP (VV tell) (NP-PN-OBJ (NR Lisi)) (IP (NP-PN-SBJ (NR Wangwu)) (VP (VV come) (AS le)))))

The Penn Chinese Treebank provides solid linguistic analysis for the selected text, based on current research in Chinese syntax and the linguistic expertise of those involved in the Penn Chinese Treebank project, who annotate the text manually.

3.2 Shallow Parsing

Shallow (or partial) parsing, which is an inexpensive, fast and reliable method, does not deliver full syntactic analysis but is limited to parsing smaller syntactically related constituents (Abney 1991; Abney 1996; Li and Roth 2001; Mitkov 1999). For example, sentence (6a) can be divided as in (6b):

(6) a. Hualian chengwei remen de luyou didian
       Hualian become popular NOM tour place
       Hualien became the popular tourist attraction.
    b. [NP Hualian] [VP chengwei] [NP remen de luyou didian]
       [NP Hualien] [VP became] [NP the popular tourist attraction]

Given a Chinese sentence, our method of shallow parsing is divided into the following steps. First, the sentence is divided into a sequence of POS-tagged words by employing a segmentation program, AUTOTAG, which is a POS tagger developed by CKIP, Academia Sinica (CKIP 1999). Second, the sequence of words is parsed into smaller constituents such as noun phrases and verb phrases with phrase-level parsing. Each phrase is represented as a word list. Then the sequence of word lists is transformed into triples, [S,P,O]. For example in (7), (7a) is the output of AUTOTAG for sentence (6a), (7b) is the result of phrase-level parsing, and (7c) is the triple representation.

(7) a. [Hualian(Nc) chengwei(VG) remen(VH) de(DE) luyou(VA) didian(Na)]
    b. [[Hualian], np], [[chengwei], vp], [[remen, de, luyou, didian], np]
    c. [[Hualian], [chengwei], [remen, de, luyou, didian]]

The triple representation is given in Definition 1. The triple here is a simple representation which consists of three elements, S, P and O, which correspond to the Subject (noun phrase), Predicate (verb phrase) and Object (noun phrase) respectively in a clause.

Definition 1: A Triple T is characterized by a 3-tuple T = [S, P, O], where
  S is a list of nouns whose grammatical role is the subject of a clause,
  P is a list of verbs or a preposition whose grammatical role is the predicate of a clause,
  O is a list of nouns whose grammatical role is the object of a clause.

In the step of triple transformation, the sequence of word lists as shown in (7b) is transformed into triples by employing the Triple Rules. The Triple Rules are built by referring to Chinese syntax.
There are four kinds of Triples in the Triple Rules, which correspond to four basic clauses: subject + transitive verb + object, subject + intransitive verb, subject + preposition + object, and a noun phrase only. The rules listed below are employed in order:

Triple Rules:
  Triple1(S,P,O) → np(S), vtp(P), np(O).
  Triple2(S,P,none) → np(S), vp(P).

  Triple3(S,P,O) → np(S), prep(P), np(O).
  Triple4(S,none,none) → np(S).

The vtp(P) denotes that the predicate is a transitive verb phrase, which contains a transitive verb in the rightmost position in the phrase; likewise the vp(P) denotes that the predicate is an intransitive verb phrase, which contains an intransitive verb in the rightmost position in the phrase. In the rule Triple3, the prep(P) denotes that the predicate is a preposition. Triple4 is employed only if a sentence contains only one noun phrase and no other constituent. If all the rules in the Triple Rules fail, the ZA Triple Rules are employed to detect zero anaphor (ZA) candidates.

ZA Triple Rules:
  Triple1z1(zero,P,O) → vtp(P), np(O).
  Triple1z2(S,P,zero) → np(S), vtp(P).
  Triple1z3(zero,P,zero) → vtp(P).
  Triple2z(zero,P,none) → vp(P).
  Triple3z(zero,P,O) → prep(P), np(O).
  Triple4z(zero,P,O) → co-conj(P), np(O).

Zero anaphora in Chinese generally occurs in the topic, subject or object position. The rules Triple1z1, Triple2z, and Triple3z detect zero anaphora occurring in the topic or subject position. The rule Triple1z2 detects zero anaphora in the object position, and Triple1z3 detects zero anaphora occurring in both subject and object positions. In Triple4z, the co-conj(P) denotes a coordinating conjunction appearing in the initial position of a clause. For example in (8), two triples are generated. In the second triple, zero denotes a zero anaphor according to Triple1z1.

(8) Zhangsan canjia bisai yingde guanjun
    Zhangsan enter competition win champion
    Zhangsan entered a competition and won the championship.

    [[[Zhangsan], [canjia], [bisai]], [[zero], [yingde], [guanjun]]]
    [[[Zhangsan], [enter], [competition]], [[zero], [win], [champion]]]

Figure 1 illustrates the detailed procedure of Triple transformation. The input is a sequence of word lists after phrase-level parsing. The input sequence is scanned from the leftmost word list in the sequence and the Triple Rules are employed to generate a new Triple.
If a new Triple is generated, the remaining sub-sequence is taken as a new input; otherwise, the ZA Triple Rules are employed to generate a new Triple. If no word list is left to be processed, the procedure stops; otherwise, it continues to process the remaining sub-sequence.
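The procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `to_triples`, `TRIPLE_RULES` and `ZA_TRIPLE_RULES` are hypothetical, phrase categories are assumed to be the strings "np", "vtp", "vp", "prep" and "co-conj", and skipping a phrase that no rule matches is an assumption added here so that the loop always terminates.

```python
ZERO, NONE = "zero", "none"

# Each rule: (name, pattern of phrase categories, builder of [S, P, O]).
# The order of the lists mirrors the order in which the rules are tried.
TRIPLE_RULES = [
    ("Triple1", ("np", "vtp", "np"), lambda w: [w[0], w[1], w[2]]),
    ("Triple2", ("np", "vp"), lambda w: [w[0], w[1], NONE]),
    ("Triple3", ("np", "prep", "np"), lambda w: [w[0], w[1], w[2]]),
]
ZA_TRIPLE_RULES = [
    ("Triple1z1", ("vtp", "np"), lambda w: [ZERO, w[0], w[1]]),
    ("Triple1z2", ("np", "vtp"), lambda w: [w[0], w[1], ZERO]),
    ("Triple1z3", ("vtp",), lambda w: [ZERO, w[0], ZERO]),
    ("Triple2z", ("vp",), lambda w: [ZERO, w[0], NONE]),
    ("Triple3z", ("prep", "np"), lambda w: [ZERO, w[0], w[1]]),
    ("Triple4z", ("co-conj", "np"), lambda w: [ZERO, w[0], w[1]]),
]


def _match(rules, phrases):
    """Return (name, triple, phrases consumed) for the first matching rule."""
    for name, pattern, build in rules:
        head = phrases[:len(pattern)]
        if tuple(cat for _, cat in head) == pattern:
            return name, build([words for words, _ in head]), len(pattern)
    return None


def to_triples(phrases):
    """phrases: list of (word_list, category) pairs after phrase-level parsing."""
    # Triple4 applies only when the sentence is a single noun phrase.
    if len(phrases) == 1 and phrases[0][1] == "np":
        return [("Triple4", [phrases[0][0], NONE, NONE])]
    result, rest = [], list(phrases)
    while rest:
        # Try the Triple Rules first, then fall back to the ZA Triple Rules.
        hit = _match(TRIPLE_RULES, rest) or _match(ZA_TRIPLE_RULES, rest)
        if hit is None:
            rest = rest[1:]  # assumption: skip a phrase no rule can start with
            continue
        name, triple, used = hit
        result.append((name, triple))
        rest = rest[used:]
    return result


# Sentence (8): Zhangsan canjia bisai yingde guanjun
example_8 = [(["Zhangsan"], "np"), (["canjia"], "vtp"), (["bisai"], "np"),
             (["yingde"], "vtp"), (["guanjun"], "np")]
print(to_triples(example_8))
```

For sentence (8) the sketch produces one ordinary triple followed by a Triple1z1 triple whose subject slot is `zero`, in line with the two triples shown in (8).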

[Figure 1: The procedure of Triple transformation. Flowchart: take a sequence of word lists; scan from the leftmost word list to the rightmost one in the input sequence; employ the Triple Rules to generate a new Triple; if no new Triple is generated, employ the ZA Triple Rules to generate one; if a remaining sub-sequence exists, take it as new input; otherwise output the result and stop.]

4. ZA Resolution Method

The ZA resolution method we develop is divided into three parts. First, each sentence of an input document is translated into triples as described in Section 3. Second, ZA identification verifies each ZA candidate annotated in the triples by employing the ZA identification constraints. Third, antecedent identification identifies the antecedent of each detected ZA by using rules based on the centering theory.

4.1 Centering Theory

In the centering theory (Grosz et al. 1995; Walker et al. 1994; Strube and Hahn 1996), each utterance U in a discourse segment has two structures associated with it, called the forward-looking centers, C_f(U), and the backward-looking center, C_b(U). The forward-looking centers of U_n, C_f(U_n), depend only on the expressions that constitute that utterance. They

are not constrained by features of any previous utterance in the discourse segment (DS), and the elements of C_f(U_n) are partially ordered to reflect relative prominence in U_n. Grosz et al. (1995) assume that grammatical roles are the major determinant for ranking the forward-looking centers, with the order Subject > Object(s) > Others. The highest-ranked element of C_f(U_n) may become the C_b of the following utterance, C_b(U_{n+1}).

In addition to the structures for centers, C_b and C_f, the centering theory specifies a set of constraints and rules (Grosz et al. 1995; Walker et al. 1994).

Constraints
For each utterance U_i in a discourse segment U_1, ..., U_m:
  1. U_i has exactly one C_b.
  2. Every element of C_f(U_i) must be realized in U_i.
  3. Ranking of elements in C_f(U_i) guides determination of C_b(U_{i+1}).
  4. The choice of C_b(U_i) is from C_f(U_{i-1}), and cannot be from C_f(U_{i-2}) or other prior sets of C_f.

Backward-looking centers, C_b's, are often omitted or pronominalized. Discourses that continue centering the same entity are more coherent than those that shift from one center to another. This means that some transitions are preferred over others. These observations are encapsulated in two rules:

Rules
For each utterance U_i in a discourse segment U_1, ..., U_m:
  I. If any element of C_f(U_i) is realized by a pronoun in U_{i+1}, then C_b(U_{i+1}) must be realized by a pronoun also.
  II. Sequences of continuation are preferred over sequences of retaining, and sequences of retaining are preferred over sequences of shifting.

Rule I represents one function of pronominal reference: the use of a pronoun to realize the C_b signals to the hearer that the speaker is continuing to talk about the same thing. Psychological research and cross-linguistic research have validated that the C_b is preferentially realized by a pronoun in English and by equivalent forms (i.e. zero anaphora) in other languages (Grosz et al. 1995).
Rule II reflects the intuition that continuation of the center, and the use of retentions when possible to produce smooth transitions to a new center, provide a basis for local coherence. For example in (9), the subject of the utterance (9b) is eliminated, and its antecedent is identified as the subject of the preceding utterance (9a) according to the centering theory.

(9) a. dianzigu shou meiguo gaokejigu zhongcuo yingxiang
       Electronics stock receive USA high-tech stock heavy-fall affect
       Electronics stocks were affected by high-tech stocks in USA.
    b. φ chixu xiadie
       (Electronics stocks) continue fall
       (Electronics stocks) continued falling down.
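To make the ranking of forward-looking centers concrete, the following Python sketch orders the entities of an utterance by grammatical role and picks the backward-looking center of the next utterance as the highest-ranked element of the previous C_f that is realized there. It is an illustrative sketch only: the function names, the role labels and the role assignment for (9a) are assumptions made for this example and are not taken from the paper's implementation.

```python
# Ranking assumed from the text: Subject > Object(s) > Others.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


def forward_looking_centers(entities):
    """entities: list of (name, role). Returns names ordered by role rank.

    sorted() is stable, so entities with the same role keep their text order.
    """
    ranked = sorted(entities, key=lambda e: ROLE_RANK.get(e[1], ROLE_RANK["other"]))
    return [name for name, _ in ranked]


def backward_looking_center(prev_cf, realized):
    """Highest-ranked element of C_f(U_{n-1}) that is realized in U_n, else None."""
    for entity in prev_cf:
        if entity in realized:
            return entity
    return None


# Utterance (9a); the role labels below are an assumed analysis.
cf_9a = forward_looking_centers([
    ("US high-tech stocks", "other"),
    ("electronics stocks", "subject"),
])

# In (9b) the subject is a zero anaphor; the top-ranked center of (9a)
# is the candidate antecedent, and it becomes C_b(9b) once realized there.
antecedent = cf_9a[0]
cb_9b = backward_looking_center(cf_9a, {antecedent})
print(cf_9a, antecedent, cb_9b)
```

Under these assumptions the zero subject of (9b) is resolved to "electronics stocks", matching the reading given in (9).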

4.2 Zero Anaphora Resolution

The process of analyzing Chinese zero anaphora is different from general pronoun resolution in English because zero anaphors are not expressed in discourse. The task of ZA resolution is divided into two phases: first ZA detection and then antecedent identification. In this paper, we focus on the cases of ZA occurring in the topic or subject, and object positions. In the ZA detection phase, we use the ZA Triple Rules described in Section 3.2 to detect omitted cases as ZA candidates, denoted by zero in triples. Table 1 shows some examples corresponding to the ZA Triple Rules.

Table 1: Examples of zero anaphora

ZA Triple Rule            Example
Triple1z1(zero,P,O)       (1b) φ zhuangdao yi ge ren
                          (he) bump-to a person
                          (He) bumped into a person.
Triple1z2(S,P,zero)       Zhangsan xihuan φ ma
                          Zhangsan like (somebody or something) Q
                          Does Zhangsan like (somebody or something)?
Triple1z3(zero,P,zero)    φ xihuan φ
                          (he) like (somebody or something)
                          (He) likes (somebody or something).
Triple2z(zero,P,none)     φ qu gouwu le
                          (he) go shopping ASPECT
                          (He) has gone shopping.
Triple3z(zero,P,O)        φ zai nabian
                          (he) in there
                          (He) is there.
Triple4z(zero,P,O)        φ gen xiaopengyou wan
                          (he) with child play
                          (He) is playing with little children.

After ZA candidates are detected by employing the ZA Triple Rules, the ZA identification constraints are utilized to filter out non-anaphoric cases. Constraint 1 is employed to exclude exophora (reference of an expression directly to an extralinguistic referent, in which the referent does not require another expression for its interpretation) and cataphora (reference to an entity mentioned subsequently), which are different from anaphora in texts. Constraint 2 covers some cases that

might be incorrectly detected as zero anaphors, such as passive sentences or inverted sentences (Hu 1995).

ZA identification constraints
For each ZA candidate c in a discourse:
  1. c cannot be in the first utterance in a discourse segment.
  2. ZA does not occur in the following cases:
       NP + be + NP + VP + c
       NP (topic) + NP (subject) + VP + c

In the antecedent identification phase, we employ the backward-looking center of centering theory to identify the antecedent of each ZA. First we use noun phrase rules to obtain the noun phrases in each utterance, and then the antecedent is identified as the most prominent noun phrase of the preceding utterance (Yeh and Chen 2001):

Antecedent identification rule:
For each zero anaphor z in a discourse segment U_1, ..., U_m:
  If z occurs in U_i, and no zero anaphor occurs in U_{i-1}
    then choose the noun phrase with the corresponding grammatical role in U_{i-1} as the antecedent
  Else if only one zero anaphor occurs in U_{i-1}
    then choose the antecedent of the zero anaphor in U_{i-1} as the antecedent of z
  Else if more than one zero anaphor occurs in U_{i-1}
    then choose the antecedent of the zero anaphor in U_{i-1} as the antecedent of z according to the grammatical role criteria: Topic > Subject > Object > Others
  End if

Due to topic-prominence in Chinese (Li and Thompson 1981), topic is the most salient grammatical role. In general, if the topic is omitted, the subject will be in the initial position of an utterance. If the topic and subject are omitted concurrently, the ZA occurs. The antecedent identification rule corresponds to the concept of centering theory.

5. Experiment and Result

In this section we describe the experiment and result of the two-phase zero anaphora resolution described in the preceding section. In the ZA detection phase, we first take the result of employing only the ZA Triple Rules as the baseline, and then include the ZA identification constraints to see the difference. In the antecedent identification phase, we also use a rule that does not involve the centering theory as a baseline against which to show the improvement of our method.
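As a reference point for the comparison in this section, the antecedent identification rule of Section 4.2 can be sketched in Python as below. This is a hedged illustration rather than the authors' code: the dictionary representation of an utterance (grammatical role mapped to a noun phrase or to the marker `zero`), the name `resolve_antecedents`, and the choice to return `None` for a zero anaphor in the first utterance are all assumptions introduced here.

```python
ZERO = "zero"
# Grammatical role criteria from the rule: Topic > Subject > Object > Others.
ROLE_RANK = {"topic": 0, "subject": 1, "object": 2, "other": 3}


def resolve_antecedents(utterances):
    """utterances: list of dicts mapping a role to a noun phrase or ZERO.

    Returns, per utterance, a dict mapping each zero role to its antecedent.
    """
    resolved = []
    for i, utt in enumerate(utterances):
        answers = {}
        for role, filler in utt.items():
            if filler != ZERO:
                continue
            if i == 0:
                # Assumption: no preceding utterance, so nothing to resolve to.
                answers[role] = None
                continue
            prev_utt, prev_answers = utterances[i - 1], resolved[i - 1]
            if not prev_answers:
                # No zero anaphor in U_{i-1}: take the NP with the same role.
                answers[role] = prev_utt.get(role)
            elif len(prev_answers) == 1:
                # Exactly one zero anaphor in U_{i-1}: inherit its antecedent.
                answers[role] = next(iter(prev_answers.values()))
            else:
                # Several zero anaphors in U_{i-1}: prefer the highest role.
                best = min(prev_answers, key=lambda r: ROLE_RANK.get(r, 3))
                answers[role] = prev_answers[best]
        resolved.append(answers)
    return resolved


# Discourse segment (10), with an assumed role analysis of each utterance.
segment_10 = [
    {"subject": "Jilong yiyuan", "object": "fuwu fanwei"},
    {"subject": ZERO, "object": "yiliao fuwu pinzhi"},
    {"subject": ZERO, "object": "weishengshu renke"},
]
print(resolve_antecedents(segment_10))
```

With this input both zero subjects are resolved to "Jilong yiyuan": the first by role correspondence with the preceding utterance, the second by inheriting the antecedent of the preceding zero anaphor.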
The test corpus is a collection of 50 news articles containing 998 paragraphs, 463 utterances, and 40884 Chinese words.

5.1 ZA Detection

By employing the ZA Triple Rules and ZA identification constraints mentioned previously, zero anaphors occurring in topic or subject, and object positions can be detected. In the experiment, we first employ only the ZA Triple Rules, and then include the ZA identification constraints to see the improvement. Because the ZA Triple Rules cover every possible case of topic or subject, and object omission, the result shows that zero anaphors are over-detected. Table 2 shows the precision rates calculated using equation (1).

\[ \text{Precision rate of ZA detection} = \frac{\text{No. of ZA correctly detected}}{\text{No. of ZA candidates}} \tag{1} \]

The main errors of ZA detection in the experiment occur when parsing inverted sentences and non-anaphoric cases (e.g. exophora or cataphora) (Mitkov 2002; Hu 1995). Cataphora is similar to anaphora, the difference being the direction of the reference. In this paper, we do not deal with the case in which the referent of a zero anaphor is in the following utterances, but we can detect about 60% of the cataphora in the test corpus by employing the ZA identification constraints.

5.2 Antecedent Identification

In this phase, we take the output of employing the ZA Triple Rules and ZA identification constraints, and go on to identify the antecedents of zero anaphors. We first use a simple antecedent identification rule that does not involve the centering theory, and then employ the antecedent identification rule mentioned in Section 4.2 to show the improvement:

Simple antecedent identification rule:
For each zero anaphor z in a discourse segment U_1, ..., U_m:
  If z occurs in U_i
    then choose the noun phrase in U_{i-1} having the longest distance from z as the antecedent.

The simple antecedent identification rule does not consider the ranking of centers in the centering theory (Grosz et al. 1995). In contrast, the antecedent identification rule based on the centering theory (see Section 4.2) determines the antecedents according to the grammatical role criteria. For example, in the discourse segment (10), zero anaphors are detected in the utterances (10b) and (10c).
According to the antecedent identification rule, the noun phrase Kee-lung General Hospital, whose grammatical role corresponds to that of the zero anaphor φ_1 in (10b), is identified as the antecedent. Subsequently, the antecedent of the zero anaphor φ_2 in (10c) is identified as the antecedent of φ_1 in (10b).

(10) a. Jilong yiyuan wei kuoda fuwu fanwei
        Kee-lung hospital for expand service coverage
        Kee-lung General Hospital aims to expand service coverage.
     b. φ_1 jiji tisheng yiliao fuwu pinzhi ji biaozhunhua
        (it) active improve medical-treatment service quality and standardization

        (It) actively improves the service quality of medical treatment and standardization.
     c. φ_2 huo weishengshu renke wei banli wailao tijian yiyuan
        (it) obtain Department-of-Health certify to-be handle foreign-laborer physical-examination hospital
        (It) is certified by Department of Health as a hospital which can handle physical examinations of foreign laborers.

Table 3 shows the recall rates and precision rates of ZA resolution calculated using equation (2) and equation (3). Errors occur in this phase when a zero anaphor refers to an entity other than the corresponding grammatical role or the antecedent of the zero anaphor in the preceding utterance.

\[ \text{Precision rate of ZA resolution} = \frac{\text{No. of antecedents correctly identified}}{\text{No. of ZA candidates}} \tag{2} \]

\[ \text{Recall rate of ZA resolution} = \frac{\text{No. of antecedents correctly identified}}{\text{No. of ZA occurring in text}} \tag{3} \]

Table 2: Results of ZA detection

Cases             ZA Triple Rules    ZA Triple Rules + ZA identification constraints
No. of ZAs        2216               2216
ZA candidates     3400               2754
Precision rate    65.2%              80.5%

Table 3: Results of ZA resolution

Accuracy          Simple antecedent identification rule    Employing centering theory
Recall rate       65.8%                                    70%
Precision rate    55.3%                                    60.3%

6. Conclusions

In this paper, we develop an inexpensive method of Chinese ZA resolution that works on the output of a part-of-speech tagger and uses shallow parsing instead of complex parsing to resolve zero anaphors in Chinese texts. In our preliminary experiment, we deal with the cases of topic or subject, and object omission. The precision rate of ZA detection is 81% and the recall rate of ZA resolution is 70%. The errors of ZA resolution fall into the following cases:

1. Out of the grammatical role criteria (ranking of forward-looking centers): a ZA refers to an entity other than the corresponding grammatical role or the antecedent of the zero anaphor in the preceding utterance.
2. Out of local coherence: the antecedent of a ZA is mentioned in earlier utterances.
3. Cataphora: a ZA refers to an antecedent mentioned in the succeeding utterances.

4. Other non-anaphoric cases: Depending on the background knowledge of readers, the referent of a ZA does not need to be expressed in the text.

We do not intend to treat cases 3 and 4 in this paper. However, by employing the ZA identification constraints, we can detect about 60% of cataphora and exophora and 50% of inverted sentences in the test corpus.

We have presented the method and the experiments on ZA resolution in the previous sections. The results are promising to some extent; however, some problems still need further investigation, such as pronoun resolution and the applications of ZA resolution. In pronoun resolution, the pronominal anaphors are expressed in the discourse, so the detection rules are unnecessary. We may modify the antecedent identification rule described in Section 3.3 to identify the antecedents of pronominal anaphors occurring in utterances, and some anaphora resolution factors, such as gender and number agreement, can be used (Lappin and Leass 1994).

Another line of future research is the enhancement of the shallow parsing technique used in this paper. For example, one might enhance the output of text chunking, not by analyzing each phrase structure in an utterance but by dividing each clause within an utterance into syntactically correlated groups of words. We would also like to extend our approach to other omission cases, such as verb omission, and to conduct more experiments on texts from other domains.

Acknowledgement

We give our special thanks to CKIP, Academia Sinica, for its great efforts in computational linguistics and for sharing the Autotag program for academic research.

References

Abney, Steven, 1991, Parsing by chunks, In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-Based Parsing, Kluwer Academic Publishers.
Abney, Steven, 1996, Tagging and Partial Parsing, In: Ken Church, Steve Young, and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech, An ELSNET volume, Kluwer Academic Publishers, Dordrecht.
Aone, Chinatsu and Bennett, Scott William, 1995, Evaluating automated and manual acquisition of anaphora resolution strategies, Proceedings of the 33rd Annual Meeting of the ACL, Santa Cruz, New Mexico, pages 122-129.
Baldwin, Breck, 1997, CogNIAC: high precision coreference with limited knowledge and linguistic resources, ACL/EACL workshop on Operational factors in practical, robust anaphor resolution.
Chen, F.-Y., Tsai, P.-F., Chen, K.-J. and Huang, C.-R., 1999, Sinica Treebank, Computational Linguistics and Chinese Language Processing (CLCLP), 4(2): 87-104.
Chen, P., 1987, Hanyu lingxing huizhi de huayu fenxi (a discourse approach to zero anaphora in Chinese) (in Chinese), Zhongguo Yuwen (Chinese Linguistics), pages 363-378.

CKIP, 1999, Version 1.0 (Autotag), http://godel.iis.sinica.edu.tw/CKIP/, Academia Sinica.
Connolly, Dennis, Burger, John D. and Day, David S., 1994, A Machine learning approach to anaphoric reference, Proceedings of the International Conference on New Methods in Language Processing, 255-261, Manchester, United Kingdom.
Ferrández, A., Palomar, Manuel and Moreno, Lidia, 1998, Anaphor Resolution in Unrestricted Texts with Partial Parsing, Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference, pages 385-391, Montreal, Canada.
Gazdar, G. and Mellish, C., 1989, Natural Language Processing in PROLOG: An Introduction to Computational Linguistics, Addison-Wesley.
Ge, Niyu, Hale, John and Charniak, Eugene, 1998, A statistical approach to anaphora resolution, Proceedings of the Sixth Workshop on Very Large Corpora, pages 161-170.
Grosz, B. J. and Sidner, C. L., 1986, Attention, intentions, and the structure of discourse, Computational Linguistics, 12(3), pp. 175-204.
Grosz, B. J., Joshi, A. K. and Weinstein, S., 1995, Centering: A Framework for Modeling the Local Coherence of Discourse, Computational Linguistics, 21(2), pp. 203-225.
Hu, Wenze, 1995, Functional Perspectives and Chinese Word Order, Ph.D. dissertation, The Ohio State University.
Kennedy, Christopher and Boguraev, Branimir, 1996, Anaphora for everyone: pronominal anaphora resolution without a parser, Proceedings of the 16th International Conference on Computational Linguistics (COLING'96), 113-118, Copenhagen, Denmark.
Lappin, S. and Leass, H., 1994, An algorithm for pronominal anaphor resolution, Computational Linguistics, 20(4).
Li, Charles N. and Thompson, Sandra A., 1981, Mandarin Chinese: A Functional Reference Grammar, University of California Press.
Li, X. and Roth, D., 2001, Exploring Evidence for Shallow Parsing, Proceedings of Workshop on Computational Natural Language Learning, Toulouse, France.
Mitkov, Ruslan, 1998, Robust pronoun resolution with limited knowledge, Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference, Montreal, Canada.
Mitkov, Ruslan, 1999, Anaphora resolution: the state of the art, Working paper (Based on the COLING'98/ACL'98 tutorial on anaphora resolution), University of Wolverhampton, Wolverhampton.
Mitkov, Ruslan, 2002, Anaphora Resolution, Longman.
Okumura, Manabu and Tamura, Kouji, 1996, Zero pronoun resolution in Japanese discourse based on centering theory, Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), 871-876.
Seki, Kazuhiro, Fujii, Atsushi, and Ishikawa, Tetsuya, 2002, A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution, Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 911-917.
Sidner, C. L., 1979, Toward a Computational Theory of Definite Anaphora Comprehension in English Discourse, Ph.D. thesis, MIT.
Sidner, C. L., 1983, Focusing in the comprehension of definite anaphora, Computational Models of Discourse, MIT Press.
Sinica Treebank, 2002, URL http://turing.iis.sinica.edu.tw/treesearch/, Academia Sinica.
Strube, M. and Hahn, U., 1996, Functional Centering, Proceedings of ACL 96, Santa Cruz, Ca., pp. 270-277.

Stuckardt, Roland, 2002, Machine-Learning-Based vs. Manually Designed Approaches to Anaphor Resolution: the Best of Two Worlds, Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), University of Lisbon, Portugal, pages 211-216.
The Penn Chinese Treebank Project, 2000, URL http://www.cis.upenn.edu/~chinese/, Linguistic Data Consortium, University of Pennsylvania.
Walker, M. A., 1989, Evaluating Discourse Processing Algorithms, Proceedings of ACL 89, Vancouver, Canada.
Walker, M. A., 1998, Centering, anaphora resolution, and discourse structure, In Walker, M. A., Joshi, A. K. and Prince, E. F., editors, Centering in Discourse, Oxford University Press.
Walker, M. A., Iida, M. and Cote, S., 1994, Japanese Discourse and the Process of Centering, Computational Linguistics, 20(2): 193-233.
Yeh, Ching-Long and Chen, Yi-Chun, 2001, An empirical study of zero anaphora resolution in Chinese based on centering theory, Proceedings of ROCLING XIV, Tainan, Taiwan.
Yeh, Ching-Long and Chen, Yi-Chun, 2003, Using Zero Anaphora Resolution to Improve Text Categorization, Proceedings of PACLIC 17, Sentosa, Singapore.

Appendix: Abbreviations

In the word-by-word translation, some markers are abbreviated as below. We follow the abbreviations used in [].

Abbreviation    Term
ASSOC           associative (de)
ASPECT          aspect marker
BA              ba
BEI             bei
CL              classifier
CSC             complex stative construction (de)
GEN             genitive (de)
NOM             nominalizer (de)
Q               question (ma)