Zero Anaphora Resolution in Chinese with Shallow Parsing


Journal of Chinese Language and Computing 17 (1): 41-56

Ching-Long Yeh, Yi-Chun Chen
Department of Computer Science and Engineering, Tatung University
40 Chungshan N. Rd., 3rd Section, Taipei 104, Taiwan
chingyeh@cse.ttu.edu.tw, yjchen7@ms7.hinet.net

Abstract

Most traditional approaches to anaphora resolution are based on the integration of complex linguistic information and domain knowledge. However, the construction of a domain knowledge base is very labor-intensive and time-consuming. In this paper, we work on the output of a part-of-speech tagger and use shallow parsing instead of complex parsing to resolve zero anaphors in written Chinese. We employ centering theory and constraint rules to identify the antecedents of zero anaphors as they appear in the preceding utterances. We focus on the cases of zero anaphors that occur in the topic or subject, and object positions of utterances. The experimental result shows that the precision rate of zero anaphora detection and the recall rate of zero anaphora resolution with the method are 81% and 70% respectively.

Keywords: Anaphora Resolution; Zero Anaphora Detection; Antecedent Identification; Shallow Parsing; Centering Theory.

1 Introduction

In natural languages, expressions that can be deduced contextually by the reader are frequently omitted in texts. This is especially the case in Chinese, where a kind of anaphoric expression is frequently eliminated. This will be termed zero anaphor (ZA) hereafter, due to its prominence in discourse (Li and Thompson 1981). The omission may cause considerable problems in natural language processing systems. For example, in a machine translation system, a Chinese text cannot be translated properly into a target language without first identifying the meaning of the omitted expressions. In information extraction, the events related to subjects omitted in texts cannot be extracted effectively. In this paper, we aim at the resolution of zero anaphora in Chinese text.

One approach to anaphora resolution employs knowledge sources or factors, such as gender and number agreement, c-command constraints, and semantic information, to discount unlikely candidates until a minimal set of plausible candidates is obtained (Grosz et al. 1995; Lappin and Leass 1994; Okumura and Tamura 1996; Walker et al. 1998; Yeh and Chen 2001). Anaphoric relations between anaphors and their antecedents are identified based on the integration of linguistic and domain knowledge. However, it is very labor-intensive and time-consuming to construct a grammatical and domain knowledge base. Another approach employs statistical models or AI techniques, such as machine learning, to compute the most likely candidate (Aone and Bennett 1995; Connolly et al. 1994; Ge et al. 1998; Seki et al. 2002). This approach avoids the above problem. However, it relies heavily upon the availability of sufficiently large text corpora that are tagged, in particular, with referential information (Stuckardt 2002). A more recent approach is the search for inexpensive, fast and reliable procedures of anaphora resolution (Baldwin 1997; Ferrández et al. 1998; Kennedy and Boguraev 1996; Mitkov 1998). This approach relies on reliable and cheaper NLP tools such as part-of-speech (POS) taggers and shallow parsers. In this paper, we adopt this approach.

The task of ZA resolution can be divided into two phases: first detecting the occurrences of zero anaphors in text, and then finding their antecedents in the discourse. A POS tagger followed by a shallow parser is used to accomplish the first phase. We then employ the centering theory (Grosz et al. 1995) to develop a rule base as the basis for determining the antecedents of the zero anaphors found in the first phase. We have carried out an experiment using a number of news articles as the test data. The result shows that the precision rate of zero anaphora detection is 80% and, of the detected zero anaphors, 70% can be resolved correctly.

In the following sections we first briefly describe the nature of zero anaphora in Chinese. In Section 3 we describe the shallow parsing method in detail. In Section 4 we describe the ZA resolution method. In Section 5, we present the experiments and results. Finally, our conclusions are summarized and future work is suggested.

2 Zero Anaphora in Chinese

As mentioned in Section 1, zero anaphors are generally noun phrases that are understood from the context and do not need to be specified. For example, in (1), the topic of utterance (1a) is 張三 Zhangsan, which is eliminated in the second utterance.

(1) a. 張三^i 驚慌的往外跑,
       Zhangsan jinghuang de wang wai pao
       Zhangsan frightened CSC towards outside run
       Zhangsan was frightened and ran outside.
    b. φ_1^i 撞到一個人^j,
       zhuangdao yi ge ren
       (he) bump-to a person
       (He) bumped into a person.
    c. 他_2^i 看清了那人^j 的長相,
       ta kanqing le na ren de zhangxiang
       he see-clear ASPECT that person GEN appearance
       He saw clearly that person's appearance.
    d. φ_3^i 認出那人^j 是誰
       renchu na ren shi shei
       (he) recognise that person is who
       (He) recognized who that man is.

In addition to zero anaphors, anaphors can take pronominal and nominal forms, as exemplified by 他 (he) and 那個人 (that person) in (1c) and (1d), respectively (Chen 1987).

According to Li and Thompson (1981), zero anaphors can be classified as intrasentential or intersentential. In the intrasentential case, the antecedent exists in the same sentence, or the zero anaphor can be understood and does not need to be expressed, such as the φ in (2). In the intersentential case, the antecedent and the anaphor are located in different sentences, such as φ_1^i and φ_3^i in (1b) and (1d).

(2) 張三參加比賽 φ 贏得一台電腦
    Zhangsan canjia bisai yingde yi tai diannao
    Zhangsan enter competition (he) win a CL computer
    Zhangsan entered a competition and (he) won a computer.

Depending upon the distance between the sentences containing the antecedent and the anaphor, the intersentential case can be further divided into two types: immediate and long distance. In the former, the sentence containing the antecedent is immediately followed by the one containing the anaphor, such as φ^j in (3b) and φ^k in (3d). In the long distance type, the sentences containing the antecedent and the anaphor are not in immediately succeeding order, such as φ_1^i in (3e).

(3) a. 螃蟹^i 有四對步足^j,
       pangxie you si dui buzu
       crab have four-pair walking-foot
       A crab has four pairs of feet.
    b. φ^j 俗稱「腿兒」,
       sucheng tuier
       (they) common-called "tuier"
       (They) are commonly called "tuier."
    c. 由於每條「腿兒」的關節^k 只能向下彎曲,
       youyu mei tiao tuier de guanjie zhineng xiang xia wanqu
       since every "tuier" ASSOC joint only can towards down bend
       Since every "tuier"'s joint can only bend downwards,
    d. φ^k 不能向前後彎曲,
       buneng xiang qianhou wanqu
       (it) not can towards forward-backward bend
       (it) can't bend backwards or forwards.
    e. φ_1^i 爬行時,
       paxing shi
       (it) crawl ASPECT
       When (it) crawls,
    f. φ_2^i 必須先用一邊步足的指尖抓地,
       bixu xian yong yi bian buzu de zhijian zhua di
       (it) must first use one-side walking-foot ASSOC fingertip grasp-on ground
       (it) must use the tips of the feet on one side to grasp the ground.
    g. φ_3^i 再用另一邊的步足直伸起來,
       zai yong ling yi bian de buzu zhishen qilai
       (it) then use another one-side ASSOC walking-foot straight-rise upwards
       (It) then uses the feet on the other side to move upwards.
    h. φ_4^i 把身體推過去
       ba shenti tui guoqu
       (it) BA body push get-through
       (It) pushes the body towards one side.

[1] We use φ_a^b to denote a zero anaphor, where the subscript a is the index of the zero anaphor itself and the superscript b is the index of the referent. A single φ without any script represents an intrasentential zero anaphor. Also note that a superscript attached to an NP is used to represent the index of the referent.

3 Sentence Parsing

Full parsing provides as detailed an analysis of the sentence structure as possible and builds a complete parse tree for the sentence, while shallow parsing is limited to parsing smaller constituents such as noun phrases or verb phrases (Abney 1996; Li and Roth 2001). In this section, we show some examples of full parsing and describe our method of shallow parsing in Chinese.

3.1 Full Parsing

Many traditional approaches to parsing natural language sentences aim to recover complete, exact parses based on the integration of complex syntactic and semantic information. They search through the entire space of parses defined by the grammar and then seek the globally best parse by referring to heuristic rules or manual correction. For example, sentence (4), taken from the Sinica Treebank (Sinica Treebank 2002), is annotated as below.

(4) 他終於找到一份工作了
    ta zhongyu zhaodao yi fen gongzuo le
    he finally find a CL job ASPECT
    He finally found a job.

    S(agent:NP(Head:Nhaa:他) time:Dd:終於 Head:VC2:找到 goal:NP(quantifier:DM:一份 Head:Nac:工作) particle:Ta:了)
    S(agent:NP(Head:Nhaa:he) time:Dd:finally Head:VC2:find goal:NP(quantifier:DM:a Head:Nac:job) particle:Ta:le)

The sentence structure in the Sinica Treebank is represented by employing the head-driven principle, that is, each sentence or phrase has a head leading it. A phrase consists of a head, arguments and adjuncts. One can use the concept of head to figure out the relationship among the phrases in a sentence. In example (4), the head of the NP (noun phrase), 他 (he), is the agent of the verb 找到 (find). Although the head-driven principle may prevent ambiguity in syntactic analysis (Chen et al. 1999), choosing the head of a phrase automatically may cause errors. Another example, (5), is extracted from the Penn Chinese Treebank (The Penn Chinese Treebank Project 2000).

(5) 張三告訴李四王五來了
    Zhangsan gaosu Lisi Wangwu lai le
    Zhangsan tell Lisi Wangwu come ASPECT
    Zhangsan told Lisi that Wangwu has come.

    (IP (NP-PN-SBJ (NR 張三)) (VP (VV 告訴) (NP-PN-OBJ (NR 李四)) (IP (NP-PN-SBJ (NR 王五)) (VP (VV 來) (AS 了)))))
    (IP (NP-PN-SBJ (NR Zhangsan)) (VP (VV tell) (NP-PN-OBJ (NR Lisi)) (IP (NP-PN-SBJ (NR Wangwu)) (VP (VV come) (AS le)))))

The Penn Chinese Treebank provides solid linguistic analysis for the selected text. The text is annotated manually, based on current research in Chinese syntax and the linguistic expertise of those involved in the Penn Chinese Treebank project.

3.2 Shallow Parsing

Shallow (or partial) parsing is an inexpensive, fast and reliable method. It does not deliver a full syntactic analysis but is limited to parsing smaller, syntactically related constituents (Abney 1991; Abney 1996; Li and Roth 2001; Mitkov 1999). For example, sentence (6a) can be divided as in (6b):

(6) a. 花蓮成為熱門的旅遊地點
       Hualian chengwei remen de luyou didian
       Hualian become popular NOM tour place
       Hualien became a popular tourist attraction.
    b. [NP 花蓮] [VP 成為] [NP 熱門的旅遊地點]
       [NP Hualien] [VP became] [NP a popular tourist attraction]

Given a Chinese sentence, our method of shallow parsing is divided into the following steps. First, the sentence is divided into a sequence of POS-tagged words by employing a segmentation program, AUTOTAG, which is a POS tagger developed by CKIP, Academia Sinica (CKIP 1999). Second, the sequence of words is parsed into smaller constituents such as noun phrases and verb phrases with phrase-level parsing. Each phrase is represented as a word list. Then the sequence of word lists is transformed into triples, [S,P,O]. For example, in (7), (7a) is the output produced by AUTOTAG for sentence (6a), (7b) is the result of phrase-level parsing, and (7c) is the triple representation.

(7) a. [花蓮(Nc) 成為(VG) 熱門(VH) 的(DE) 旅遊(VA) 地點(Na)]

    b. [[花蓮], np], [[成為], vp], [[熱門, 的, 旅遊, 地點], np]
    c. [[花蓮], [成為], [熱門, 的, 旅遊, 地點]]

The triple representation is given in Definition 1. A triple is a simple representation consisting of three elements, S, P and O, which correspond to the Subject (noun phrase), Predicate (verb phrase) and Object (noun phrase) of a clause, respectively.

Definition 1: A Triple T is characterized by a 3-tuple T = [S, P, O], where
    S is a list of nouns whose grammatical role is the subject of a clause;
    P is a list of verbs or a preposition whose grammatical role is the predicate of a clause;
    O is a list of nouns whose grammatical role is the object of a clause.

In the triple transformation step, the sequence of word lists shown in (7b) is transformed into triples by employing the Triple Rules. The Triple Rules are built with reference to Chinese syntax. There are four kinds of Triples in the Triple Rules, which correspond to four basic clause types: subject + transitive verb + object, subject + intransitive verb, subject + preposition + object, and a noun phrase only. The rules listed below are employed in order:

Triple Rules:
    Triple1(S,P,O) ← np(S), vtp(P), np(O).
    Triple2(S,P,none) ← np(S), vp(P).
    Triple3(S,P,O) ← np(S), prep(P), np(O).
    Triple4(S,none,none) ← np(S).

Here vtp(P) denotes that the predicate is a transitive verb phrase, which contains a transitive verb in the rightmost position of the phrase; likewise, vp(P) denotes that the predicate is an intransitive verb phrase, which contains an intransitive verb in the rightmost position of the phrase. In rule Triple3, prep(P) denotes that the predicate is a preposition. Triple4 is employed only if a sentence contains one noun phrase and no other constituent. If all the Triple Rules fail, the ZA Triple Rules are employed to detect zero anaphor (ZA) candidates.

ZA Triple Rules:
    Triple1z1(zero,P,O) ← vtp(P), np(O).
    Triple1z2(S,P,zero) ← np(S), vtp(P).
    Triple1z3(zero,P,zero) ← vtp(P).
    Triple2z(zero,P,none) ← vp(P).
    Triple3z(zero,P,O) ← prep(P), np(O).
    Triple4z(zero,P,O) ← co-conj(P), np(O).

Zero anaphora in Chinese generally occurs in the topic, subject or object position. The rules Triple1z1, Triple2z and Triple3z detect zero anaphora occurring in the topic or subject position. The rule Triple1z2 detects zero anaphora in the object position, and Triple1z3 detects zero anaphora occurring in both subject and object positions. In Triple4z, co-conj(P) denotes a coordinating conjunction appearing in the initial position of a clause. For example, two triples are generated for (8). In the second triple, zero denotes a zero anaphor according to Triple1z1.

(8) 張三參加比賽贏得冠軍
    Zhangsan canjia bisai yingde guanjun
    Zhangsan enter competition win champion
    Zhangsan entered a competition and won the championship.

    [[[張三], [參加], [比賽]], [[zero], [贏得], [冠軍]]]
    [[[Zhangsan], [enter], [competition]], [[zero], [win], [champion]]]

Figure 1 illustrates the detailed procedure of Triple transformation. The input is a sequence of word lists after phrase-level parsing. The input sequence is scanned from the leftmost word list, and the Triple Rules are employed to generate a new Triple. If a new Triple is generated, the remaining sub-sequence is taken as the new input; otherwise, the ZA Triple Rules are employed to generate a new Triple. If no word list is left to be processed, the procedure outputs the result and stops; otherwise, it continues with the remaining sub-sequence.

Figure 1. The procedure of Triple transformation
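The Triple transformation procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names `apply_rules` and `to_triples` are hypothetical, the phrase labels are assumed to already distinguish `vtp` from `vp`, and skipping a chunk that no rule covers is an assumption made here so that the loop always terminates.

```python
# Each chunk is (word list, phrase label) as produced by phrase-level parsing.
# Assumed labels: "np", "vtp" (transitive VP), "vp" (intransitive VP),
# "prep", "co-conj".

# (rule name, label pattern, template); an int in the template is an index
# into the matched chunks, a string is the constant "zero" or "none".
TRIPLE_RULES = [
    ("Triple1", ("np", "vtp", "np"), (0, 1, 2)),
    ("Triple2", ("np", "vp"), (0, 1, "none")),
    ("Triple3", ("np", "prep", "np"), (0, 1, 2)),
    ("Triple4", ("np",), (0, "none", "none")),
]
ZA_TRIPLE_RULES = [
    ("Triple1z1", ("vtp", "np"), ("zero", 0, 1)),
    ("Triple1z2", ("np", "vtp"), (0, 1, "zero")),
    ("Triple1z3", ("vtp",), ("zero", 0, "zero")),
    ("Triple2z", ("vp",), ("zero", 0, "none")),
    ("Triple3z", ("prep", "np"), ("zero", 0, 1)),
    ("Triple4z", ("co-conj", "np"), ("zero", 0, 1)),
]


def apply_rules(rules, chunks, pos):
    """Return (rule name, triple, chunks consumed) for the first rule that
    matches at position pos, or None if no rule matches."""
    labels = tuple(label for _, label in chunks[pos:])
    for name, pattern, template in rules:
        if name == "Triple4" and len(chunks) != 1:
            continue  # Triple4 applies only to a lone noun phrase
        if labels[:len(pattern)] == pattern:
            triple = [chunks[pos + slot][0] if isinstance(slot, int) else [slot]
                      for slot in template]
            return name, triple, len(pattern)
    return None


def to_triples(chunks):
    """Scan left to right, trying the Triple Rules first, then the ZA rules."""
    triples, pos = [], 0
    while pos < len(chunks):
        hit = (apply_rules(TRIPLE_RULES, chunks, pos)
               or apply_rules(ZA_TRIPLE_RULES, chunks, pos))
        if hit is None:
            pos += 1  # assumption: skip a chunk that no rule covers
            continue
        name, triple, used = hit
        triples.append((name, triple))
        pos += used
    return triples


# Example (8), with the two verbs assumed to be tagged as transitive.
example8 = [(["張三"], "np"), (["參加"], "vtp"), (["比賽"], "np"),
            (["贏得"], "vtp"), (["冠軍"], "np")]
for rule_name, triple in to_triples(example8):
    print(rule_name, triple)
```

Encoding the rules as ordered pattern lists keeps the "employed in order" behaviour explicit: the first matching pattern wins, and the ZA rules are consulted only when every Triple Rule fails at the current position.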

4 ZA Resolution Method

The ZA resolution method we develop is divided into three parts. First, each sentence of an input document is translated into triples as described in Section 3. Second, ZA identification verifies each ZA candidate annotated in the triples by employing ZA identification constraints. Third, antecedent identification identifies the antecedent of each detected ZA by using rules based on the centering theory.

4.1 Centering Theory

In the centering theory (Grosz et al. 1995; Walker et al. 1994; Strube and Hahn 1996), each utterance U in a discourse segment has two structures associated with it: the forward-looking centers, Cf(U), and the backward-looking center, Cb(U). The forward-looking centers of U_n, Cf(U_n), depend only on the expressions that constitute that utterance. They are not constrained by features of any previous utterance in the discourse segment (DS), and the elements of Cf(U_n) are partially ordered to reflect relative prominence in U_n. Grosz et al. (1995) assume that grammatical roles are the major determinant for ranking the forward-looking centers, with the order Subject > Object(s) > Others. The highest-ranked element of Cf(U_n) may become the Cb of the following utterance, Cb(U_{n+1}).

In addition to the structures for centers, Cb and Cf, the centering theory specifies a set of constraints and rules (Grosz et al. 1995; Walker et al. 1994).

Constraints
For each utterance U_i in a discourse segment U_1, ..., U_m:
    1. U_i has exactly one Cb.
    2. Every element of Cf(U_i) must be realized in U_i.
    3. The ranking of elements in Cf(U_i) guides the determination of Cb(U_{i+1}).
    4. The choice of Cb(U_i) is from Cf(U_{i-1}), and cannot be from Cf(U_{i-2}) or other prior sets of Cf.

Backward-looking centers, Cbs, are often omitted or pronominalized. Discourses that continue centering the same entity are more coherent than those that shift from one center to another. This means that some transitions are preferred over others. These observations are encapsulated in two rules:

Rules
For each utterance U_i in a discourse segment U_1, ..., U_m:
    I. If any element of Cf(U_i) is realized by a pronoun in U_{i+1}, then Cb(U_{i+1}) must be realized by a pronoun also.
    II. Sequences of continuation are preferred over sequences of retaining, and sequences of retaining are preferred over sequences of shifting.

Rule I represents one function of pronominal reference: the use of a pronoun to realize the Cb signals to the hearer that the speaker is continuing to talk about the same thing. Psychological and cross-linguistic research have validated that the Cb is preferentially realized by a pronoun in English and by equivalent forms (i.e. zero anaphora) in other languages (Grosz et al. 1995). Rule II reflects the intuition that continuing the center, and using retentions where possible to produce smooth transitions to a new center, provides a basis for local coherence.
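To make the ranking concrete, the following sketch orders the forward-looking centers of an utterance by grammatical role and picks the backward-looking center of the next utterance. It assumes the usual centering definition that Cb(U_{i+1}) is the highest-ranked element of Cf(U_i) realized in U_{i+1}; the function names and the role labels attached to the example entities are illustrative assumptions, not part of the paper.

```python
# Ranking assumed from the text: Subject > Object(s) > Others.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


def forward_centers(utterance):
    """utterance: list of (entity, role) pairs.
    Returns the entities ordered by grammatical role; sorted() is stable,
    so entities with the same role keep their surface order."""
    ranked = sorted(utterance, key=lambda pair: ROLE_RANK[pair[1]])
    return [entity for entity, _ in ranked]


def backward_center(prev_utterance, realized_entities):
    """Cb of the current utterance: the highest-ranked element of the
    previous utterance's Cf that is realized in the current utterance
    (possibly as a zero anaphor). Returns None if there is none."""
    for entity in forward_centers(prev_utterance):
        if entity in realized_entities:
            return entity
    return None


# Utterance pair in the style of example (9); role labels are assumed.
u_a = [("US high-tech stocks", "other"), ("electronics stocks", "subject")]
u_b_realized = {"electronics stocks"}  # realized by the zero anaphor

print(forward_centers(u_a))
print(backward_center(u_a, u_b_realized))
```

Because the zero anaphor in the second utterance realizes the highest-ranked forward-looking center of the first, the center is continued rather than shifted, which is the preferred transition under Rule II.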

For example, in (9), the subject of utterance (9b) is eliminated, and its antecedent is identified as the subject of the preceding utterance (9a) according to the centering theory.

(9) a. 電子股受美國高科技股重挫影響,
       dianzigu shou meiguo gaokejigu zhongcuo yingxiang
       electronics stock receive USA high-tech stock heavy-fall affect
       Electronics stocks were affected by high-tech stocks in the USA.
    b. φ 持續下跌
       chixu xiadie
       (electronics stocks) continue fall
       (Electronics stocks) continued falling.

4.2 Zero Anaphora Resolution

The process of analyzing Chinese zero anaphora is different from general pronoun resolution in English because zero anaphors are not expressed in discourse. The task of ZA resolution is divided into two phases: first ZA detection and then antecedent identification. In this paper, we focus on the cases of ZA occurring in the topic or subject, and object positions. In the ZA detection phase, we use the ZA Triple Rules described in Section 3.2 to detect omitted cases as ZA candidates, denoted by zero in triples. Table 1 shows some examples corresponding to the ZA Triple Rules.

ZA Triple Rule             Example
Triple1z1(zero,P,O)        φ 撞到一個人 (1b)
                           zhuangdao yi ge ren
                           (he) bump-to a person
                           (He) bumped into a person.
Triple1z2(S,P,zero)        張三喜歡 φ 嗎
                           Zhangsan xihuan ma
                           Zhangsan like (somebody or something) Q
                           Does Zhangsan like (somebody or something)?
Triple1z3(zero,P,zero)     φ 喜歡 φ
                           xihuan
                           (he) like (somebody or something)
                           (He) likes (somebody or something).
Triple2z(zero,P,none)      φ 去購物了
                           qu gouwu le
                           (he) go shopping ASPECT
                           (He) has gone shopping.
Triple3z(zero,P,O)         φ 在那邊
                           zai nabian
                           (he) in there
                           (He) is there.
Triple4z(zero,P,O)         φ 跟小朋友玩
                           gen xiaopengyou wan
                           (he) with child play
                           (He) is playing with little children.

Table 1. Examples of zero anaphora

After ZA candidates are detected by employing the ZA Triple Rules, the ZA identification constraints are utilized to filter out non-anaphoric cases. Constraint 1 is employed to exclude exophora [2] and cataphora [3], which differ from anaphora in texts. Constraint 2 covers cases that might be incorrectly detected as zero anaphors, such as passive sentences or inverted sentences (Hu 1995).

ZA identification constraints
For each ZA candidate c in a discourse:
    1. c cannot be in the first utterance in a discourse segment.
    2. ZA does not occur in the following cases:
       NP + be + NP + VP + c
       NP (topic) + NP (subject) + VP + c

[2] Exophora is reference of an expression directly to an extralinguistic referent, in which the referent does not require another expression for its interpretation.
[3] Cataphora arises when a reference is made to an entity mentioned subsequently.

In the antecedent identification phase, we employ the backward-looking center of centering theory to identify the antecedent of each ZA. First we use noun phrase rules to obtain the noun phrases in each utterance, and then the antecedent is identified as the most prominent noun phrase of the preceding utterance (Yeh and Chen 2001):

Antecedent identification rule:
For each zero anaphor z in a discourse segment U_1, ..., U_m:
    If z occurs in U_i, and no zero anaphor occurs in U_{i-1}
        then choose the noun phrase with the corresponding grammatical role in U_{i-1} as the antecedent
    Else if only one zero anaphor occurs in U_{i-1}
        then choose the antecedent of the zero anaphor in U_{i-1} as the antecedent of z
    Else if more than one zero anaphor occurs in U_{i-1}
        then choose the antecedent of the zero anaphor in U_{i-1} as the antecedent of z according to the grammatical role criteria: Topic > Subject > Object > Others
    End if

Due to topic-prominence in Chinese (Li and Thompson 1981), the topic is the most salient grammatical role. In general, if the topic is omitted, the subject will be in the initial position of an utterance. If the topic and subject are omitted concurrently, the ZA occurs. The antecedent identification rule corresponds to the concept of centering theory.

5 Experiment and Result

In this section we describe the experiment and result of the two-phase zero anaphora resolution described in the preceding section. In the ZA detection phase, we first take the result of employing only the ZA Triple Rules as the baseline, and then include the ZA identification constraints to see the difference. In the antecedent identification phase, we also compare our method against a rule that does not involve the centering theory, to show the improvement. The test corpus is a collection of 50 news articles containing 998 paragraphs, 463 utterances, and 40884 Chinese words.
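As a rough illustration of the antecedent identification rule from Section 4.2, the sketch below walks through a discourse segment one utterance at a time. The data layout (one dictionary per utterance, mapping a grammatical role to a noun phrase, with `None` marking a detected zero anaphor) and the name `resolve` are assumptions made for this example; role assignment itself is taken as given.

```python
# Grammatical role criteria from the rule: Topic > Subject > Object > Others.
ROLE_ORDER = ["topic", "subject", "object", "other"]


def resolve(utterances):
    """utterances: list of dicts mapping a role in ROLE_ORDER to a noun
    phrase, or to None where a zero anaphor was detected.
    Returns, per utterance, a dict mapping each zero anaphor's role to
    its antecedent (None if no antecedent could be found)."""
    resolved = []
    for i, utterance in enumerate(utterances):
        found = {}
        for role, phrase in utterance.items():
            if phrase is not None or i == 0:
                continue  # not a ZA, or first utterance (constraint 1)
            prev, prev_zas = utterances[i - 1], resolved[i - 1]
            if not prev_zas:
                # No ZA in the preceding utterance: take the noun phrase
                # with the corresponding grammatical role.
                found[role] = prev.get(role)
            elif len(prev_zas) == 1:
                # Exactly one ZA in the preceding utterance: inherit its
                # antecedent.
                found[role] = next(iter(prev_zas.values()))
            else:
                # Several ZAs: inherit from the one with the most
                # prominent grammatical role.
                best = min(prev_zas, key=ROLE_ORDER.index)
                found[role] = prev_zas[best]
        resolved.append(found)
    return resolved


# A three-utterance segment in the style of example (10); role labels
# are assumed for illustration.
segment = [
    {"subject": "基隆醫院", "object": "服務範圍"},
    {"subject": None, "object": "醫療服務品質"},
    {"subject": None, "object": "外勞體檢醫院"},
]
print(resolve(segment))
```

Keeping the resolved antecedents of the preceding utterance around is what lets a chain of zero anaphors point back to the same entity, as the second and third branches of the rule require.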

Zero Anaphora Resoluton n Chnese wth Shallow Parsng 5 5. ZA Detecton By employng the ZA Trple Rules and ZA dentfcaton constrants mentoned prevously, zero anaphors occur n topc or subject, and object postons can be detected. In the experment, we frst only employ the ZA Trple Rules, and then nclude the ZA dentfcaton constrants to see the mprovement. Because the ZA Trple Rules cover each possble topc or subject, and object omsson cases, the result shows that the zero anaphors are over detected. The Table shows the precson rates calculated usng equaton 2. No. of ZA correctly detected Precson rate of ZA detecton = () No. of ZA canddates The man errors of ZA detecton occur n the experment when parsng nverted sentences and non-anaphorc cases (e.g. exophora or cataphora) (Mtkov 2002; Hu 995). Cataphora s smlar to anaphora, the dfference beng the drecton of the reference. In ths paper, we do not deal wth the case that the referent of a zero anaphor s n the followng utterances, but we can detect about 60% cataphora n the test corpus by employng ZA dentfcaton constrant. 5.2 Antecedent Identfcaton In ths phase, we take the output of employng the ZA Trple Rules and ZA dentfcaton constrants, and further to dentfy the antecedents of zero anaphors. We frst use a smple antecedent dentfcaton rule wthout nvolvng the centerng theory and then employ the antecedent dentfcaton rule mentoned n 4.2 to show the mprovement: Smple Antecedent dentfcaton rule: For each zero anaphor z n a dscourse segment U,, U m : If z occurs n U then choose the noun phrase n U - havng the longest dstance from z as the antecedent. The smple antecedent dentfcaton rule does not consder the rankng of centers n the centerng theory (Grosz et al. 995). By comparng wth the smple antecedent dentfcaton rule, the antecedent dentfcaton rule based on the centerng theory (see 4.2) determnes the antecedents accordng to grammatcal role crtera. 
For example, in the discourse segment (10), zero anaphors are detected in utterances (10b) and (10c). According to the antecedent identification rule, the noun phrase 基隆醫院 'Kee-lung General Hospital', whose grammatical role corresponds to that of the zero anaphor φ1 in (10b), is identified as the antecedent. Subsequently, the antecedent of the zero anaphor φ2 in (10c) is identified as the antecedent of φ1 in (10b), 基隆醫院.

(10) a. 基隆醫院為擴大服務範圍,
        Jilong yiyuan wei kuoda fuwu fanwei
        Kee-lung hospital for expand service coverage
        Kee-lung General Hospital aims to expand service coverage.
     b. φ1 積極提升醫療服務品質及標準化,
        jiji tisheng yiliao fuwu pinzhi ji biaozhunhua
        (it) active improve medical-treatment service quality and standardization
        (It) actively improves the service quality of medical treatment and standardization.
     c. φ2 獲衛生署認可為辦理外勞體檢醫院
        huo weishengshu renke wei banli wailao tijian yiyuan
        (it) obtain Department-of-Health certify to-be handle foreign-laborer physical-examination hospital
        (It) is certified by the Department of Health as a hospital which can handle physical examinations of foreign laborers.

Table 3 shows the recall and precision rates of ZA resolution, calculated using equation 2 and equation 3. Errors occur in this phase when a zero anaphor refers to an entity other than the one in the corresponding grammatical role or the antecedent of the zero anaphor in the preceding utterance.

$$\text{Precision rate of ZA resolution} = \frac{\text{No. of antecedents correctly identified}}{\text{No. of ZA candidates}} \tag{2}$$

$$\text{Recall rate of ZA resolution} = \frac{\text{No. of antecedents correctly identified}}{\text{No. of ZAs occurring in text}} \tag{3}$$

Cases             ZA Triple rules    ZA Triple rules + ZA constraints
No. of ZAs        2216               2216
ZA candidates     3400               2754
Precision rate    65.2%              80.5%
Table 2. Results of ZA detection

Cases / Accuracy  Simple antecedent identification rule    Employing centering theory
Recall rate       65.8%                                    70%
Precision rate    55.3%                                    60.3%
Table 3. Results of ZA resolution

6 Conclusions

In this paper, we develop an inexpensive method of Chinese ZA resolution that works on the output of a part-of-speech tagger and uses shallow parsing instead of complex parsing to resolve zero anaphors in Chinese texts. In our preliminary experiment, we deal with the cases of topic or subject omission and object omission. The precision rate of ZA detection is 81% and the recall rate of ZA resolution is 70%. The errors of ZA resolution fall into the following cases:
1. Out of the grammatical role criteria (ranking of forward-looking centers): a ZA refers to an entity other than the one in the corresponding grammatical role or the antecedent of the zero anaphor in the preceding utterance.
2. Out of local coherence: the antecedent of a ZA is mentioned in an utterance further back than the immediately preceding one.
3. Cataphora: a ZA refers to an antecedent mentioned in the succeeding utterances.
4. Other non-anaphoric cases: depending on the background knowledge of readers, the referent of a ZA does not require expression in the text.
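The antecedent identification rule illustrated by the Kee-lung General Hospital example, and the way it leads to the error cases listed above, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the experiment: the function name `resolve_zero_anaphors`, the role labels, and the dictionary representation of an utterance are hypothetical, and the role assignments in the sample segment are simplified by hand rather than produced by the shallow parser.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one utterance maps a grammatical role to the
# noun phrase filling it; None marks a zero anaphor detected in that role.
Utterance = Dict[str, Optional[str]]


def resolve_zero_anaphors(segment: List[Utterance]) -> List[Dict[str, Optional[str]]]:
    """Return, per utterance, the antecedent chosen for each zero anaphor."""
    previous: Dict[str, Optional[str]] = {}  # resolved roles of the preceding utterance
    results: List[Dict[str, Optional[str]]] = []
    for utterance in segment:
        antecedents: Dict[str, Optional[str]] = {}
        current: Dict[str, Optional[str]] = {}
        for role, noun_phrase in utterance.items():
            if noun_phrase is None:
                # Zero anaphor: take the entity in the same grammatical role of
                # the preceding utterance. If that slot was itself a zero
                # anaphor, its already-resolved antecedent is inherited.
                antecedent = previous.get(role)
                antecedents[role] = antecedent
                current[role] = antecedent
            else:
                current[role] = noun_phrase
        results.append(antecedents)
        previous = current
    return results


# Simplified, hand-assigned roles for the three utterances of the example.
segment = [
    {"subject": "基隆醫院", "object": "服務範圍"},
    {"subject": None, "object": "醫療服務品質"},
    {"subject": None},
]
results = resolve_zero_anaphors(segment)
print(results[1]["subject"], results[2]["subject"])
```

In this sketch a zero anaphor with no same-role entity in the preceding utterance stays unresolved (`None`). Because the rule looks only one utterance back and only at the matching role, antecedents in other roles, in earlier utterances, in succeeding utterances, or outside the text are not found, which corresponds to the four error cases.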

In cases 3 and 4, we do not intend to treat non-anaphoric cases in this paper, but by employing the ZA identification constraints we can detect about 60% of the cataphora and exophora and 50% of the inverted sentences in the test corpus.

We have presented the method and the experiments on ZA resolution in the previous sections. The result is promising to some extent; however, some problems still need further investigation, such as pronoun resolution and the applications of ZA resolution. In pronoun resolution, the pronominal anaphors are expressed in the discourse, so the detection rules are unnecessary. We may modify the antecedent identification rule described in Section 3.3 to identify the antecedents of pronominal anaphors occurring in utterances, and additional anaphora resolution factors can be used, such as gender and number agreement (Lappin and Leass 1994).

Another line of future research is the enhancement of the shallow parsing technique used in this paper. For example, one might enhance the output of text chunking, not by analyzing each phrase structure in an utterance but by dividing each clause within an utterance into syntactically correlated groups of words. We would also extend our approach to deal with other omission cases, such as verb omission, and conduct more experiments on texts from other domains.

7 Acknowledgement

We give our special thanks to CKIP, Academia Sinica, for making great efforts in computational linguistics and for sharing the Autotag program for academic research.

8 References

Abney, Steven, 1991, Parsing by chunks, In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-Based Parsing, Kluwer Academic Publishers.
Abney, Steven, 1996, Tagging and Partial Parsing, In: Ken Church, Steve Young, and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech, An ELSNET volume, Kluwer Academic Publishers, Dordrecht.
Aone, Chinatsu and Bennett, Scott William, 1995, Evaluating automated and manual acquisition of anaphora resolution strategies, Proceedings of the 33rd Annual Meeting of the ACL, Santa Cruz, New Mexico, pages 122-129.
Baldwin, Breck, 1997, CogNIAC: high precision coreference with limited knowledge and linguistic resources, ACL/EACL workshop on Operational factors in practical, robust anaphor resolution.
Chen, F.-Y., Tsai, P.-F., Chen, K.-J. and Huang, C.-R., 1999, Sinica Treebank, Computational Linguistics and Chinese Language Processing (CLCLP), 4(2): 87-104.
Chen, P, 1987, Hanyu lingxing huizhi de huayu fenxi (a discourse approach to zero anaphora in Chinese) (in Chinese), Zhongguo Yuwen (Chinese Linguistics), pages 363-378.
CKIP, 1999, 中文自動斷詞系統 Version 1.0 (Autotag), http://godel.iis.sinica.edu.tw/CKIP/, Academia Sinica.
Connolly, Dennis, Burger, John D. and Day, David S., 1994, A Machine learning approach to anaphoric reference, Proceedings of the International Conference on New Methods in Language Processing, 255-261, Manchester, United Kingdom.
Ferrández, A., Palomar, Manuel and Moreno, Lidia, 1998, Anaphor Resolution in Unrestricted Texts with Partial Parsing, Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference, pages 385-391. Montreal, Canada.
Gazdar, G. and Mellish, C., 1989, Natural Language Processing in PROLOG: An Introduction to Computational Linguistics, Addison-Wesley.
Ge, Niyu, Hale, John and Charniak, Eugene, 1998, A statistical approach to anaphora resolution, Proceedings of the Sixth Workshop on Very Large Corpora, pages 161-170.
Grosz, B. J. and Sidner, C. L., 1986, Attention, intentions, and the structure of discourse, Computational Linguistics, No 3 Vol 12, pp. 175-204.
Grosz, B. J., Joshi, A. K. and Weinstein, S., 1995, Centering: A Framework for Modeling the Local Coherence of Discourse, Computational Linguistics, 21(2), pp. 203-225.
Hu, Wenze, 1995, Functional Perspectives and Chinese Word Order, Ph. D. dissertation, The Ohio State University.
Kennedy, Christopher and Boguraev, Branimir, 1996, Anaphora for everyone: pronominal anaphora resolution without a parser, Proceedings of the 16th International Conference on Computational Linguistics (COLING'96), 113-118. Copenhagen, Denmark.
Lappin, S. and Leass, H., 1994, An algorithm for pronominal anaphora resolution, Computational Linguistics, 20(4).
Li, Charles N. and Thompson, Sandra A., 1981, Mandarin Chinese: A Functional Reference Grammar, University of California Press.
Li, X. and Roth, D., 2001, Exploring Evidence for Shallow Parsing, Proceedings of Workshop on Computational Natural Language Learning, Toulouse, France.
Mitkov, Ruslan, 1998, Robust pronoun resolution with limited knowledge, Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98 Conference. Montreal, Canada.
Mitkov, Ruslan, 1999, Anaphora resolution: the state of the art, Working paper (Based on the COLING'98/ACL'98 tutorial on anaphora resolution), University of Wolverhampton, Wolverhampton.
Mitkov, Ruslan, 2002, Anaphora Resolution, Longman.
Okumura, Manabu and Tamura, Kouji, 1996, Zero pronoun resolution in Japanese discourse based on centering theory, Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), 871-876.
Seki, Kazuhiro, Fujii, Atsushi, and Ishikawa, Tetsuya, 2002, A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution, Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 911-917.
Sidner, C. L., 1979, Toward a Computational Theory of Definite Anaphora Comprehension in English Discourse, Ph.D. thesis, MIT.
Sidner, C. L., 1983, Focusing in the comprehension of definite anaphora, Computational Models of Discourse, MIT Press.
Sinica Treebank, 2002, URL http://turing.iis.sinica.edu.tw/treesearch/, Academia Sinica.
Strube, M. and Hahn, U., 1996, Functional Centering, Proceedings Of ACL 96, Santa Cruz, Ca., pp. 270-277.

Stuckardt, Roland, 2002, Machine-Learning-Based vs. Manually Designed Approaches to Anaphor Resolution: the Best of Two Worlds, Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), University of Lisbon, Portugal, pages 2-26.
The Penn Chinese Treebank Project, 2000, URL http://www.cis.upenn.edu/~chinese/. Linguistic Data Consortium, University of Pennsylvania.
Walker, M. A., 1989, Evaluating Discourse Processing Algorithms, Proceedings Of ACL 89, Vancouver, Canada.
Walker, M. A., 1998, Centering, anaphora resolution, and discourse structure. In Walker, M. A., Joshi, A. K. and Prince, E. F., editors, Centering in Discourse, Oxford University Press.
Walker, M. A., Iida, M. and Cote, S., 1994, Japanese Discourse and the Process of Centering, Computational Linguistics, 20(2): 193-233.
Yeh, Ching-Long and Chen, Yi-Chun, 2001, An empirical study of zero anaphora resolution in Chinese based on centering theory, Proceedings of ROCLING XIV, Tainan, Taiwan.
Yeh, Ching-Long and Chen, Yi-Chun, 2003, Using Zero Anaphora Resolution to Improve Text Categorization, Proceedings of PACLIC 17, Sentosa, Singapore.

9 Appendix: Abbreviations

In the word-by-word translation, some markers are abbreviated as below. We follow the abbreviations used in [].

Abbreviation    Term
ASSOC           associative (de)
ASPECT          aspect marker
BA              ba
BEI             bei
CL              classifier
CSC             complex stative construction (de)
GEN             genitive (de)
NOM             nominalizer (de)
Q               question (ma)