Invalidity Patent Search System of NTT DATA

Kazuya Konishi, Akira Kitauchi, Toru Takaki
Research and Development Headquarters, NTT DATA CORPORATION
Kayabacho Tower Bldg., 1-21-2 Shinkawa, Chuo-ku, Tokyo 104-0033, Japan
{konishikzy, kitauchia, takakit}@nttdata.co.jp

© 2004 National Institute of Informatics

Abstract

We give an overview of our invalidity patent search system for NTCIR-4 PATENT. The system uses document retrieval techniques and methods that are suitable for invalidity search, i.e., query term extraction based on characteristics of the invention, a retrieval model using components of the invention, ranking using a term weighting based on category information, and so on. This paper describes these methods and evaluates the search results given by them.

Keywords: Patent retrieval, Invalidity search, Query term extraction.

1. Introduction

The NTCIR-4 Patent Retrieval Task is an invalidity search. In this search, the examiners have to find the existing patent specifications that describe the same invention as the topic claim. However, it is often difficult to retrieve such specifications by using the common type of document retrieval system based on term matching. The reasons for this problem are listed below.

1. Since the terms included in a claim are often abstract or creative in order to expand the claim's scope, different specifications tend to comprise different terms even if these terms explain the same things.
2. It is possible that a subset of terms in a claim matches an invention component that is different from the invention components in the topic claim. This happens because a subset of the terms does not necessarily specify the invention components.
3. The degree of distinguishing one invention from another depends on the level of specialization of the patent classification of the invention. Since patent classifications are highly specialized and independent from each other, the interpretation of a term varies from field to field.

Through consideration of these reasons, we have developed and implemented document retrieval methods that are suitable for invalidity searches. This paper describes these methods from the perspective of the first reason above. Additionally, it evaluates the search results given by our methods.

2. System Description

First of all, we provide a description of the invalidity search system as background information before describing our retrieval methods. The input of this system is a single patent specification. The specification in turn has a single topic claim. The system output is a list of existing specifications that describe the same invention as the topic claim. The system conducts the search after producing queries corresponding to the invalidity search based on the terms included in the topic claim. Here is a summary of each step of the process.

(1) Query term extraction: We perform morphological analysis to extract the words (mainly nouns) from the topic claim as query terms. We use ChaSen [1] as the morphological analyzer. Additionally, sequences of content words are extracted as compound query terms. We use 73 stopwords that appear frequently in the existing specifications.
(2) Existing patent specification retrieval: We retrieve the existing patent specifications that describe inventions that might be identical to the one of the topic claim. We use the BM25 formula of Okapi [2] for the ranking process of this retrieval. This formula is a ranking model used in many retrieval systems.

3. Retrieval Methods

3.1. Query Term Extraction based on Characteristics of Invention

In this section, we explain how to extract the query terms focused on the characteristics of the invention [3]. This method of extraction addresses the problem that the same thing is described with different terms in different claims. By referring to the terms of the topic claim, we extract descriptions of the invention's characteristics from the detailed description of the invention in the specification. The terms included in these descriptions are set as additional query terms. Since the additional query terms are related to the terms listed in the topic claim about the invention, we refer to them as related terms from here on.
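For orientation, the BM25 ranking that the base system of Section 2 relies on can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name `bm25_scores`, the parameter values k1 = 1.2 and b = 0.75, the non-negative idf variant, and the toy English tokens are illustrative choices that the paper does not specify (the actual system indexes Japanese morphemes produced by ChaSen).

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25.

    k1 and b are common textbook defaults (an assumption, not from the paper).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for q in query_terms:
            if tf[q] == 0:
                continue
            # Non-negative idf variant (assumed here for simplicity).
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            score += idf * tf[q] * (k1 + 1) / (tf[q] + norm)
        scores.append(score)
    return scores


# Toy usage: three "specifications" already split into terms.
docs = [
    ["receiving", "terminal", "battery"],
    ["channel", "memory"],
    ["receiving", "channel", "search", "battery", "saving"],
]
print(bm25_scores(["receiving", "battery"], docs))
```

Documents that contain none of the query terms score zero, and of two documents with the same term counts the shorter one ranks higher, which is the length effect that Section 3.2.4 later compensates for with passage scores.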
[Figure 1. Hypothesis about descriptions of specifications derived from the same invention. The figure compares Invention A and Invention B. In the constitution, which identifies the scope of the claim, different terms are used (A: Receiving Terminal, Home Area Memory, Receiving Channels Search; B: Wireless Selection Call Receiver, System Information Memory, Receiving Channels Controller). In the functions and operations of each component, which clearly describe the characteristics of the invention, the same terms are used ("saves on the battery by searching effectively the receiving channels", "memorizes information about the channels that should be received", "searches only the channels that should be received").]

This method is based on the hypothesis that the descriptions are common to each specification when they are derived from the same technical ideas of the invention. The description of the characteristics of the invention can itself be characterized as follows: it explains the functions of the invention targeted by the terms of the claim as well as the operations that affect the invention.

Figure 1 shows an example of the above hypothesis. Since the patent specification is a technical document, authors should describe the scope of a claim so that everyone arrives at the same interpretation. However, they describe their claimed invention using abstract or creative terms that have various meanings, in order to enlarge the scope of the claim. The terms in the claim are therefore not well suited as query terms. In contrast, we consider that the functions and operations of the invention components of the topic claim are clearly indicated with concrete and general terms in the "detailed description of the invention" of the specification. Those terms are common to many specifications that describe the same invention as the topic claim, and are effective as query terms.

Because the patent specification is a technical document, we assume that there are limitations on the types of expression used in the descriptions that explain the components of the invention described by a claim term, i.e., the functions or operations of each component. We developed methods of extracting these descriptions by conducting pattern matches.
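The pattern-matching idea can be illustrated with a small sketch. It assumes English text and two hypothetical regular-expression templates with a `{term}` slot for a claim term; the real system works on Japanese morpheme sequences, which are not reproduced here. The names `TEMPLATES`, `STOPWORDS`, and `extract_related_terms` are invented for illustration.

```python
import re

# Hypothetical expression-pattern templates; "{term}" is filled with a claim term.
TEMPLATES = [
    r"{term} such as (?P<body>[^.]+)",  # lists things with the same function
    r"{term} that (?P<body>[^.]+)",     # states what the component does
]

# Tiny illustrative stopword list (not the 73 stopwords used by the system).
STOPWORDS = {"a", "an", "and", "the", "of", "or"}


def extract_related_terms(claim_terms, description):
    """Return words from passages that match a template around a claim term."""
    related = []
    for term in claim_terms:
        for template in TEMPLATES:
            # Complete the pattern by inserting the (escaped) claim term.
            pattern = re.compile(
                template.format(term=re.escape(term)), re.IGNORECASE
            )
            for match in pattern.finditer(description):
                for word in re.findall(r"[A-Za-z]+", match.group("body")):
                    word = word.lower()
                    if word not in STOPWORDS and word not in related:
                        related.append(word)
    return related


description = (
    "The device has memory storage such as a flash memory and ROM. "
    "It is a receiving terminal that achieves a battery saving."
)
claim_terms = ["memory storage", "receiving terminal"]
# The query combines the claim terms with the extracted related terms.
query_terms = claim_terms + extract_related_terms(claim_terms, description)
print(query_terms)
```

Without morphological analysis the sketch keeps every non-stopword, including verbs such as "achieves"; a part-of-speech filter would be needed to keep mainly nouns, as the system description states.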
For the pattern match, the expression patterns were defined as continuous morpheme patterns. Below is a summary of the three kinds of expression patterns we developed. The underlined parts indicate the claim terms.

(1) Enumeration expression patterns: These enumerate the things that have the same functions or operations as the thing described by the claim term.
(ex) "memory storage such as a flash memory and ROM"
(2) Defining expression patterns: These define the functions of each component of the invention described by the claim term.
(ex) "receiving terminal that achieves a battery saving"
(3) Explaining expression patterns: These explain the operations influencing the invention, for each component of the invention described by the claim term.
(ex) "since the search measure starts the search, the receiving terminal can quickly find the channel which should be received next"

Figure 2 shows the flow of this method. First, the administrator of the search system prepares templates of these continuous morpheme patterns. Second, the system completes the expression patterns by applying the claim terms to the templates. Third, the system extracts the character strings that match the expression patterns from the "detailed description of the invention" part of the specification. We use the Erie system [4] as the character string extractor. After that, the system extracts the terms included in the character strings and assigns
them as related terms. The search system performs the search using the claim terms and the related terms as the query terms.

[Figure 2. Processes of query term extraction based on the characteristics of the invention. Step 1: extraction of terms included in the claim (from the topic claim). Step 2: creation of expression patterns (from the templates of expression patterns and the claim terms). Step 3: extraction of descriptions of the characteristics of the invention (from the detailed description of the invention in the patent specification). Step 4: extraction of related terms. The claim terms and the related terms together form the query terms.]

3.2. Other Methods

3.2.1. Retrieval Model using Components of Invention

The invention claimed in a patent application usually includes multiple components. In the case of the invalidity search, an examiner intends to find one or more similar patents that include all or the majority of the components in the topic claim. Moreover, it is useful to indicate which components are described in the retrieved specification and which are not. Although a specification likely contains multiple components, the importance of each component is different. Just as a query term has a weight in the IR model, a weighting method for each component is needed.

The Jepson style is a writing form for patent claims. A Jepson claim consists of two description parts. The first part is a preamble portion that describes existing technologies, and the second part is an essential portion that describes the features peculiar to the invention. The components in the essential portion are more important than those in the preamble portion. The invalidity search system should have a function that enables a precise search focusing on the essential points of novelty and the existing technologies.

In the invalidity search, although there are usually many specifications that include the query terms, there are generally few specifications whose essential contents match almost completely. Thus, it is important to produce queries that reflect the essence of the topic claim. We implemented a method that uses the individual components in the claim. For each component, a query is produced and relevant specification candidates are retrieved based on the relevance score. Then, by integrating the relevance scores weighted by the importance of each component, the final relevance is determined.

3.2.2. Query Term Expansion (LCA)

A method similar to LCA [5] was adopted as the query expansion technique. In our search system, the extended terms were extracted from the top ten ranked passages, although the original LCA method extracts from the top ranked specifications. We restricted the maximum number of extended terms to ten.

3.2.3. Ranking using Term Weighting based on Category Information

The inventions of patent specifications are classified in accordance with the International Patent Classification (IPC). We developed an algorithm for term weighting based on the use of specifications labeled with category information [6]. Our approach is to weight a term differently for each category only if the term has high relevance to specific categories.

The basic idea of category-based term weighting is to extend the relationship between terms and documents (specifications) in the tfidf measure to that between terms and categories. The tfidf measure is given by

$$\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t), \qquad \mathrm{tf}(t,d) = \log\left(\frac{f_{t,d}}{F_d} + 1\right), \qquad \mathrm{idf}(t) = \log\frac{N}{\mathrm{df}_t} \tag{1}$$

where $f_{t,d}$ is the term frequency of term $t$ in document $d$, $F_d$ is the total frequency of all terms in document $d$, $N$ is the total number of documents, and $\mathrm{df}_t$ is the document frequency of term $t$. Its extension to categories is

$$\mathrm{tfidf}_C(t,c) = \mathrm{tf}_C(t,c)\,\mathrm{idf}_C(t), \qquad \mathrm{tf}_C(t,c) = \log\left(\frac{\mathrm{df}_{t,c}}{N_c} + 1\right), \qquad \mathrm{idf}_C(t) = \log\frac{C}{\mathrm{cf}_t} \tag{2}$$

where $\mathrm{df}_{t,c}$ is the document frequency of term $t$ in category $c$, $N_c$ is the number of documents in category $c$, $C$ is the total number of categories, and $\mathrm{cf}_t$ is the category frequency of term $t$.

The criterion for determining whether a term has high relevance to specific categories is defined as

$$\mathrm{rel}(t) = \frac{\log([\ldots] + 1)}{\log(C + 1)} \tag{3}$$

The term weight considering the relationship between terms and categories is

$$\mathrm{weight}_{cat}(t,c) = \begin{cases} \mathrm{tfidf}_C(t,c) & (\mathrm{rel}(t) > th_r) \\ \log([\ldots] + 1)\,\mathrm{idf}_C(t) & (\mathrm{rel}(t) \le th_r) \end{cases} \tag{4}$$

where $th_r$ is a threshold to judge whether the term should be weighted for each category. We further integrate this term weight with the tfidf weighting, which is the measure based on the relationship between terms and documents:

$$\mathrm{weight}_{comb}(t,d,c) = \mathrm{tfidf}(t,d)\,\mathrm{weight}_{cat}(t,c) \tag{5}$$

To reduce the execution time for ranking documents, a two-step approach is used for retrieval. The first step outputs the top 3,000 documents ranked by a BM25 score, which, like tfidf, is a weighting scheme based on the relation between terms and documents. In the second step, we rerank these documents by a score using our weighting scheme, and take the top 1,000 documents as the final result of the retrieval. The IPC is organized as a five-level hierarchy, and we employ the third level, called "subclass", which has 1,233 categories, as the set of categories for the term weighting.

3.2.4. Ranking using Passage Retrieval Score

In the ranking process of our search system, a score is usually given to each specification. A low rank may be given to long specifications that include the relevant description only in a specific portion of the specification, because the most often used ranking method uses document length as a ranking feature. We developed a method of calculating the final score from the specification score and the passage score in order to give a higher score to partially relevant specifications.

3.2.5. Hybrid Method

We implemented a module that changes the patent retrieval method according to the features of the topic claim. The features are the importance of the query terms in the claim and the existence of the preamble portion.
The former feature was used to judge whether the query term extraction based on the characteristics of the invention should be used, and the latter was used to judge whether the ranking method that uses the individual components in the topic claim should be used.

4. Search Result

We submitted a number of systems for the NTCIR-4 Patent Retrieval Task. For all systems, the collection was the publication of unexamined patent applications in 1993-1997. The index consisted of morphemes. All systems were produced using the base system described in Section 2 and combinations of the methods described in Section 3. Figure 3 summarizes the results of the evaluation for some of the systems relating to the extraction of the query terms based on the characteristics of the invention.

Sys01: the base system
Sys02: the base system using query term expansion (LCA)
Sys03: the base system using query term extraction based on characteristics of the invention
Sys04: the base system using query term extraction based on characteristics of the invention and the hybrid method
Sys05: the base system using query term extraction based on characteristics of the invention, the retrieval model using components, and the ranking using term weighting based on category information

[Figure 3. Results of the evaluation about some of the systems]

5. Discussion

There was not much difference in the precision of the retrieval between the "A" rank patents (each of which can invalidate a topic claim by itself) and the "B" rank patents (which can invalidate a topic claim when used together with other patents). On the other hand, we could confirm that there were differences in the precision of retrieval among the main topics, the additional topics, and all the topics. The main topics are those for which the assessors identified relevant specifications in addition to the citations provided by the examiners of the Japanese Patent Office (JPO). The additional topics use only the citations provided by the JPO examiners as the relevant specifications.

As for the main topics, Sys04 retrieved the relevant specifications with high precision. In other words, the query term extraction based on the characteristics of the invention could extract terms that were common to many specifications that describe the same invention, as well as the terms found by the examiners. For the additional topics, Sys05 retrieved the relevant specifications with high precision. The precision of Sys04 for the additional topics was not bad, and Sys04 was the best for all topics. However, overall, the precision of Sys01 was good; therefore, our method leaves room for improvement.

Note that, by and large, the precision of Sys02 was bad. Consequently, we can expect that terms related to the claim terms that are selected on the basis of common sense are not suited for identifying inventions.
6. Considerations on the effect of our query term extraction method

Our query term extraction method extracts the topic claim terms and the related terms as the query terms. The related terms extracted by existing related term extraction methods, such as relevance feedback, are conceptually related to the claim terms. By setting such related terms as the query terms, we can retrieve the specifications that include synonyms of each claim term. In contrast, the related terms extracted by our method relate to the function or operation of each invention component. By setting these related terms as the query terms, we can retrieve the specifications that describe the same invention as the topic claim.

In fact, we could retrieve relevant specifications for topics #008, #019, #022, #032, #044, #065, #073, and so on. However, we couldn't retrieve the same relevant specifications by setting only the claim terms, or the claim terms and the related terms extracted by LCA, as the query terms. Furthermore, we couldn't retrieve the same relevant specifications by setting all terms included in the topic patent specification. It is quite likely that these results mean our method is able to extract the terms relevant to the query terms of the invalidity search from the topic patent specifications.

However, our method didn't work on topics #028, #046, #047, #051, #064, #071, #104, and so on, and the precision of each retrieval result was worse than that of the retrievals using the other methods. Hereinafter, we consider the reasons for these results.

(1) The lack of features for expression pattern matching

(A) No ability to handle the structural pattern: We implemented a program that extracts the character strings based on the continuous morpheme patterns. The character strings explain the function or operation of a claimed invention component. However, in fact, there are also character strings that match only noncontiguous morpheme patterns. A structural pattern matching program is needed to extract these character strings.

(B) No characterization of the extracted query terms: It is assumed that the claim terms mean the constitution of the claimed invention and the related terms mean the function or operation of an invention component. However, in fact, there are also related terms which mean the invention constitution.
Under normal circumstances, our program should extract the character strings which explain the function or operation of the invention component corresponding to such a related term, and should extract the other related terms from those character strings. The extracted terms need to be characterized as either the invention constitution or the function or operation of the invention component.

(2) The diverse characteristics of each patent specification

(A) The presence or absence of the function or operation explanation about the invention component: Our query term extraction method assumed that the functions or operations of the invention components of the claimed invention are explained in the patent specification. The reason is that, under the patent law, it is enough that the invention is explained in the specification so that workers skilled in the particular technical field can duplicate the invention. However, we can also interpret the law as saying that there is no need to explain an invention component in the specification, because a skilled worker would know the functions or operations of that component. Consequently, the presence or absence of function or operation explanations about an invention component may differ from specification to specification, and our method may fail to work.

(B) The validity of the descriptive content of the claim: We assume that the invention constitution is fully described in the claim. However, in fact, there are also topic claims in which only a part of the invention constitution is written. For example, in a patent specification, the "claim" describes the specific thing, and the "detailed description of the invention" describes the use of the thing and the effect of the thing based on the use. If the vital terms that explain the invention constitution are lacking in the claim of the patent specification, our method doesn't work.

7. Conclusion

We have analyzed the characteristics of patent specifications and examined methods of retrieving the specifications of inventions identical to the one described in the topic claim. The results of the NTCIR-4 Patent Retrieval Task showed that our methods had a beneficial effect on the invalidity search. We consider that the reason for this result was the focus on extracting the terms that are common to the specifications describing identical inventions.
However, the increase in precision by applying our methods was modest at best. Further examinations of our methods are planned in the future.
References

[1] Y. Matsumoto, A. Kitauchi, T. Yamashita, Y. Hirano, H. Matsuda, M. Asahara. Japanese morphological analysis system ChaSen version 2.0 manual 2nd edition. Technical Report NAIST-IS-TR99009, NAIST, 1999.
[2] S.E. Robertson, S. Walker, M. Beaulieu. Okapi at TREC-7: Automatic ad hoc, filtering, VLC and interactive. Proceedings of the 7th Text REtrieval Conference (TREC-7), NIST Special Publication 500-242, pp. 253-264, 1999.
[3] K. Konishi, A. Kitauchi, T. Takaki. Patent Retrieval by Query Terms Extraction based on Characteristics of Invention. Proceedings of Data Engineering Work Shop, DEWS2004, 3-b-1, 2004. (In Japanese)
[4] Y. Eriguchi, T. Kitani. NTT Data Description of the Erie System Used for MUC-6. Proceedings of Tipster Text Program (Phase II), pp. 469-470, 1996.
[5] J. Xu and W.B. Croft. Query expansion using local and global document analysis. In Proc. of the 19th annual international ACM SIGIR conference on research and development in information retrieval, pp. 4-11, 1996.
[6] A. Kitauchi, K. Konishi, T. Takaki. Term Weighting Using Category Information for Information Retrieval. Proceedings of Data Engineering Work Shop, DEWS2004, 2-b-5, 2004. (In Japanese)