CS 671 ICT For Development, 19th Sep 2008

Vishal Vachhani, CFILT and DIL, IIT Bombay

Agro Explorer: A Meaning Based Multilingual Search Engine

- Website for Indian farmers
- Farmers can submit problems related to their crops
- Queries are answered by agricultural experts at KVK, Baramati
- Languages supported: Marathi, Hindi, English

Why Multilingual Search Is Needed
- A vast amount of information is available on the Web
- Almost 70% of that information is in English
- The Indian rural populace is not English-literate
- This creates a big language barrier
- Information has to be made available to them in their local languages

Why Meaning Based Search Is Needed
- Most current search engines are keyword based
- They do not consider the semantics of the query
- The result set contains a large number of extraneous documents
- Search based on the meaning of the query helps narrow down to the desired information quickly

[Figure: a query in Hindi is sent to the search system, which searches English and Marathi documents and returns the result in Hindi.]

Same Keywords, Different Semantics
- Query "Moneylenders exploit farmers": found 1 result
- Query "Farmers exploit moneylenders": found 0 results
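The contrast between the two queries can be sketched as follows. This is a minimal illustration, not the Agro Explorer implementation: the triples are hand-written stand-ins for UNL relations, and the names `keyword_match` and `meaning_match` are hypothetical.

```python
# Toy comparison of keyword matching and relation-based (meaning-based) matching.
# The triples are hand-written stand-ins for UNL relations (an assumption),
# not real enconverter output.

def keywords(text):
    """Bag of lower-cased words; word order is thrown away."""
    return frozenset(text.lower().split())

# One indexed document, stored both as text and as (relation, head, dependent) triples.
DOC_TEXT = "moneylenders exploit farmers"
DOC_TRIPLES = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}

def keyword_match(query_text):
    """True when the query uses exactly the same words as the document."""
    return keywords(query_text) == keywords(DOC_TEXT)

def meaning_match(query_triples):
    """True when every relation in the query also holds in the document."""
    return set(query_triples) <= DOC_TRIPLES

q1 = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}
q2 = {("agt", "exploit", "farmer"), ("obj", "exploit", "moneylender")}

print(keyword_match("Moneylenders exploit farmers"))  # True
print(keyword_match("Farmers exploit moneylenders"))  # True: same keywords
print(meaning_match(q1))  # True
print(meaning_match(q2))  # False: agent and object are swapped
```

Keyword matching cannot tell the two queries apart, while matching on who is the agent and who is the object can.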

Provides both:
- Meaning based search
- Cross-lingual information access

System Architecture

Conclusion
- Provides two independent features: multilinguality and meaning based search.
- Because of UNL, the multilingual and meaning based properties can be incorporated together, rather than using separate language translators in search engines.
- The scheme lends itself to the integration of multiple languages in a seamless, scalable manner.

UNL: Universal Networking Language

[Figure: UNL at the center, connected to Hindi, English, French, Marathi and Tamil.]

Direct translation
- Translation is done directly between each pair of languages
- N*(N-1) translators are needed for N languages
Intermediate language
- An intermediate language is used for translation
- Only 2*N translators are required
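The two counts above can be compared with a few lines of arithmetic. The function names are illustrative only.

```python
def direct_translators(n):
    """One translator per ordered pair of distinct languages."""
    return n * (n - 1)

def interlingua_translators(n):
    """One converter into and one out of the intermediate language, per language."""
    return 2 * n

for n in (2, 3, 5, 10):
    print(n, direct_translators(n), interlingua_translators(n))
```

For five languages this gives 20 direct translators against 10 with an intermediate language. The intermediate-language approach needs fewer translators once N exceeds 3.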

- UNL is an acronym for Universal Networking Language.
- UNL is a computer language that enables computers to process information and knowledge across language barriers.
- UNL is a language for representing information and knowledge provided by natural languages.
- Unlike natural languages, UNL expressions are unambiguous.

- Although UNL is a language for computers, it has all the components of a natural language.
- It is composed of Universal Words (UWs), Relations and Attributes.
- Knowledge is represented as a semantic graph:
  - Nodes: concepts
  - Arcs: relations between concepts

- A UW represents a simple or compound concept.
- There are two classes of UWs:
  - unit concepts
  - compound structures of binary relations grouped together (indicated with Compound UW-Ids)
- A UW is made up of a character string (an English-language word) followed by a list of constraints:
  <UW> ::= <Head Word>[<Constraint List>]
- Examples: state(icl>express), state(icl>country)
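A minimal sketch of splitting a UW into its head word and constraint list, following the `<Head Word>[<Constraint List>]` form above. The function `parse_uw` and its regular expression are assumptions made for illustration; the sketch splits constraints on commas and does not handle constraints that themselves contain commas inside nested parentheses.

```python
import re

# Head word, optionally followed by a parenthesised constraint list.
UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<constraints>.*)\))?$")

def parse_uw(uw):
    """Return (head_word, [constraints]) for a UW string."""
    match = UW_PATTERN.match(uw.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {uw!r}")
    raw = match.group("constraints")
    constraints = [c.strip() for c in raw.split(",")] if raw else []
    return match.group("head"), constraints

print(parse_uw("state(icl>express)"))  # ('state', ['icl>express'])
print(parse_uw("state(icl>country)"))  # ('state', ['icl>country'])
print(parse_uw("Ram"))                 # ('Ram', [])
```

The two `state` examples share a head word and differ only in their constraints, which is how a UW separates the "express" sense from the "country" sense.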

- A relation label is a string of three characters or fewer.
- Relations between UWs are binary: rel(UW1, UW2)
- Relations have different labels according to the roles they play.
- At present, there are 46 relations in UNL, for example agt (agent), ins (instrument), pur (purpose), etc.

- Attribute labels express additional information about the Universal Words that appear in a sentence.
- They show what is said from the speaker's point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.).
- Examples: @entry, @present, @progressive, @topic, etc.

Example: Ram eats rice.
{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
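A UNL expression of this shape can be read into a list of (relation, UW1, UW2) triples, that is, the arcs of the semantic graph. The sketch below is illustrative only: `parse_unl` and `split_args` are hypothetical names, and the optional scope id after the relation label (as in `agt:01`) is accepted but discarded.

```python
import re

# relation label (up to 3 letters), optional scope id, then two arguments in parentheses
RELATION = re.compile(r"^(?P<rel>[a-z]{1,3})(?P<scope>:\w+)?\((?P<args>.*)\)$")

def split_args(args):
    """Split 'uw1, uw2' at the first comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in {args!r}")

def parse_unl(text):
    """Return the binary relations of a {unl} ... {/unl} block as triples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("{unl}", "{/unl}"):
            continue
        match = RELATION.match(line)
        if match is None:
            raise ValueError(f"not a relation: {line!r}")
        uw1, uw2 = split_args(match.group("args"))
        triples.append((match.group("rel"), uw1, uw2))
    return triples

EXAMPLE = """{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}"""

for triple in parse_unl(EXAMPLE):
    print(triple)
```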

[Figure: UNL graph for "Ram eats rice", with eat as the entry node and arcs to Ram and rice.]

Example: The boy who works here went to school.
{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>occur).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person.@entry))
plc:01(work(icl>do), here)
{/unl}

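[Figure: UNL graph for "The boy who works here went to school": go has an agt arc to the scope :01 and a plt arc to school; inside :01, work has an agt arc to boy and a plc arc to here.]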

Source language -> EnConverter -> Intermediate language -> DeConverter -> Target language

- The DeConverter is a language-independent generator.
- It can deconvert UNL expressions into a variety of native languages, using linguistic data such as the word dictionary and the grammatical rules of each language.
- It transforms the sentence represented by a UNL expression into a natural-language sentence.

[Figure: DeConverter pipeline. UNL document -> UNL Parser -> Case Marking Module -> Morphology Module -> Syntax Planning Module -> Hindi document. Resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules. The legend distinguishes language-dependent and language-independent modules.]

The UNL parser module performs the following tasks:
- Checks the input format of the UNL document
- Separates attributes from UWs
- Separates attributes from dictionary entries
- Replaces UWs with Hindi root words
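Two of these tasks, separating attributes from UWs and replacing UWs with Hindi root words, can be sketched as below. The three-entry dictionary and the function names are hypothetical; a real dictionary would also carry lexical attributes for each entry.

```python
# Hypothetical three-entry dictionary from UW to Hindi root word (an assumption).
TOY_DICTIONARY = {"eat": "खा", "Ram": "राम", "rice(icl>eatable)": "चावल"}

def split_attributes(node):
    """Split 'eat.@entry.@present' into ('eat', ['@entry', '@present'])."""
    uw, *attrs = node.split(".@")
    return uw, ["@" + a for a in attrs]

def to_hindi_root(node, dictionary=TOY_DICTIONARY):
    """Return (Hindi root word, attributes); unknown UWs are passed through unchanged."""
    uw, attrs = split_attributes(node)
    return dictionary.get(uw, uw), attrs

print(split_attributes("eat.@entry.@present"))
print(to_hindi_root("eat.@entry.@present"))
print(to_hindi_root("rice(icl>eatable)"))
```

The attributes are kept alongside the root word because the later modules (case marking, morphology) consult them.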

- Case is a category of morpho-syntactic properties which distinguish the various relations that a noun phrase may bear to a governing head.
- Hindi case markers: ने, पर, क, से, प, etc.
- Case marking uses a rule base built on:
  - UNL attributes
  - lexical attributes from the dictionary

- Case marking is implemented using rules.
- We analyze all UNL attributes as well as dictionary attributes to decide the next and previous case markers.
- We also use the relation with the parent to select the right case marker.

Example rule:
agt:null:null:null:ने:@past#v:vint:n:null
Structure (colon-separated fields):
- relation name
- parent previous case marker
- parent next case marker
- child previous case marker
- child next case marker
- the remaining four fields have the form attr'rel'relationname; attributes are separated by #, and relation names are also separated by #
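A rule in this format can be split into its fields as sketched below, using the example rule with its case marker written as ने. The class and field names are illustrative; the slide does not name the last four fields, so the sketch keeps them as an unnamed list of `#`-separated items.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CaseRule:
    relation: str
    parent_prev: Optional[str]
    parent_next: Optional[str]
    child_prev: Optional[str]
    child_next: Optional[str]
    conditions: List[List[str]]  # the remaining four fields, each split on '#'

def _marker(field):
    """Map the literal 'null' to None; otherwise return the marker itself."""
    field = field.strip()
    return None if field == "null" else field

def parse_rule(line):
    fields = line.split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 colon-separated fields, got {len(fields)}")
    conditions = [[] if f.strip() == "null" else f.strip().split("#")
                  for f in fields[5:]]
    return CaseRule(fields[0].strip(), *map(_marker, fields[1:5]), conditions)

rule = parse_rule("agt:null:null:null:ने:@past#v:vint:n:null")
print(rule.relation, rule.child_next, rule.conditions)
```

Read this way, the example rule places ने after the child of an agt relation, subject to the conditions in the last four fields.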

What is Morphology?
- The study of morphemes
- Their formation into words, including inflection, derivation and composition

Noun, Verb and Adjective Morphology
- Morphology depends on the phonetic properties of the Hindi word.
- Noun morphology: depends on the gender, number and vowel ending of the noun.
- Adjective morphology: in अच्छा लड़का (good boy), अच्छी लड़की (good girl), अच्छे लड़के (good boys), the adjective अच्छा changes; lexical attribute AdjA.
- Verb morphology: depends on tense, gender, number, person, etc.

Verbs are categorized by:
- Tense (past, present, future)
- Gender (male, female)
- Person (1st, 2nd, 3rd)
- Number (sg, pl)
Example: Ladaka khana kha raha hai. (The boy is eating food.) The verb carries present continuous tense, male gender, singular number and 3rd person.
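How such features select a verb form can be shown with a deliberately tiny table. This is a toy sketch, not the morphology module described here: it covers only the third-person present continuous, works in Roman transliteration, and its table and function names are assumptions.

```python
# Toy table: third-person present continuous only, in Roman transliteration.
PROGRESSIVE = {
    ("male", "sg"): "raha",
    ("male", "pl"): "rahe",
    ("female", "sg"): "rahi",
    ("female", "pl"): "rahi",
}
COPULA = {"sg": "hai", "pl": "hain"}

def present_continuous_3rd(root, gender, number):
    """Build root + progressive marker + copula for a third-person subject."""
    return f"{root} {PROGRESSIVE[(gender, number)]} {COPULA[number]}"

print("Ladaka khana " + present_continuous_3rd("kha", "male", "sg"))
print("Ladaki khana " + present_continuous_3rd("kha", "female", "sg"))
```

The first line reproduces the example sentence; changing only the gender feature changes raha to rahi.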

- Syntax planning arranges words according to the structure of the language.
- It is a rule-based module.
- It is a priority-based graph traversal.

Algorithm for Syntax Planning:
1) Start traversing the UNL graph from the entry node.
2) If a node has no children, add it to the final string.
3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
4) Mark the node as visited.
5) Repeat steps 3 and 4 until all the children of the node have been visited.
6) Once all the children of the node have been visited, add the node to the final string.
7) Repeat steps 2 to 4 until all the nodes have been traversed.

Example: Also, spray 5% Neemark solution.
[Figure: UNL graph with spray as the entry node. spray has an obj arc to solution and a man arc to also; solution has mod arcs to percent and Neemark; percent has a qua arc to 5. Relation priorities: obj:17, man:9, mod:5, qua:5.]
Traversal from the entry node, showing the output after each step:
- 5 percent
- 5 percent Neemark
- 5 percent Neemark solution
- 5 percent Neemark solution also
- 5 percent Neemark solution also spray

Output: 5 percent Neemark solution also spray
5 तशत न मअक घ ल भ छड़क
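The traversal walked through above can be sketched as a priority-ordered, children-first walk of the graph. The graph and priorities are taken from the example; the function name, the dictionary layout, and the choice to keep the listed order for arcs of equal priority (the two mod arcs) are assumptions.

```python
# Relation priorities from the example.
PRIORITY = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# node -> list of (relation, child), in the order the arcs are listed
GRAPH = {
    "spray": [("obj", "solution"), ("man", "also")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}

def plan_syntax(graph, entry, priority):
    """Emit each node after all of its children, highest-priority relation first."""
    output, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        # sorted() is stable, so arcs with equal priority keep their listed order
        arcs = sorted(graph.get(node, []), key=lambda arc: -priority[arc[0]])
        for _, child in arcs:
            visit(child)
        output.append(node)

    visit(entry)
    return " ".join(output)

print(plan_syntax(GRAPH, "spray", PRIORITY))
# 5 percent Neemark solution also spray
```

Emitting a node only after its children puts the verb last, which matches the verb-final order of the Hindi output.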

Input sentence: Its roots are affected by bacterial infection.
Output of each module:
- UNL parser: ज भ वत ज व वक स मण
- Case marking: ज भ वत ज व वक स मण स
- Morphology: इसक जड़ ज व वक भ वत ह त ह स मण स
- Syntax planning: ज व वक स मण स इसक जड़ भ वत ह त ह
Output: ज व वक स मण स इसक जड़ भ वत ह त ह

- UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
- S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, "Hindi Generation from Interlingua", MT Summit 2007 (www.cse.iitb.ac.in/~vishalv)
- Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group, "AgroExplorer: a Meaning Based Multilingual Search Engine", International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
- Agro Explorer: http://agro.mlasia.iitb.ac.in
- aAQUA: http://www.aaqua.org