UNIVERSITI PUTRA MALAYSIA QURANIC ONTOLOGY FOR RESOLVING QUERY TRANSLATION DISAMBIGUATION IN ENGLISH-MALAY CROSS-LANGUAGE INFORMATION RETRIEVAL

UNIVERSITI PUTRA MALAYSIA QURANIC ONTOLOGY FOR RESOLVING QUERY TRANSLATION DISAMBIGUATION IN ENGLISH-MALAY CROSS-LANGUAGE INFORMATION RETRIEVAL ZULAINI BINTI YAHYA FSKTM 2012 27

QURANIC ONTOLOGY FOR RESOLVING QUERY TRANSLATION DISAMBIGUATION IN ENGLISH-MALAY CROSS-LANGUAGE INFORMATION RETRIEVAL By ZULAINI BINTI YAHYA Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfilment of the Requirements for the Degree of Master of Science November 2012

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Master of Science

QURANIC ONTOLOGY FOR RESOLVING QUERY TRANSLATION DISAMBIGUATION IN ENGLISH-MALAY CROSS-LANGUAGE INFORMATION RETRIEVAL

By ZULAINI BINTI YAHYA

November 2012

Chairman: Muhamad Taufik bin Abdullah, PhD
Faculty: Computer Science and Information Technology

This research proposes a Cross-Language Information Retrieval (CLIR) method that uses a domain-specific ontology and domain-specific concepts to disambiguate query translation. The experiments use the Quran, written in English and Malay, as a bilingual parallel corpus, and Quran concepts as a resource for cross-language query translation alongside dictionary-based translation. The study evaluates the effectiveness of dictionary-based and ontology-based query translation for a CLIR system. For translation, two basic approaches serve as benchmarks: 1) the first translation listed in the dictionary; and 2) all translation candidates listed in the dictionary. The proposed CLIR method uses three approaches: 1) based on verse list; 2) based on concepts similarity; and 3) based on concepts expansion. For concepts matching before and after query translation, two approaches are used: 1) query concepts; and 2) translation concepts.

The experimental results show that retrieval performance using dictionary-based translation is lower than monolingual retrieval in both the English and Malay document collections. Direct translation returns many possible translations, which lowers document retrieval performance in both collections. For the proposed CLIR method, query translation based on the verse list, concepts similarity and concepts expansion approaches obtained better results than dictionary-based translation, with either query concepts or translation concepts matching, for the English document collections but not for the Malay document collections. In the Malay document collections, retrieval performance improved only with the concepts expansion approach. English has a better structure than Malay, which affects retrieval performance. A single Malay word may have a variety of meanings, depending not only on the word itself but also on the meaning of the verse or chapter. This is one of the reasons why retrieval performance decreases in the Malay document collections.
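The two dictionary-based benchmarks named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the toy dictionary, its entries, and the function names `translate_first` and `translate_all` are assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy English-Malay dictionary; the entries are illustrative
# only and are not taken from the thesis or its resources.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "prayer": ["solat", "doa", "sembahyang"],
    "book": ["kitab", "buku"],
}


def translate_first(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Benchmark 1: keep only the first translation listed for each word."""
    terms: List[str] = []
    for word in query.lower().split():
        candidates = dictionary.get(word)
        # Words missing from the dictionary are passed through untranslated.
        terms.append(candidates[0] if candidates else word)
    return terms


def translate_all(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Benchmark 2: keep every translation candidate listed for each word."""
    terms: List[str] = []
    for word in query.lower().split():
        # Words missing from the dictionary are passed through untranslated.
        terms.extend(dictionary.get(word, [word]))
    return terms


if __name__ == "__main__":
    print(translate_first("prayer book", TOY_DICTIONARY))
    print(translate_all("prayer book", TOY_DICTIONARY))
```

The second function shows why taking all candidates can hurt retrieval: every ambiguous source word multiplies the number of target-language terms in the query, which is the ambiguity the ontology-based approaches are meant to reduce.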

Abstrak (Malay-language abstract of the thesis, presented to the Senate of Universiti Putra Malaysia in fulfilment of the requirement for the degree of Master of Science)

ONTOLOGI QURAN UNTUK MENYELESAIKAN PENYAHTAKSAAN TERJEMAHAN PERTANYAAN DALAM DAPATAN SEMULA MAKLUMAT SILANG BAHASA INGGERIS-MELAYU

By ZULAINI BINTI YAHYA

November 2012

Chairman: Muhamad Taufik bin Abdullah, PhD
Faculty: Computer Science and Information Technology

This study proposes a Cross-Language Information Retrieval (CLIR; in Malay, Dapatan Semula Maklumat Silang Bahasa, DSMSB) method based on a specific domain/ontology, using specific concepts to disambiguate query translation. The study uses a specific domain/ontology, the Quran written in English and Malay, as a bilingual parallel corpus, together with a bilingual dictionary and specific concepts, the Quran concepts written in English and Malay, as resources for cross-language query translation. The study evaluates the effectiveness of query translation using a bilingual dictionary and an ontology for a CLIR system. For translation, two approaches are used as benchmarks: 1) the first translation listed in the dictionary; and 2) all translations listed in the dictionary. The proposed CLIR method uses three approaches: 1) based on verse list (senarai surah); 2) based on concepts similarity; and 3) based on concepts expansion. For concepts matching before and after query translation, two approaches are used: 1) query concepts; and 2) translation concepts.

The results show that query translation performance using the bilingual dictionary in the CLIR system is lower than that of monolingual information retrieval. Direct translation produces many possible answers, which lowers performance for both the English and Malay collections. For the proposed CLIR approaches, query translation based on the verse list, concepts similarity and concepts expansion obtained better results than the dictionary, with either query concepts or translation concepts, for the English document collections but not for the Malay document collections. For the Malay document collections, query translation performance was good only with the concepts expansion approach. English has a better structure than Malay, which affects query translation performance. A single Malay word can have a variety of meanings, depending not only on the word itself but also on the meaning of the verse or chapter. This is one reason why information retrieval performance decreases in the Malay document collections.

ACKNOWLEDGEMENTS

Alhamdulillah, praise to Allah Almighty. With His gift and permission, I was given the strength and perseverance to complete this thesis successfully in fulfilment of the requirement for the degree of Master of Science (Information Retrieval). However, this success could not have been achieved without the guidance, assistance, cooperation, support, and encouragement of certain people. I would like to extend my appreciation and gratitude to the institutions and individuals who have jointly helped me to achieve it.

Foremost, I would like to thank my supervisor, Dr. Muhamad Taufik bin Abdullah, who shared with me a great deal of his time, expertise and research insight. His faithful encouragement, keen insight, worthy guidance, and valuable suggestions throughout the academic period have helped me immensely in both my research and the completion of this thesis. I am also grateful to my committee members, Dr. Azreen bin Azman and Dr. Rabiah binti Abdul Kadir, for their helpful suggestions and comments on my research. I also acknowledge my lecturer, Assoc. Prof. Hj. Mohd. Hasan bin Selamat, who shared with me his ideas, knowledge and experience in understanding the research method.

My thanks are also extended to all colleagues in the Faculty of Computer Science and Information Technology at Universiti Putra Malaysia for their sincere cooperation in spending time, sharing knowledge and giving ideas that helped me understand and complete the requirements of the compulsory subjects in my studies.

Acknowledgements are also directed to Dato' Hj. Termuzi bin Hj. Abdul Aziz, Director General of the Institute of Language and Literature, and Mr. Kamarul Zaman bin Shaharudin, Deputy Director General of the Institute of Language and Literature, for their permission and financial support. Special thanks are also given to Mr. Sulaiman bin Kaiat, head of the Information System Department, for giving me the opportunity to attend this program. My thanks are extended to all members of the Information System Department for their support, enthusiasm and guidance.

Finally, and most importantly, infinite thanks are given to my parents, Hj. Yahya bin Ismail and Hjh. Puteh binti Jusoh, for their love and guidance, and for the prayers that are always on their lips and in their hearts. Sincere thanks are given to my brothers and sisters for their love and understanding. Their presence and support, most deeply felt and appreciated, make my life more meaningful. To them I dedicate this thesis.

Thank you, Allah.

I certify that a Thesis Examination Committee has met on 22 November 2012 to conduct the final examination of Zulaini binti Yahya on her thesis entitled "Quranic Ontology for Resolving Query Translation Disambiguation in English-Malay Cross-Language Information Retrieval" in accordance with the Universities and University Colleges Act 1971 and the Constitution of the Universiti Putra Malaysia [P.U.(A) 106] 15 March 1998. The Committee recommends that the student be awarded the degree of Master of Science (Information Retrieval).

Members of the Thesis Examination Committee were as follows:

Rahmita Wirza O.K. Rahmat, PhD
Associate Professor
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Chairman)

Aida binti Mustafa, PhD
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Internal Examiner)

Hamidah binti Ibrahim, PhD
Professor
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Internal Examiner)

Zainab binti Abu Bakar, PhD
Professor
Faculty of Information Technology and Mathematical Sciences
Universiti Teknologi MARA
(External Examiner)

SEOW HENG FONG, PhD
Professor and Deputy Dean
School of Graduate Studies
Universiti Putra Malaysia
Date:

This thesis was submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfilment of the requirement for the degree of Master of Science (Information Retrieval). The members of the Supervisory Committee were as follows:

Muhamad Taufik bin Abdullah, PhD
Senior Lecturer
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Chairman)

Azreen bin Azman, PhD
Senior Lecturer
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Member)

Rabiah binti Abdul Kadir, PhD
Senior Lecturer
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Member)

BUJANG BIN KIM HUAT, PhD
Professor and Dean
School of Graduate Studies
Universiti Putra Malaysia
Date:

DECLARATION

I declare that the thesis is my original work except for quotations and citations which have been duly acknowledged. I also declare that it has not been previously, and is not concurrently, submitted for any other degree at Universiti Putra Malaysia or at any other institution.

ZULAINI BINTI YAHYA
Date: 22 November 2012

TABLE OF CONTENTS

ABSTRACT
ABSTRAK
ACKNOWLEDGEMENTS
APPROVAL
DECLARATION
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER

1 INTRODUCTION
  1.1. Background
  1.2. Problem Statement
  1.3. Research Objectives
  1.4. Research Scope
  1.5. Research Assumptions
  1.6. Research Contribution
  1.7. Overview of Thesis

2 LITERATURE REVIEW
  2.1. Introduction
  2.2. Information Retrieval
  2.3. Cross Language Information Retrieval
  2.4. Domain of Knowledge
  2.5. Quran Ontology
  2.6. Summary

3 RESEARCH METHODOLOGY
  3.1. Introduction
  3.2. Research Orientation
  3.3. Stage 1: Literature Review
    3.3.1. Problem Statements
    3.3.2. Reviews Papers
  3.4. Stage 2: Methods and Strategies
    3.4.1. Concepts, Idea and Strategies
    3.4.2. Architecture Design
  3.5. Stage 3: Implementation
    3.5.1. Data Set
    3.5.2. Evaluation Metrics
    3.5.3. Experimental Design
    3.5.4. Experimental Procedure
    3.5.5. Method of Experiment
  3.6. Stage 4: Evaluation
  3.7. Summary

4 PROPOSED CLIR METHOD
  4.1. Introduction
  4.2. Quran Ontology and Quran Concepts
  4.3. An Approach for Document Classification
    4.3.1. Based on Verse List
    4.3.2. Based on Concepts Similarity
    4.3.3. Based on Concepts Expansion
  4.4. An Approach for Query Translation
  4.5. An Approach for Concepts Matching
    4.5.1. Query Concepts Matching
    4.5.2. Translation Concepts Matching
  4.6. Summary

5 RESULTS AND DISCUSSION
  5.1. Introduction
  5.2. Data Analysis
    5.2.1. Quran Concepts
    5.2.2. Quran Ontology
    5.2.3. Quran Document
  5.3. Experimentation
  5.4. Experimental Results for Mono IR and Dictionary-Based CLIR
    5.4.1. Experiment with English Document
    5.4.2. Experiment with Malay Document
    5.4.3. Discussion
  5.5. Experimental Results Based on Verse List
    5.5.1. Experiment with English Document
    5.5.2. Experiment with Malay Document
    5.5.3. Discussion
  5.6. Experimental Results Based on Concepts Similarity
    5.6.1. Experiment with English Document
    5.6.2. Experiment with Malay Document
    5.6.3. Discussion
  5.7. Experimental Results Based on Concepts Expansion
    5.7.1. Experiment using English Document
    5.7.2. Experiment using Malay Document
    5.7.3. Discussion
  5.8. Summary

6 CONCLUSION
  6.1. Introduction
  6.2. Conclusion
  6.3. Future Work
  6.4. Summary

REFERENCES
APPENDICES
BIODATA OF STUDENT
LIST OF PUBLICATIONS