2012 International Conference on Innovations in Information Technology (IIT)

A Quranic Quote Verification Algorithm for Verses Authentication

Abdulrhman Alshareef 1,2, Abdulmotaleb El Saddik 1
1 Multimedia Communications Research Laboratory, EECS, University of Ottawa, Ottawa, Canada
2 Information System Department, FCIT, King Abdulaziz University, Jeddah, Saudi Arabia
aalshareef@mcrlab.uottawa.ca, abed@mcrlab.uottawa.ca

Abstract
The growth of Quranic digital publishing increases the need for a better framework that automatically authenticates Quranic quotes against the original source. This paper aims to demonstrate the significance of the quote authentication approach. We propose an approach that verifies the e-citation of a Quranic quote by comparing it with the original text of the Quran. In this paper, we concentrate mainly on the algorithm that verifies the fundamental text of Quranic quotes.

Keywords: Quote filtering; Quranic quote; Quote authenticity; Fundamental text; Text authenticity

I. INTRODUCTION

The Quran is the holy book for more than one billion Muslims around the world, who speak many different languages. Arabic is the native language of the Quran. Although some people can read and understand the Quran in its native language, they are not able to translate it properly. Even though the number of Quran search and translation websites is increasing [1], some of these websites are either incorrect or incomplete. Thus, a reliable translation service has become an urgent necessity, especially given the diversity of languages spoken by Muslims around the world. However, every translation of the Quran is derived from the original Arabic text. Thus, if the original text is corrupted, its translation will be incorrect. Consequently, a Quranic quote verification service has become more important. Muslims depend on Quranic verses to support many decisions in their daily life.
Decision-making is highly dependent on the authenticity of the verses quoted from the Quran. Some Muslims use Quranic verses to deduce solutions for their social and religious problems or to analyze events. In addition, most Muslim authors quote these verses as evidence to support their conclusions and their analysis of events. This scientific method is popular in Islamic societies in general and among Arabs in particular [2]. Thus, the reliance of Muslims on the Quran makes it imperative for researchers to develop verification mechanisms that support Muslim users.

Ordinarily, readers with moderate religious knowledge cannot judge the authenticity of the quotes used by an author. To verify a quote, they have to search extensively through the original copy of the Quran. This is a tedious task, especially when the verse numbering is missing from the quotation. Comparing the quote with the original text and confirming the match establishes the authenticity of the Quranic quote. This procedure can increase the user's confidence in such Quranic quotes.

Previous related work includes the Quranic metacodification performed at the International Islamic University in Malaysia [3]. The authors discussed the structure of the Quran in terms of the number of verses in each chapter, the number of characters in each verse, and so on, using "Atomization Structure" and Unicode codification, with the aim of protecting the digital form of the Quran from corruption. Previous search methods have focused on text information retrieval [1], semantic search [4], [5], and grammar dependency [3]. However, none of the previous related works has taken the authenticity of the Quranic fundamental text into consideration. This work aims to verify the fundamental text of specific Quranic quotes, which improves the users' confidence in the digital content.
In this paper, we illustrate our authentication model, which consists of two stages that validate Quranic quotes in digital form.

II. MODEL OF QURANIC AUTHENTICATION

This section explains the architecture of the proposed Quranic quote authentication framework. The framework takes a Quranic quote (i.e., a complete or partial verse sentence) as input and outputs the authentication result as either genuine or incorrect. The architecture consists of two major components: the Quranic quote filtering algorithm and the verification mechanism.

A. The Quranic Quote Filtering Algorithm

The Quran uses the Arabic diacritics (Harakat) in addition to its own special symbols. Fig. 1 shows an example of a standard Quranic text with full diacritics and the placement of the unique symbols. These symbols and diacritics limit the ability of traditional search engines to provide acceptable and accurate results to users.

978-1-4673-1101-4/12/$31.00 ©2012 IEEE
Figure 1. Example illustrating the organizational structure of Quranic verses as well as the placement of diacritics and unique symbols.

TABLE II. EXAMPLE ILLUSTRATING THE ARABIC MAIN DIACRITICS.
Futtha, Thummah, Kusrah, Sukoon, Shuddah, Tenween Futtha, Tenween Thummah, Tenween Kusrah

Furthermore, the Arabic letter ا (Alif) is the only Arabic character that is written in four different forms, using the special Arabic character Hamza. These differences in how the character is drawn, together with the added diacritics, reduce the possibility of obtaining accurate results for the intended text. Therefore, we discuss three stages in the proposed algorithm that aim to overcome these obstacles, as demonstrated in Fig. 2.

1) Arabic Diacritics Removal: The Arabic language is rich in diacritics, which are used to distinguish the vocal pronunciation. Arabic has 8 main diacritics, which are shown in Table II [6]. As can be seen in Fig. 1, the Quran applies these diacritics to each letter. These diacritics are an impediment to retrieving data and information from the Quran, especially for traditional search engines. Hence, the first stage of the proposed algorithm eliminates the diacritics (if any) from the target text [7].

2) Special Quranic Symbols Removal: The unique symbols of the Quran are used to facilitate reading and understanding. They include signs that indicate where readers could or should stop or continue reading, as presented in Table I. These symbols limit the ability of traditional search engines to deliver accurate results. Thus, the second stage of the proposed algorithm removes these Quranic symbols (if any).

3) Standardization of the format of the letter "ا": Alif is the first letter of the Arabic alphabet. It is characterized by its own vocal character, which is Hamza. The letter can be written in four different forms: ا, أ, إ, and آ. Although each form has a different vocal sound, it is still essential to minimize input errors and maximize the accuracy of the outcome. Users are often not familiar with the grammatical rules for using the Hamza, which makes it difficult for them to enter the text accurately. The objective of the third stage of the proposed algorithm is to convert the different written forms of the letter to a single unified form, which is ا [7]. This process minimizes the obstacles raised by the Arabic grammatical rules.

4) Quranic text filtering algorithm methodology: Fig. 2 shows the flow chart of the Quranic quote filtering algorithm. The algorithm starts by taking the input text, which then passes through three filtering stages. In the first stage, the algorithm removes the Quranic diacritics. In the second stage, it removes any special Quranic symbols. Finally, it unifies the Alif forms into one form, which is ا. Fig. 3 shows the pseudo code that explains the underlying steps required to filter the Quranic text. The output of the filtering algorithm is the filtered Quranic text. This text is then passed as input to the next component of the authentication system, the verification mechanism, as described next.

TABLE I. SOME EXAMPLES OF THE QURAN UNIQUE SYMBOLS.
Continuing is better
Must stop
Stopping is better
Must continue

Figure 2. The Quranic quote filtering flowchart.
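The three filtering stages can be sketched in Python as follows. This is a minimal illustration, not the authors' C# implementation: the function name filter_quote is hypothetical, and the Unicode ranges chosen for the diacritics (U+064B to U+0652) and for the Quranic annotation signs (U+06D6 to U+06ED) are assumptions about which code points correspond to Table II and Table I.

```python
import re

# Assumption: the eight main diacritics of Table II correspond to the
# Unicode code points U+064B (Tenween Futtha) through U+0652 (Sukoon).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Assumption: the special Quranic symbols (stop and continue signs and
# similar annotation marks) fall in the range U+06D6..U+06ED.
QURANIC_SYMBOLS = re.compile("[\u06D6-\u06ED]")

# Alif written with Hamza above, Hamza below, or Madda is mapped to bare Alif.
ALIF_FORMS = str.maketrans({
    "\u0623": "\u0627",  # Alif with Hamza above
    "\u0625": "\u0627",  # Alif with Hamza below
    "\u0622": "\u0627",  # Alif with Madda above
})


def filter_quote(text: str) -> str:
    """Apply the three filtering stages to a quote and return the result."""
    text = DIACRITICS.sub("", text)       # stage 1: remove diacritics
    text = QURANIC_SYMBOLS.sub("", text)  # stage 2: remove Quranic symbols
    return text.translate(ALIF_FORMS)     # stage 3: unify the Alif forms


if __name__ == "__main__":
    # Alif with Hamza below + Kusrah, then Noon + Shuddah + Futtha.
    sample = "\u0625\u0650\u0646\u0651\u064E"
    print(filter_quote(sample) == "\u0627\u0646")
```

Working on whole-string substitutions rather than a per-letter counter keeps the sketch short; the result is intended to match what the letter-by-letter loop of the pseudo code produces.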
    Input: the entered quote text
    Output: the filtered quote text

    Initialize Letters Counter to zero.
    Initialize string Quote Text = "".
    Read the sentence and store it in Quote Text.
    Count the letters in Quote Text and store the count in Letters Counter.
    IF Letters Counter is greater than zero
        FOR each letter in Quote Text
            IF the letter carries a diacritic
                Remove the diacritic.
            ENDIF
            IF the letter is a special Quranic symbol
                Remove the symbol.
            ENDIF
            IF the letter equals "أ" or "إ" or "آ"
                Replace the letter with "ا".
            ENDIF
            Subtract one from Letters Counter.
        END FOR
    ENDIF
    RETURN the filtered sentence.

Figure 3. Quranic text filtering pseudo code.

B. The Verification Mechanism

The Quranic quote verification mechanism is divided into two stages: the Quranic Quote Genuineness Validation and the Nearest Similar Verse Retrieval (Fig. 4). At the first stage, the proposed algorithm decides either to continue to the next stage or to display the result. As shown in Fig. 4, the second stage depends on the data returned by the query of the previous stage.

Figure 4. The authentication mechanism model.

1) The Quranic Quote Genuineness Validation: At this stage, the system confirms the existence (or non-existence) of the text in the Quran database without any alteration to the verse. Fig. 1 shows that each Quranic verse is separated from the previous verse and the following verse by a numbered interval. These intervals determine the beginning and the end of each verse. Hence, the authenticity of a Quranic quote is based on those intervals. For example, if the quote contains two verses connected without an interval, the text is considered incorrect. In addition, if the text does not contain the full text of the verse, it is also considered incorrect. Usually, if a verse is accidentally linked to the next or previous verse, or is incomplete, its meaning changes. As a result, the difference in meaning leads to a wrong interpretation. For instance, Fig. 5 shows an incomplete Quranic verse. This incomplete verse means "Fight in the cause of Allah", which is not an accurate meaning of the verse. However, Fig. 6 shows the complete verse. This complete verse means "Fight in the way of Allah against those who fight against you, but begin not hostilities. Lo! Allah loveth not aggressors", which is the correct and accurate meaning of the verse.

Figure 5. Example illustrating an incomplete Quranic verse.

Figure 6. Example illustrating a complete Quranic verse.

The search approach is based on the phrase-based matching pattern [8]. The pattern matches the complete sentence against each verse in the database. What distinguishes this pattern is that it returns data when a matching sentence is found in the database and returns null when no matching sentence is found. The developed system uses SQL to implement the matching pattern. "LIKE" is the keyword used to implement the phrase-based matching pattern, in association with the "WHERE" clause [9]. "LIKE" performs the matching on a per-character basis; therefore, it can generate more accurate results than other possible comparison patterns [9], [10]. To perform the matching process, an SQL query needs to be created. The verification query is shown below:

    SELECT AyahId, SurahName, AyahText, SurahId
    FROM Quran
    WHERE AyahText LIKE ' the filtered Quranic text '

If the query returns a match, the input Quranic quote is considered genuine. Otherwise, if it returns nothing, the Quranic text is passed to the next stage of the verification mechanism to retrieve the nearest similar verse.

2) Nearest Similar Verse Retrieval: At this stage, the system performs an extensive search to locate the verse nearest to the filtered input quote. The system retrieves the Quranic verses that differ least from the input quote. To do so, the proposed query exploits regular expressions. In essence, a regular expression search performs the data search more accurately, based on regular expression patterns [11]. Regular expressions offer powerful meta-characters that can be used to specify retrieval patterns with high precision [12]. This can be achieved using the "REGEXP" search function as shown below [9]:

    SELECT AyahId, SurahName, AyahText, SurahId
    FROM Quran
    WHERE AyahText REGEXP '[[:<:]]FQ [[:>:]]'
       OR AyahText REGEXP '[[:<:]]..FQ [[:>:]]'
       OR AyahText REGEXP '[[:<:]]...FQ [[:>:]]'
       OR AyahText REGEXP '[[:<:]]FQ..[[:>:]]'
       OR AyahText REGEXP '[[:<:]]FQ...[[:>:]]'
       OR AyahText REGEXP '[[:<:]]FQ...[[:>:]]'
       OR AyahText REGEXP '[[:<:]]..FQ...[[:>:]]'

Here, FQ represents the filtered Quranic text. The [[:<:]] and [[:>:]] markers are set as text boundaries [9]; they match the beginning and the end of texts. Each pair of dots (..) represents one Arabic character. Arabic, the language of the Quran, is characterized by the use of pronouns as suffixes and of prepositions and conjunctions as prefixes [13]. The dot pairs are used to overcome these suffix and prefix difficulties. If a match is obtained, the system displays a list of similar verses including the full text, the chapter name (surah), and the verse number. Otherwise, the input Quranic quote is considered distorted.

III. IMPLEMENTATION

The proposed system has been implemented using Visual Studio 2010 (C#) on a Windows 7 platform. The Microsoft .NET library was used to implement the Quranic quote filtering algorithm. For the verification mechanism, the MySQL Data Library was used to implement the SQL search statements. While implementing the system, we adopted a Service-Oriented Architecture (SOA) approach by executing some of the system functionalities over multiple layers. For instance, the verification service is executed on one layer while the filtering algorithm is executed on another. This approach facilitates the integration of the required services according to the program's strategy. In addition, it allows development modifications and even radical coding changes without altering the entire program [37].

IV. RESULTS

Table III and Table IV compare the proposed searching algorithm with the top three traditional Quran search engines according to Google [14]-[16]. The comparison in Table III was performed using 8 Quranic phrases. Each phrase was tested with its complete diacritics and Quranic symbols. The results show that the proposed algorithm retrieves a match for every phrase, whereas the traditional Quran search engines retrieve no results. This is because the proposed algorithm, unlike the traditional search engines, handles the special Quranic symbols as described above. The comparison in Table IV was performed using 7 random Arabic words. The results show that the proposed algorithm achieves a higher accuracy percentage, based on the correct results retrieved, than the traditional Quran search engines. The reason is that the proposed algorithm uses regular expression patterns to overcome the suffix and prefix issues, which are not properly considered by traditional search engines. The algorithm accuracy (A_{Alg.}) is determined by the following equation:

\[ A_{Alg.} = \frac{CR}{AR} - \frac{IR}{TR} \]

where CR is the number of correct results retrieved, AR is the number of accepted results, IR is the number of incorrect results retrieved, and TR is the total number of results retrieved. The accepted results are the accurate results, as confirmed by an expert human reviewer in our tests. As can be seen in Table III and Table IV, the proposed algorithm yields a significant increase in accuracy compared with the traditional Quran search engines.

TABLE III. THE COMPARISON RESULTS FOR THE PHRASES.
Tested Quotes    Proposed Algorithm    [14]    [15]    [16]
1                Y                     N       N       N
2                Y                     N       N       N
3                Y                     N       N       N
4                Y                     N       N       N
5                Y                     N       N       N
6                Y                     N       N       N
7                Y                     N       N       N
8                Y                     N       N       N
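As a worked illustration of the accuracy measure, the following Python sketch reads the equation as A_Alg = CR/AR - IR/TR, which is an interpretation of the printed formula based on the definitions of CR, AR, IR, and TR. The function name algorithm_accuracy and the example counts are hypothetical: AR = 38 and TR = 48 agree with the first tested word in Table IV, while the split into 38 correct and 10 incorrect results is an assumption made only for this example.

```python
def algorithm_accuracy(correct: int, accepted: int, incorrect: int, total: int) -> float:
    """Return CR/AR - IR/TR, the assumed reading of the accuracy equation.

    correct   -- CR, correct results retrieved
    accepted  -- AR, accepted results confirmed by the human reviewer
    incorrect -- IR, incorrect results retrieved
    total     -- TR, total results retrieved
    """
    if accepted <= 0 or total <= 0:
        raise ValueError("accepted and total must be positive")
    return correct / accepted - incorrect / total


if __name__ == "__main__":
    # Hypothetical counts: all 38 accepted results are retrieved,
    # plus 10 incorrect ones, for 48 results in total.
    score = algorithm_accuracy(correct=38, accepted=38, incorrect=10, total=48)
    print(f"{score:.1%}")  # prints 79.2%
```

Under this reading, the first term rewards retrieving the accepted results and the second term penalizes the share of retrieved results that are incorrect.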
TABLE IV. THE COMPARISON RESULTS FOR THE WORDS.
Tested Words (results retrieved, with accuracy in parentheses)
Accepted Results: 38, 90, 10, 40, 79, 21
Proposed Algorithm: 48 (79%), 99 (91%), 13 (77%), 37 (93%), 79 (100%), 25 (93%), 16 (76%)
[14] [15] [16]: 67 (57%), 44, 68, 241 (33%), 68, 82 (96%), 305
Over All: Proposed Algorithm 89.8%, [14] 55.8%, [15] 43.2%, [16] 41.7%

V. CONCLUSION

In this paper, we have elaborated the details of an algorithm that enables the verification of Quranic quotes in any system, application, or platform. The algorithm implementation helps users to verify Quranic e-content over the Internet, which increases confidence in Quranic e-citation. The proposed algorithm confirms the authenticity of a Quranic quote based on a comprehensive understanding of the characteristics of the Arabic language and the unique writing technique of the Quran. However, such a system needs to consider the credibility criteria of an information retrieval system, such as availability, reliability, constancy, and integration. Hence, the next phase of this project is to develop the proposed algorithm as a Web Service prototype to ensure integrational accessibility, coupled with the use of cloud computing infrastructures to enhance the framework. This will also facilitate the ongoing maintenance and expansion of the system from anywhere in the world. In essence, it will support the trends of Green IT to increase the system's capabilities and reliability. It will also reduce the time and effort needed to authenticate the information.

REFERENCES

[1] M. F. Noordin and R. Othman, "An Information Retrieval System for Quranic Texts: A Proposed System Design," 2006 2nd International Conference on Information & Communication Technologies, pp. 1704-1709, 2006.
[2] E. Alsulamy, Fundamentalists Used Quran and Sunni to Extract the Rules of Fundamentalism. Riyadh: Al Rushed library, 1999.
[3] A. F. Shamsudin and A. Farooq, "AI natural language in metasynthetics of Al-Qur'an," 2000 TENCON Proceedings. Intelligent Systems and Technologies for the New Millennium (Cat. No.00CH37119), pp. 464-467.
[4] H. S. Al-Khalifa, M. M. Al-Yahya, A. Bahanshal, and I. Al-Odah, "SemQ: A proposed framework for representing semantic opposition in the Holy Quran using Semantic Web technologies," 2009 International Conference on the Current Trends in Information Technology (CTIT), pp. 1-4, Dec. 2009.
[5] M. Shoaib, M. N. Yasin, U. K. Hikmat, M. I. Saeed, and M. S. H. Khiyal, "Relational WordNet model for semantic search in Holy Quran," 2009 International Conference on Emerging Technologies, pp. 29-34, Oct. 2009.
[6] M. A. Aabed, S. M. Awaideh, A. R. M. Elshafei, and A. A. Gutub, "Arabic diacritics based steganography," in Signal Processing and Communications, 2007. ICSPC 2007. IEEE International Conference on, 2007, no. November, pp. 756-759.
[7] M. Aljlayl and O. Frieder, "On Arabic search: improving the retrieval effectiveness via a light stemming approach," in Proceedings of the eleventh international conference on Information and knowledge management, 2002, pp. 340-3.
[8] K. Patterson, C. Watters, and M. Shepherd, "Document Retrieval using Proximity-based Phrase Searching," in Hawaii International Conference on System Sciences, Proceedings of the 41st Annual, 2008, pp. 137-137.
[9] M. Widenius and D. Axmark, MySQL reference manual: documentation from the source. O'Reilly Media, Inc., 2002, p. 172.
[10] O. Vechtomova and M. Karamuftuoglu, "Approaches to high accuracy retrieval: Phrase-based search experiments in the HARD track," in Proceedings of TREC, 2004, no. 1.
[11] G. Rasool and N. Asif, "Software Artifacts Recovery using Abstract Regular Expressions," 2007 IEEE International Multitopic Conference, pp. 1-6, Dec. 2007.
[12] Y. Zhenjun and J. Xiangyu, "A simplified application of regular expressions: With the extraction of Chinese cultural terms as an example," in Computing, Communication, Control, and Management, 2009. CCCM 2009. ISECS International Colloquium on, 2009, vol. 1, pp. 439-442.
[13] H. K. Al Ameed et al., "Arabic Search Engines Improvement: A New Approach using Search Key Expansion Derived from Arabic Synonyms Structure," IEEE International Conference on Computer Systems and Applications, 2006, pp. 944-951, 2006.
[14] Ketaballah.net, "The Holy Quran search engine." [Online]. Available: http://www.ketaballah.net/searchquran.html. [Accessed: 08-Dec-2011].
[15] Muslim-web.com, "The Holy Quran." [Online]. Available: http://quran.muslim-web.com/?lang=en. [Accessed: 08-Dec-2011].
[16] Holyquran.net, "The Quran's prospector." [Online]. Available: http://www.holyquran.net/search/sindex.php. [Accessed: 08-Dec-2011].