Towards Transliteration between Sindhi Scripts Using Roman Script

Copyright Linguistics & Literature Review (LLR)
ISSN: 2409-109X (Online), ISSN: 2221-6510 (Print)
Vol. I, Issue 2, October 2015

Mehwish Leghari, Mutee U Rahman
Department of Computer Science, Isra University - Hyderabad, Pakistan

CONTACT Mehwish Leghari at legharimehwish@hotmail.com

ABSTRACT

This research presents a model for transliteration between two scripts of the Sindhi language, the Perso-Arabic script and the Devanagari script, based on an intermediate Roman script. After analyzing both the Perso-Arabic and Devanagari scripts, a Roman script for the Sindhi language is also suggested. The issues, complexities and problems of Sindhi transliteration are discussed in detail, and an algorithm to transliterate between the two scripts of Sindhi is proposed.

Keywords: Perso-Arabic script, transliteration, complexities, Devanagari script, Roman script

Introduction

Transliteration is the transformation of text from one script to another, usually based on phonetic equivalences (IBM, 1999). The popularity and simplicity of Roman script are a major motivation for transliterating language scripts, and both people and software benefit from transliteration aids. Transliteration from native scripts to Roman script has been achieved for many Asian languages, including Arabic, Bengali, Persian, Hindi, Punjabi and Urdu. Transliteration between Punjabi scripts (Malik, 2006b) and Hindi to Urdu transliteration (Malik, et al., 2008) are key examples of South Asian language transliteration. Transliteration applications such as Google Transliteration IME help users of different languages to transliterate from their native scripts to Roman script. It is currently available for 19 different languages, including Arabic, Bengali, Farsi (Persian), Gujarati, Hindi, Punjabi, Sanskrit and Urdu. Transliteration between the two scripts of Sindhi has not been achieved yet and, in fact, has not even been initiated.

Sindhi computational linguistics has not received much encouragement in either Pakistan or India (Khubchandani, 1970), and work has been limited to font design and word processing. Sindhi computing should not be confined to font design and word processing; extensive research is needed in artificial intelligence, computational linguistics, natural language processing, corpus linguistics and script processing, including transliteration (Rahman, 2009).

The following sections give a brief history of the Sindhi language, describe the Perso-Arabic and Devanagari scripts of Sindhi and their composition, suggest a Roman script for Sindhi, and present an algorithm for transliteration between the two scripts.

Sindhi language

Sindhi is an Indo-Aryan language with its roots in ancient history. It is spoken by approximately 40 million people (Sindhi Language Authority, 2009) in the Sindh province of Pakistan as well as in several states of India. In Pakistan, Sindhi is written in Perso-Arabic script, while in India it is written in both Devanagari and Perso-Arabic scripts. The two regions share most of their vocabulary.

Sindhi scripts

Sindhi scripts and their writing systems are briefly described below.

Perso-Arabic script

The Perso-Arabic script of Sindhi consists of 52 letters. Most are taken from the Arabic alphabet, some from Persian, and a few are modified letters. Each letter has one to four forms according to its position: beginning, middle, final and standalone. Letters in Perso-Arabic script are divided into different types on the basis of phonemes. These types are discussed below with reference to their phonemes, writing style and position in a word.

The first type is the aspirated consonants. In Perso-Arabic script some aspirated sounds are written by combining two letters. For example, the aspirated form of گ (g) can be written by combining گ (g) with ھ (h) as گھ (gh); there are some other aspirated consonants of the same kind. In Roman script, h is combined with the letter of the nearest sound. For example, g + h = gh is used to represent گھ (gh). On the other hand, some aspirated consonants in Sindhi are represented by a single letter, such as ڇ (chh) and ٿ (th). The non-aspirated consonants of Perso-Arabic script are transliterated according to their phonemes.
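The Roman conventions for aspirated consonants can be illustrated with a small sketch. It assumes a greedy longest-match reading of the Roman text, which is our assumption and is not stated in the article; the table holds only a handful of Roman values used in this paper (gh, chh, th, ch, g, h), and the names `ROMAN_TO_PERSO_ARABIC` and `roman_to_perso_arabic` are hypothetical.

```python
# Minimal sketch: Roman -> Perso-Arabic for a few aspirated consonants.
# Assumption: the longest Roman sequence wins, so "chh" is read as one
# letter rather than as "ch" followed by "h".
ROMAN_TO_PERSO_ARABIC = {
    "chh": "ڇ",   # aspirate written with a single letter
    "ch": "چ",
    "th": "ٿ",    # aspirate written with a single letter
    "gh": "گھ",   # aspirate written with two letters: g + h
    "g": "گ",
    "h": "ھ",
}
MAX_KEY = max(len(key) for key in ROMAN_TO_PERSO_ARABIC)


def roman_to_perso_arabic(roman: str) -> str:
    """Convert Roman text, trying the longest known sequence first."""
    out = []
    i = 0
    while i < len(roman):
        for size in range(MAX_KEY, 0, -1):
            chunk = roman[i:i + size]
            if chunk in ROMAN_TO_PERSO_ARABIC:
                out.append(ROMAN_TO_PERSO_ARABIC[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: copy it through unchanged.
            out.append(roman[i])
            i += 1
    return "".join(out)


print(roman_to_perso_arabic("chh"), roman_to_perso_arabic("gh"))
```

Trying the longest sequence first is one simple way to keep chh (ڇ) apart from ch (چ) followed by h; a full system would need the complete letter table.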
In some places several non-aspirated consonants represent a single phoneme, and all of them have a single counterpart in Devanagari script. These are discussed further in section 3.

There are three main vowels in Perso-Arabic script: ا, و and ي. At the beginning of a word these vowels are simply treated as non-aspirated consonants, and at the end of a word they are treated as vowels. In the middle of a word they must be handled in the context of the neighbouring letters, according to whether those are vowels or consonants.

Diacritical marks are essential for correct pronunciation. When they are missing, the result is not only ambiguity in transliteration but also misinterpretation of words. Diacritical marks are therefore as important for avoiding ambiguity in transliteration as they are for natural language processing and speech synthesis (Malik, et al., 2009). Examples of diacritical marks include Zabar, Zer and Pesh. Pairs of words that differ in meaning only because of their diacritical marks are shown in table 1.

Table 1. Effects of diacritical marks on meaning.

Word     Meaning      Word     Meaning
اَٺ      Eight        اُٺ      Camel
چَپ      Lip          چُپ      Silent
کِل      Laughter     کَل      Skin
مَلَڻ     To rub       مِلَڻ     To meet
ڪَرَڻ     To do        ڪِرَڻ     To fall

In Sindhi there are four implosive stops. In Perso-Arabic script they are represented by adding extra dot(s) to the letter of the nearest matching sound. For example, the implosive version of گ (g) is ڳ (gg); there are three more implosive stops of the same kind. In Roman script these letters are represented by doubling the letter of the nearest sound (bb, jj, dd, gg), as this convention is commonly used in Sindhi-Roman script.

Devanagari script

Devanagari script was adopted from the Sanskrit system of writing, in which each character represents a syllable. Devanagari script is written from left to right. Many letters that share the same phoneme in Perso-Arabic script are equivalent to a single letter in Devanagari script. It is also worth mentioning that two letters of Perso-Arabic script, ء ('a) and ع (A), have no equivalent in Devanagari script.

Devanagari script also has aspirated and non-aspirated consonants, but unlike Perso-Arabic script it does not use composite letters for aspirated consonants. All aspirated consonants are therefore denoted by a single letter. Devanagari script has two types of vowels, independent and dependent. The independent form of a vowel is used at the beginning of a word and the dependent form at the end of a word. In the middle of a word the dependent form is usually used, but there are some exceptions. Unlike in Perso-Arabic script, diacritical marks are not optional in Devanagari script, as shown in example 1.

Example 1

तूं सुहिणी आहीं.
toon suhinree AaheeN.
You beautiful are
تُون سُھِڻِي آھِين.
You are beautiful.

As table 2 shows, diacritical marks are widely used and are an integral part of words written in Devanagari script. Transliteration accuracy is therefore assured when going from Devanagari to Perso-Arabic script.
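The way a Devanagari word carries its vowel signs explicitly can be inspected with Python's standard `unicodedata` module. The sketch below is only an illustration: it assumes the word for suhinree is spelled with the code points listed in the comment (a reconstruction from the Roman form), and `decompose` is a hypothetical helper name.

```python
import unicodedata


def decompose(word: str):
    """Label each code point as a base letter or a dependent sign."""
    parts = []
    for ch in word:
        # Unicode categories Mn and Mc cover Devanagari vowel signs and
        # marks such as the anusvara; everything else is treated as a letter.
        kind = "sign" if unicodedata.category(ch) in ("Mn", "Mc") else "letter"
        parts.append((kind, unicodedata.name(ch)))
    return parts


# Assumed spelling of "suhinree": SA + sign U + HA + sign I + NNA + sign II
word = "\u0938\u0941\u0939\u093F\u0923\u0940"

for kind, name in decompose(word):
    print(f"{kind:6} {name}")
```

Every second code point in this word is a dependent sign, which is why the vowels cannot silently go missing in Devanagari text the way short-vowel marks can in Perso-Arabic text.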

Table 2. Diacritical marks in Devanagari

Perso-Arabic    Devanagari                         Roman
تُون            त + ू + ं = तूं                      toon
سُھِڻِي          स + ु + ह + ि + ण + ी = सुहिणी        suhinree
آھِين           आ + ह + ी + ं = आहीं                 AaheeN

Table 3. Simple consonants and independent vowels in Devanagari, Roman and Perso-Arabic scripts.

Devanagari    Roman    Perso-Arabic
आ             Aa       آ
अ             a        ا
ब             b        ب
भ             bh       ڀ
थ             th       ٿ
ट             T        ٽ
ठ             Thh      ٺ
प             p        پ
ज             j        ج
झ             jh       جھ
ञ             nn       ڃ
च             ch       چ
छ             chh      ڇ
ख़             khh      خ
द             d        د
ध             dh       ڌ
ड             D        ڊ
ढ             Dh       ڍ
र             r        ر
ड़             R        ڙ
श             sh       ش
ग़             G        غ
फ़             f        ف
फ             ph       ڦ
क़             q        ق
क             k        ڪ
ख             kh       ک
ग             g        گ
घ             gh       گھ
ङ             Ng       ڱ
ल             l        ل
म             m        م
न             n        ن
ं             N        ن
ण             Nr       ڻ
व             v        و
य             y        ي

Perso-Arabic, Devanagari and Roman scripts

Roman script is based on the alphabet developed by the ancient Romans and is used by most of the languages of Europe, including English, French and German (SIL International, 2003). To achieve Sindhi transliteration, an intermediate Roman script is used. Because writing Sindhi and other languages in Roman script (English) is very common nowadays, this model takes all possible steps to preserve the most common Roman style of writing. Users should therefore find the method easy to adopt and can transliterate in any direction between the Sindhi scripts (Perso-Arabic, Roman and Devanagari).

Most consonants are transliterated to their matching sounds in Roman script. The mapping between Perso-Arabic, Devanagari and the equivalent Roman script is shown in table 3. Table 3 contains all consonants except the implosive stops and those that share a phoneme with another letter. The four unique implosive stops of Sindhi are shown in table 4.

Table 4. Implosive stops

Devanagari    Roman    Perso-Arabic
ॿ             bb       ٻ
ॼ             jj       ڄ
ॾ             dd       ڏ
ॻ             gg       ڳ

Besides the letters listed in table 3 and table 4, there are several letters in the Perso-Arabic script of Sindhi that share the same equivalent in Devanagari script, as shown in table 5. This is because of their similar sounds, although we suggest separate Roman equivalents for these letters.

The letter ع came into the Sindhi alphabet from Arabic. Native Sindhi speakers do not pronounce it properly (generating the sound from the inner throat) in normal conversation, and there is no equivalent of ع (A) in Devanagari script (Malik, 2006a). The same is true of ء ('a) of the Perso-Arabic script. These two letters can be transliterated easily from Perso-Arabic to Roman script and vice versa. In Devanagari transliteration they are either ignored or transliterated into अ (a) or े (e). Mostly these letters are ignored during transliteration from Perso-Arabic or Roman script to Devanagari script, as shown in example 2 and table 6.

Example 2

معاف ڪجو!
maaaf kajo!
forgive do
माफ़ कजो!
Do forgive! / Forgive me!

Table 5. Perso-Arabic multiple consonants with same Devanagari equivalents

Devanagari    Roman    Perso-Arabic
त             t        ت
त             Tt       ط
ह             H        ح
ह             h        ه
ज़             Z        ذ
ज़             z        ز
ज़             zz       ض
ज़             Zz       ظ
स             s        س
स             S        ص
स             c        ث

Table 6. Letters with no equivalents

Devanagari    Roman    Perso-Arabic
म             m        م
-             A        ع
ा             aa       ا
फ़             f        ف

Analyzing the word معاف (maaaf) from example 2 in table 6 shows that ع (A) is omitted completely in transliteration from Roman or Perso-Arabic to Devanagari. The two letters without Devanagari equivalents are shown separately in table 7 with their Roman equivalents.

Table 7. Perso-Arabic letters with no Devanagari equivalent

Devanagari    Roman    Perso-Arabic
-             A        ع
-             'a       ء
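The effect of the many-to-one correspondences in table 5 can be sketched in a few lines. The dictionaries below copy the Roman values of table 5 and treat ع (A) as ignored in Devanagari; the function names and the letter-by-letter processing are assumptions made for illustration, not the authors' implementation.

```python
# Perso-Arabic -> Roman: every letter keeps its own Roman equivalent.
PERSO_ARABIC_TO_ROMAN = {
    "ت": "t", "ط": "Tt",
    "ذ": "Z", "ز": "z", "ض": "zz", "ظ": "Zz",
    "س": "s", "ص": "S", "ث": "c",
    "ع": "A",
}

# Roman -> Devanagari: several Roman units share one Devanagari letter,
# and A (for ع) is assumed to be dropped.
ROMAN_TO_DEVANAGARI = {
    "t": "त", "Tt": "त",
    "Z": "ज़", "z": "ज़", "zz": "ज़", "Zz": "ज़",
    "s": "स", "S": "स", "c": "स",
    "A": "",
}


def to_roman(letters):
    """Map a sequence of Perso-Arabic letters to a list of Roman units."""
    return [PERSO_ARABIC_TO_ROMAN[ch] for ch in letters]


def to_devanagari(roman_units):
    """Map Roman units to Devanagari text."""
    return "".join(ROMAN_TO_DEVANAGARI[unit] for unit in roman_units)


sibilants = ["س", "ص", "ث"]
print(to_roman(sibilants))                 # three distinct Roman units
print(to_devanagari(to_roman(sibilants)))  # the same Devanagari letter three times
```

Because the Roman units stay distinct, the step between Perso-Arabic and Roman can be reversed, while the step into Devanagari merges letters; going back from Devanagari to the original Perso-Arabic spelling would need extra information such as a dictionary.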

There are two special words that are written in a special form, using a single letter with two elongated quotation marks beneath it. They are shown in table 8 in all three scripts. Note that the Roman representation of these words is capitalized to avoid ambiguity with other letters in the word or sentence.

Table 8. Special single letter words of Sindhi

Devanagari    Roman    Perso-Arabic
ऎं            AEN      ۽
में            MEN      ۾

The dependent vowels and diacritical marks are shown in table 9 in all three scripts. To make them easier to follow, they are shown in combination with the letter ज (j) in Devanagari, j in Roman script and ج (j) in Perso-Arabic script.

Table 9. Dependent vowels and diacritical marks

Devanagari      Roman    Perso-Arabic
ज               ja       جَ
ज + ि = जि       ji       جِ
ज + ु = जु       ju       جُ
ज + ो = जो       jo       جو
ज + ू = जू       joo      جُو
ज + े = जे       je       جي
ज + ी = जी       jee      جِي
ज + ा = जा       jaa      جا

Sample conversions and problems

Dictionary lookup is used to transliterate words in which one or more letters follow none of the rules. Conversion from one script of Sindhi to any other can be achieved by implementing the rules given below:

i. Take a whole word as input.
ii. Look the word up in the dictionary of special words.
iii. If the word is not in the dictionary, start transliterating.
iv. Transliterate the first letter as a consonant or independent vowel.
v. From the second to the second-last letter, if a letter is a consonant, or a vowel with a diacritical mark, transliterate it as a consonant.
vi. If the letter is a vowel and has no diacritical mark, transliterate it as a dependent vowel.
vii. If the last letter is a vowel, transliterate it as a dependent vowel.
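Rules i to vii can be sketched as a function that decides how each letter of a Perso-Arabic word should be treated. This is a rough illustration under several assumptions that are not in the article: only ا, و and ي count as vowel letters, only Zabar, Zer and Pesh count as diacritical marks, a final consonant is treated as a consonant, and the dictionary holds just the two single-letter words AEN and MEN. All names are hypothetical.

```python
ZABAR, ZER, PESH = "\u064E", "\u0650", "\u064F"   # short-vowel marks
DIACRITICS = {ZABAR, ZER, PESH}
ALEF, WAW, YEH = "\u0627", "\u0648", "\u064A"     # ا, و, ي
VOWEL_LETTERS = {ALEF, WAW, YEH}

# Hypothetical dictionary of special words (assumed code points for the
# two single-letter words AEN and MEN).
SPECIAL_WORDS = {"\u06FD": "AEN", "\u06FE": "MEN"}


def split_units(word):
    """Attach each diacritical mark to the letter it follows."""
    units = []
    for ch in word:
        if ch in DIACRITICS and units:
            letter, marks = units[-1]
            units[-1] = (letter, marks + ch)
        else:
            units.append((ch, ""))
    return units


def assign_roles(word, dictionary=SPECIAL_WORDS):
    """Return (letter, role) pairs following rules i to vii."""
    if word in dictionary:                       # rules ii and iii
        return [("dictionary", dictionary[word])]
    units = split_units(word)                    # rule i: the whole word
    last = len(units) - 1
    roles = []
    for i, (letter, marks) in enumerate(units):
        is_vowel = letter in VOWEL_LETTERS
        if i == 0:
            role = "consonant_or_independent_vowel"   # rule iv
        elif is_vowel and (i == last or not marks):
            role = "dependent_vowel"                  # rules vi and vii
        else:
            role = "consonant"                        # rule v
        roles.append((letter, role))
    return roles


jo = "\u062C" + WAW   # the syllable jo: jeem followed by waw
print(assign_roles(jo))
```

The sketch only labels letters; a full transliterator would then look up each labelled letter in the mapping tables and would also need the context analysis for words that lack diacritical marks.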

The model and algorithm suggested in this research are designed on the basis of the above rules.

Figure 1. Transliteration model for Sindhi scripts. (Flowchart: input in the source Sindhi script; dictionary lookup; check whether the source is Roman; transliteration algorithm, source to Roman and then Roman to destination, with Roman script as a possible output; output in the destination Sindhi script.)

These rules become more complex when transliterating from Perso-Arabic to Devanagari if words lack proper diacritical marks. In this situation transliteration is done by analyzing the letters that come before and after the letter that has no diacritical mark. If no condition matches (for example, in the case of consecutive vowels), transliteration can finally be achieved on a probability basis. A sample transliteration is shown in example 3.

Example 3

ॿाहिर ॾाढी गरमी आहे.
bbaahir ddaadhee garmee Aahe.
outside very hot is
ٻاھر ڏاڍِي گرمِي آھي.
It is very hot outside.

The suggested set of rules transliterates the majority of sentences correctly, as in example 3. However, there are some ambiguities for letters that have no proper counterpart in Devanagari script. For example, the letter ء ('a) of Perso-Arabic script is sometimes equivalent to अ (a) in Devanagari, while अ (a) is actually the equivalent of ا (a) of Perso-Arabic script. Similarly, the same ء ('a) is sometimes equivalent to े (e) (the equivalent of Perso-Arabic ي (e)), and sometimes ء ('a) is omitted completely to achieve the correct transliteration. As example 4 shows, the letter अ (a) of the first word is wrongly transliterated into the letter ا (a) of the Perso-Arabic script, while its correct transliteration would be ء ('a).

Example 4

हूअ ॾाढी पियारी आहे.
hoo'a ddaadhee piyaaree aahe.
she very lovely is
(incorrect) ھُوا ڏاڍِي پِيارِي آھي.
(correct) ھُوء ڏاڍِي پِيارِي آھي.
She is very lovely.

Conclusion and future work

Once the transliteration model discussed here is implemented, it will be possible to transliterate Sindhi from one script to another. People familiar with one script will be able to understand writing in the other. The model will also be useful for implementing simple transliteration from Roman to either of the two Sindhi scripts. With a successful implementation of the design, transliteration aids such as Google Transliteration IME would be able to use the Roman to Sindhi mapping to transliterate between Roman and native Sindhi scripts. Automatic transliteration will also help to settle the discussions and disputes about adopting Roman script for the Sindhi language. The proposed model still needs to be checked on a large scale by applying the algorithm to a reasonably large corpus.

References

IBM. 1999. Glossary of Unicode terms, 2010. Retrieved from http://www.ibm.com/developerworks/library/glossaries/unicode.html

Khubchandani, L. M. 1970. Sindhi. In Current Trends in Linguistics, S. T. Albert, 1(1): 219. The Hague, Netherlands: Mouton & Co.

Malik, M. G. A. 2006a. Hindi Urdu Machine Transliteration System (Unpublished MS Thesis, Department of Linguistics, University of Paris 7, Denis Diderot, 2 Place Jussieu, Paris, France).

Malik, M. G. A. 2006b. Punjabi Machine Transliteration. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, Sydney, Australia. Retrieved from http://acl.ldc.upenn.edu/p/p06/p06-1143.pdf

Malik, M. G. A., Boitet, C., and Bhattacharyya, P. 2009. A Hybrid Model for Urdu Hindi Transliteration. In Proceedings of the 2009 Named Entities Workshop, ACL-IJCNLP 2009, 177-185. Retrieved from http://www.aclweb.org/anthology/w/w09/w09-3536.pdf

Malik, M. G. A., Boitet, C., and Bhattacharyya, P. 2008. Hindi Urdu Machine Transliteration using Finite-state Transducers. In Proceedings of COLING, Manchester, UK. Retrieved from http://www.aclweb.org/anthology-new/c/c08/c08-1068.pdf

Rahman, M. 2009. Computational Linguistics and Sindhi Language. Sindhi Boli. Sindhi Language Authority, Hyderabad.

SIL International. 2003. NRSI: Computers & Writing Systems. 2010. Retrieved from http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=glossary

Sindhi Language Authority. 2009. Sindhi Language 2010. Retrieved from http://www.sindhila.org/sindhi Language.htm