INTERNATIONALIZED DOMAIN NAMES


Draft Policy Document for INTERNATIONALIZED DOMAIN NAMES

Language: URDU

RECORD OF CHANGES

Version number: 1.0
Date: 29 September 2014
Points affected: All
Title or brief description: Final Policy document
Compliance version of main policy document: 1.0

*A - ADDED, M - MODIFIED, D - DELETED

Table of Contents

1. PERSO-ARABIC SCRIPTS: GENERAL INTRODUCTION
   1.1. OVERVIEW
   1.2. GENERAL STRATEGY FOR URDU
2. RESTRICTION RULES
3. LANGUAGE TABLE: URDU
4. NOMENCLATURAL DESCRIPTION TABLE OF URDU LANGUAGE TABLE
5. VARIANT TABLE FOR URDU
6. EXPERTISE/BODIES CONSULTED
7. PROPOSED ccTLD FOR URDU

1. PERSO-ARABIC SCRIPTS: GENERAL INTRODUCTION

1.1. OVERVIEW

Three languages in India use the Perso-Arabic script: Urdu, Sindhi and Kashmiri [1]. Unlike Brahmi-derived languages, which are abugidas, i.e. syllable driven, Perso-Arabic driven languages are abjads, i.e. character based. The concept of the ISCII syllable therefore has no pertinence for languages derived from the Perso-Arabic script. Consequently, unlike Hindi or Tamil for example, Urdu has no Augmented Backus Naur Formalism (ABNF). However, Urdu does admit restriction rules, as given in Section 2 below. The template for Perso-Arabic derived languages admits only the code chart with the pertinent characters marked in yellow, the corresponding nomenclatural table, and the variant list.

1.2. GENERAL STRATEGY FOR URDU

Of all the scripts used for Indian languages, the Perso-Arabic script presents the greatest difficulties and the greatest chances of spoofing and phishing. This is because of the intrinsic nature of the script, which has a large number of homographs, and because the Unicode code block (U+0600 to U+06FF) caters to a large number of languages, so that two or more characters often closely resemble one another. To simplify the problem and to reduce spoofing and phishing to a bare minimum, the following strategy is proposed.

1.2.1. MAPPING IN CONSONANCE WITH THE POLICY LAID DOWN BY GOVT. OF INDIA

- www will always remain in English. It is the middle layer and the ccTLD which will be in Urdu. It is assumed that the Bidi algorithm built into the browser will handle the directionality of English and Urdu efficiently.
- The ccTLD used will be a suitable equivalent of .in in Urdu. The translation of India into Urdu shall be بھارت.
- The character set prescribed for Urdu will be IDNA 2008 compliant.

[1] Sindhi and Kashmiri are also written in the Devanagari script.

- The number of permissible characters shall not exceed 63 when converted to Punycode (inclusive of the ACE prefix).
- Script vs. language: the Unicode code block (U+0600 to U+06FF) caters to a large number of languages. Only the pertinent character set for Urdu shall be used. No mixing of two languages will be allowed within the domain label inside the zones.
- The Latin full stop shall be used instead of the corresponding Urdu punctuation marker.
- All digits will be from the International Digit Set, i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and not from the Arabic-Indic digit set as prescribed in the code page for Arabic.
- Similarly, the English hyphen will be used and not the corresponding Urdu hyphen.
- ZWJ and ZWNJ shall not be permitted.
- Space (a major issue in Perso-Arabic scripts) shall not be permitted within the domain name.

1.2.2. DIRECTIVE PRINCIPLES SPECIFIC TO URDU

PRINCIPLE I: The permissible character set

The Urdu code set will be defined and isolated from the Arabic page, i.e. only those characters which are permissible in Urdu will be retained. Since the Unicode code block (U+0600 to U+06FF) is highly liable to spoofing, restricting the character set to what is pertinent to Urdu alone will reduce spoofing and phishing.

PRINCIPLE II: Identification of characters liable to spoofing

Characters liable to cause spoofing shall be identified and treated as variants. These will also include normalization.

PRINCIPLE III: Diacritics reduced to a bare minimum

As far as possible, all diacritics will be eliminated from the set. Only the most important and pertinent diacritics shall be retained. These are:

(i) ARABIC MADDAH ABOVE (0653)
(ii) ARABIC HAMZA ABOVE (0654)
(iii) ARABIC HAMZA BELOW (0655)
(iv) ARABIC SHADDA (0651)
(v) ARABIC SUBSCRIPT ALEF (0656)
(vi) ARABIC LETTER SUPERSCRIPT ALEF (0670)

The Alif, Madd and Hamza characters most frequently used in Urdu are listed below, and these will be admitted to the permissible set:

آ أ إ ؤ ئ ۂ ۓ

Their corresponding combinations shall be treated as variants. Thus آ (0622) can also be entered as ا (0627) followed by maddah (0653) on some Urdu keyboards, and it is to resolve this alternative mode of entry that such normalization is permitted in the shape of a variant.

PRINCIPLE IV: Ezafat

A serious issue will be that of the ezafat in words such as Yaad-e-Khuda or Aab-o-Hawa. As a palliative, it is suggested that the ezafat be represented by one of the following characters:

(i) ARABIC LETTER YEH BARREE ے U+06D2
(ii) ARABIC LETTER WAW و U+0648
(iii) ARABIC LETTER HAMZA ء U+0621

The character is separated from the adjoining words by hyphens, as in the examples below:

یاد-ے-خدا
آب-و-ہوا

PRINCIPLE V: Visual identity of the word: the case of Space between two words within a URL

Since a large number of characters in Perso-Arabic join together unless separated by a Space, Space is a cardinal issue in all Perso-Arabic driven languages. Space ensures visual identity. Since Space is not permissible within a URL, visual identity becomes a major issue wherever two words constitute a URL. A palliative would be to use the hyphen to separate the two words and thereby ensure legibility. Thus, in the case of a site for a mango pickle (aam aachaar), the two words written together would be illegible:

آمآچار

The solution would be to separate the two words with a hyphen, as shown below:

آم-آچار

PRINCIPLE VI: Use of Naskh instead of Nastalique in the URL

Naskh is visually clearer and also reduces spoofing and pharming, because the joining characters are clearly legible, as is shown below:

[Figure: sample text rendered in Naskh and in Nastalique]
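The mapping rules of Section 1.2.1 lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the policy: the names `ace_form` and `mapping_problems` are hypothetical, Python's built-in `punycode` codec stands in for a full IDNA 2008 implementation, and U+06D4 is assumed to be the Urdu full stop the policy refers to. It checks a single label for forbidden characters, non-international digits, and the 63-character limit.

```python
from __future__ import annotations

# Illustrative sketch only; all names here are hypothetical.
ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # limit stated in the policy, ACE prefix included

# Characters the policy excludes from a label.
# U+06D4 as "the Urdu full stop" is an assumption of this sketch.
FORBIDDEN = {
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    " ": "space",
    "\u06d4": "Urdu full stop",
}


def ace_form(label: str) -> str:
    """Return the ASCII-compatible encoding of one label."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def mapping_problems(label: str) -> list[str]:
    """List the mapping rules of Section 1.2.1 that a label breaks."""
    problems = []
    for ch in label:
        if ch in FORBIDDEN:
            problems.append(f"{FORBIDDEN[ch]} is not permitted")
        elif "\u0660" <= ch <= "\u0669" or "\u06f0" <= ch <= "\u06f9":
            # Arabic-Indic and Extended Arabic-Indic digits
            problems.append("Arabic-Indic digit; use 0-9")
    if len(ace_form(label)) > MAX_LABEL_LENGTH:
        problems.append("label exceeds 63 characters in Punycode")
    return problems
```

An empty list means that none of these particular rules is violated; it does not establish that the label is registrable, since the character set and variant rules are checked separately.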

2. RESTRICTION RULES

Urdu admits the following restriction rules:

1. ARABIC MADDAH ABOVE U+0653 shall be allowed only after the following character:
   (a) ARABIC LETTER ALEF ا U+0627
2. ARABIC HAMZA ABOVE U+0654 shall be allowed only after the following characters:
   (a) ARABIC LETTER ALEF ا U+0627
   (b) ARABIC LETTER WAW و U+0648
   (c) ARABIC LETTER HEH GOAL ہ U+06C1
   (d) ARABIC LETTER YEH BARREE ے U+06D2
   (e) ARABIC LETTER FARSI YEH ی U+06CC
3. ARABIC HAMZA BELOW U+0655 shall be allowed only after the following character:
   (a) ARABIC LETTER ALEF ا U+0627
4. Apart from the permissible single diacritics, only the following combinations of two diacritics are allowed:
   (a) ARABIC SHADDA U+0651 followed by ARABIC SUBSCRIPT ALEF U+0656
   (b) ARABIC SHADDA U+0651 followed by ARABIC LETTER SUPERSCRIPT ALEF U+0670
5. Consecutive hyphens will not be permitted in a domain name.
6. A label containing more than three instances of variant character(s) will not be permitted. As an example, let us consider a, b, c and d as four characters in a given label, having a', b', c' and d' as their variants; such a label will be disallowed (e.g. of disallowed labels: abcd, acdb, cdaba and so on).

Additional Note: Wherever a variant is present in a given label, the variants shall be strictly symmetric and non-transitive. Thus, given the variants ۂ (U+06C2) and ہ (U+06C1 + U+0654), and ہ (U+06C1) and ۃ (U+06C3), one of the variants of a label such as طرۂ shall be طرہ. The label طرۃ, generated by adding an extra ہ (U+06C1) to ۃ (U+06C3), shall not be permitted. This ensures that over-generativity does not take place.
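Rules 1 to 5 above can be sketched as a single pass over a label. The Python below is an illustration under stated assumptions, not an official validator: the function name `restriction_violations` is hypothetical, a diacritic at the very start of a label is only flagged when rules 1 to 3 apply to it, and rule 6 (the variant count) is left out because it needs the variant table.

```python
# Illustrative sketch only; names are hypothetical.
MADDAH, HAMZA_ABOVE, HAMZA_BELOW = "\u0653", "\u0654", "\u0655"
SHADDA, SUBSCRIPT_ALEF, SUPERSCRIPT_ALEF = "\u0651", "\u0656", "\u0670"
DIACRITICS = {MADDAH, HAMZA_ABOVE, HAMZA_BELOW,
              SHADDA, SUBSCRIPT_ALEF, SUPERSCRIPT_ALEF}

# Rules 1-3: base letters after which each diacritic may appear.
ALLOWED_BASES = {
    MADDAH: {"\u0627"},
    HAMZA_ABOVE: {"\u0627", "\u0648", "\u06c1", "\u06d2", "\u06cc"},
    HAMZA_BELOW: {"\u0627"},
}

# Rule 4: the only permitted sequences of two diacritics.
ALLOWED_PAIRS = {(SHADDA, SUBSCRIPT_ALEF), (SHADDA, SUPERSCRIPT_ALEF)}


def restriction_violations(label):
    """Return a list of messages for rules 1-5 that the label breaks."""
    violations = []
    if "--" in label:  # rule 5
        violations.append("consecutive hyphens")
    for i, ch in enumerate(label):
        if ch not in DIACRITICS:
            continue
        prev = label[i - 1] if i > 0 else ""
        if prev in DIACRITICS:
            # Rule 4: a diacritic directly after another diacritic.
            if (prev, ch) not in ALLOWED_PAIRS:
                violations.append(
                    f"U+{ord(prev):04X} followed by U+{ord(ch):04X} at index {i}")
            continue
        # Rules 1-3: diacritic restricted to specific base letters.
        if ch in ALLOWED_BASES and prev not in ALLOWED_BASES[ch]:
            violations.append(
                f"U+{ord(ch):04X} after a disallowed base at index {i}")
    return violations
```

Because both permitted pairs begin with SHADDA, any run of three diacritics necessarily contains a forbidden pair, so no separate length check on diacritic runs is needed.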

3. LANGUAGE TABLE [2]: URDU [3]

[2] This language table is based on the Unicode chart for Arabic script provided by the Unicode Consortium.
[3] Characters marked in yellow are not applicable to the language.

4. NOMENCLATURAL DESCRIPTION TABLE OF URDU LANGUAGE TABLE

The following are basic alphabetic characters for Urdu, and will therefore be allowed.

PERMISSIBLE URDU CHARACTER SET

Code   Char   Name
0621   ء      ARABIC LETTER HAMZA
0627   ا      ARABIC LETTER ALEF
0628   ب      ARABIC LETTER BEH
062A   ت      ARABIC LETTER TEH
062B   ث      ARABIC LETTER THEH
062C   ج      ARABIC LETTER JEEM
062D   ح      ARABIC LETTER HAH
062E   خ      ARABIC LETTER KHAH
062F   د      ARABIC LETTER DAL
0630   ذ      ARABIC LETTER THAL
0631   ر      ARABIC LETTER REH
0632   ز      ARABIC LETTER ZAIN
0633   س      ARABIC LETTER SEEN
0634   ش      ARABIC LETTER SHEEN
0635   ص      ARABIC LETTER SAD
0636   ض      ARABIC LETTER DAD
0637   ط      ARABIC LETTER TAH
0638   ظ      ARABIC LETTER ZAH
0639   ع      ARABIC LETTER AIN
063A   غ      ARABIC LETTER GHAIN
0641   ف      ARABIC LETTER FEH
0642   ق      ARABIC LETTER QAF
0644   ل      ARABIC LETTER LAM
0645   م      ARABIC LETTER MEEM
0646   ن      ARABIC LETTER NOON
0647   ه      ARABIC LETTER HEH
0648   و      ARABIC LETTER WAW
0679   ٹ      ARABIC LETTER TTEH
067E   پ      ARABIC LETTER PEH
0686   چ      ARABIC LETTER TCHEH
0688   ڈ      ARABIC LETTER DDAL
0691   ڑ      ARABIC LETTER RREH
0698   ژ      ARABIC LETTER JEH
06A9   ک      ARABIC LETTER KEHEH
06AF   گ      ARABIC LETTER GAF
06BA   ں      ARABIC LETTER NOON GHUNNA
06BE   ھ      ARABIC LETTER HEH DOACHASHMEE
06C1   ہ      ARABIC LETTER HEH GOAL
06C3   ۃ      ARABIC LETTER TEH MARBUTA GOAL
06CC   ی      ARABIC LETTER FARSI YEH
06D2   ے      ARABIC LETTER YEH BARREE

The following combinations of base character and diacritic will also be allowed:

Code   Char   Name
0622   آ      ARABIC LETTER ALEF WITH MADDA ABOVE
0623   أ      ARABIC LETTER ALEF WITH HAMZA ABOVE
0624   ؤ      ARABIC LETTER WAW WITH HAMZA ABOVE
0625   إ      ARABIC LETTER ALEF WITH HAMZA BELOW
0626   ئ      ARABIC LETTER YEH WITH HAMZA ABOVE
06C2   ۂ      ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
06D3   ۓ      ARABIC LETTER YEH BARREE WITH HAMZA ABOVE

Apart from the above set of characters, the following diacritics are also allowed:

Code   Name
0651   ARABIC SHADDA
0653   ARABIC MADDAH ABOVE
0654   ARABIC HAMZA ABOVE
0655   ARABIC HAMZA BELOW
0656   ARABIC SUBSCRIPT ALEF
0670   ARABIC LETTER SUPERSCRIPT ALEF
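The permissible set above can be turned directly into a repertoire check. The sketch below is one possible Python encoding of the three lists, with hypothetical names (`URDU_REPERTOIRE`, `disallowed_characters`). It also admits the digits 0-9 and the hyphen, on the assumption that the general strategy's "International Digit Set" and "English Hyphen" are meant to be usable inside a label.

```python
import string

# Code points copied from the three lists in this section (hex, no prefix).
BASIC = (
    "0621 0627 0628 062A 062B 062C 062D 062E 062F 0630 0631 0632 0633 0634 "
    "0635 0636 0637 0638 0639 063A 0641 0642 0644 0645 0646 0647 0648 0679 "
    "067E 0686 0688 0691 0698 06A9 06AF 06BA 06BE 06C1 06C3 06CC 06D2"
)
COMBINED = "0622 0623 0624 0625 0626 06C2 06D3"
DIACRITICS = "0651 0653 0654 0655 0656 0670"


def _chars(hex_list):
    """Convert a space-separated list of hex code points to a set of characters."""
    return {chr(int(cp, 16)) for cp in hex_list.split()}


URDU_REPERTOIRE = _chars(BASIC) | _chars(COMBINED) | _chars(DIACRITICS)

# Assumption of this sketch: digits 0-9 and the hyphen are also valid in a label.
LABEL_REPERTOIRE = URDU_REPERTOIRE | set(string.digits) | {"-"}


def disallowed_characters(label):
    """Return the code points in a label that fall outside the repertoire."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in LABEL_REPERTOIRE]
```

For instance, a label typed with the Arabic KAF (U+0643) instead of the Urdu KEHEH (U+06A9) is reported, which is the kind of look-alike substitution that restricting the set is meant to rule out.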

5. VARIANT TABLE FOR URDU

The following variants are based on a single character which can also be entered as a combination of two characters. It should be noted that these variants have been admitted to accommodate keyboards where a single character representing a combination such as alif madd آ is not available and the user has to enter alif and madd separately.

Character      Variant
ں 06BA         ن 0646
ہ 06C1         ۃ 06C3
آ 0622         ا 0627 + 0653
أ 0623         ا 0627 + 0654
ؤ 0624         و 0648 + 0654
إ 0625         ا 0627 + 0655
ئ 0626         ی 06CC + 0654
ۂ 06C2         ہ 06C1 + 0654
ۓ 06D3         ے 06D2 + 0654

Caveats

Other characters distinguished by a single Nukta, such as suad ~ zuad, have not been included, since this would have made the attribution of URLs too restrictive.

All other cases are handled by the exclusive character set for Urdu and the absence of diacritics.
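The two-character rows of the variant table amount to a small normalization map. The following Python sketch, with hypothetical names `compose` and `are_entry_variants`, rewrites each base-plus-diacritic sequence to its single-character form so that two keyboard entries of the same word compare equal. An explicit table is used rather than Unicode NFC, because the pairing of ی (06CC) + 0654 with ئ (0626) comes from this policy and is not a canonical Unicode composition. The single-character pairs (ں with ن, and ہ with ۃ) are deliberately left out, since they relate distinct letters rather than alternative entry modes.

```python
# Two-character sequences from the variant table, mapped to the
# single character they are variants of. Names are hypothetical.
SEQUENCE_VARIANTS = {
    "\u0627\u0653": "\u0622",  # alef + maddah above
    "\u0627\u0654": "\u0623",  # alef + hamza above
    "\u0648\u0654": "\u0624",  # waw + hamza above
    "\u0627\u0655": "\u0625",  # alef + hamza below
    "\u06cc\u0654": "\u0626",  # farsi yeh + hamza above
    "\u06c1\u0654": "\u06c2",  # heh goal + hamza above
    "\u06d2\u0654": "\u06d3",  # yeh barree + hamza above
}


def compose(label):
    """Replace every listed two-character sequence by its single character."""
    for sequence, single in SEQUENCE_VARIANTS.items():
        label = label.replace(sequence, single)
    return label


def are_entry_variants(a, b):
    """True if two labels differ only in how the listed characters were typed."""
    return compose(a) == compose(b)
```

A registry could, for example, store `compose(label)` as the lookup key, so that a request for the decomposed spelling collides with an existing registration of the composed one.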

6. EXPERTISE/BODIES CONSULTED

Expertise was provided by experts in the Urdu language and Urdu computational linguistics from Osmania University and Maulana Azad National Urdu University.

7. PROPOSED ccTLD FOR URDU

بھارت - India (Bhārat) localized in Urdu

Note: You can send your feedback to idn-feedback@cdac.in