
ISO/IEC JTC1/SC2/WG2 N3976
L2/11-130R
L2/12-012 (replaces L2/11-130)
2011-04-19

Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation internationale de normalisation
Международная организация по стандартизации

Doc Type: Working Group Document
Title: Proposal to add minority characters to Myanmar script
Source: Martin Hosken
Action: For consideration by JTC1/SC2/WG2
Date: 2011-05-23

Introduction

This proposal is to add 28 extra characters to the Myanmar script for the Tai Laing and Shwe Palaung languages.

The Tai Laing are a language group of about 100,000 speakers living along the Irrawaddy River in Myanmar. Their writing system, part of their history, has not completely died out, and there is interest in reviving it. While the script is not taught formally in schools, it is taught during school breaks.

The Shwe Palaung have about 200,000 speakers, primarily in Shan State, Myanmar. Of these, 20% are literate in Shwe Palaung, and literacy development is proceeding indigenously. The orthography has been in development since the 1930s, with the latest revision occurring in the 1980s, when the proposed characters were added.

Charts

The proposed characters are to be added to two existing Myanmar Extended blocks. They fill out Myanmar Extended-A and add to Myanmar Extended-B.

Myanmar Extended-A (chart column U+AA70), proposed characters:

AA7C     MYANMAR SIGN TONE-2
AA7D     MYANMAR SIGN TONE-5
AA7E  ꩾ  MYANMAR SHWE PALAUNG CHA
AA7F  ꩿ  MYANMAR SHWE PALAUNG SHA

Myanmar Extended-B (chart columns U+A9E0 and U+A9F0), proposed characters:

Consonants
A9E7  ꧧ  MYANMAR NYA
A9E8  ꧨ  MYANMAR FA
A9E9  ꧩ  MYANMAR GA
A9EA  ꧪ  MYANMAR GHA
A9EB  ꧫ  MYANMAR JA
A9EC  ꧬ  MYANMAR JHA
A9ED  ꧭ  MYANMAR DDA
A9EE  ꧮ  MYANMAR DDHA
A9EF  ꧯ  MYANMAR NNA

Digits
A9F0  ꧰  MYANMAR ZERO
A9F1  ꧱  MYANMAR ONE
A9F2  ꧲  MYANMAR TWO
A9F3  ꧳  MYANMAR THREE
A9F4  ꧴  MYANMAR FOUR
A9F5  ꧵  MYANMAR FIVE
A9F6  ꧶  MYANMAR SIX
A9F7  ꧷  MYANMAR SEVEN
A9F8  ꧸  MYANMAR EIGHT
A9F9  ꧹  MYANMAR NINE

Consonants
A9FA  ꧺ  MYANMAR LLA
A9FB  ꧻ  MYANMAR DA
A9FC  ꧼ  MYANMAR DHA
A9FD  ꧽ  MYANMAR BA
A9FE  ꧾ  MYANMAR BHA
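The figure of 28 characters given in the introduction can be checked against the two chart listings. The following minimal Python sketch tallies the proposed code points and confirms that each range lies inside its target block; the block bounds (U+AA60–U+AA7F and U+A9E0–U+A9FF) are those published in the Unicode Standard, and the names `BLOCKS`, `PROPOSED` and `count_proposed` are illustrative only.

```python
# Block ranges (inclusive) as published in the Unicode Standard.
BLOCKS = {
    "Myanmar Extended-A": (0xAA60, 0xAA7F),
    "Myanmar Extended-B": (0xA9E0, 0xA9FF),
}

# Proposed additions from the chart listings, as inclusive (first, last) ranges.
PROPOSED = [
    ("Myanmar Extended-A", 0xAA7C, 0xAA7F),  # two tone marks, two Shwe Palaung consonants
    ("Myanmar Extended-B", 0xA9E7, 0xA9EF),  # NYA .. NNA
    ("Myanmar Extended-B", 0xA9F0, 0xA9F9),  # ZERO .. NINE
    ("Myanmar Extended-B", 0xA9FA, 0xA9FE),  # LLA .. BHA
]


def count_proposed(proposed, blocks):
    """Count distinct proposed code points, checking each range fits its block."""
    seen = set()
    for block, first, last in proposed:
        lo, hi = blocks[block]
        if not (lo <= first <= last <= hi):
            raise ValueError(f"U+{first:04X}..U+{last:04X} lies outside {block}")
        for cp in range(first, last + 1):
            if cp in seen:
                raise ValueError(f"U+{cp:04X} is listed twice")
            seen.add(cp)
    return len(seen)


total = count_proposed(PROPOSED, BLOCKS)
print(total)  # 28
```

The ranges contribute 4 + 9 + 10 + 5 code points, which matches the count stated in the proposal and in the summary form.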

Rationale

The various subgroups of characters are considered separately, in encoding order.

Tai Laing Tone Marks

Tai Laing has 5 tone marks. Of these, 3 are already encoded, and this proposal adds the remaining two.

Figure 1. Tai Laing tone marks

As can be seen in the example above, a tone mark is positioned before an interacting vowel in the same diacritic space. The stored sequence, though, still has the tone mark stored last. With reference to UTN #11, Diacritic Storage Order, the tone marks are added to the Visarga slot class.

AA7C;MYANMAR SIGN TONE-2;Mn;0;NSM;;;;;N;;;;;
AA7D;MYANMAR SIGN TONE-5;Mn;0;NSM;;;;;N;;;;;

Another example of the use of these characters:

Figure 2. Use of Tai Laing tone marks in conjunction with other diacritics.

Shwe Palaung Consonants

The two proposed characters for Shwe Palaung have a visual representation that is very close to an existing sequence:

ꩾ AA7E    ခ 1001 103B 103E
ꩿ AA7F    ဆ 1006 103B 103E

The difficulty comes if we introduce a medial wa into the sequence:

ꩾ AA7E 103D    ခ 1001 103B 103D 103E
ꩿ AA7F 103D    ဆ 1006 103B 103D 103E

The medial is added into the middle of the sequence representing the consonant, which is a problem for data entry. For sorting, although the example dictionary here sorts these letters as though their components were medials, they are distinct consonants and should have their own consonantal position.

The encoding of these characters follows the tradition of encoding such characters atomically. Other examples are U+106F (cf. U+101F U+103E), U+1070 (cf. U+1003 U+103E) and U+107E (cf. U+107D U+103E).

AA7E;MYANMAR SHWE PALAUNG CHA;Lo;0;L;;;;;N;;;;;
AA7F;MYANMAR SHWE PALAUNG SHA;Lo;0;L;;;;;N;;;;;

It is also worth noting that the rendering of these characters with medial wa may differ from the way the corresponding sequence is rendered. Some styles prefer the two to be the same; others prefer them to be different.

Figure 3. Shwe Palaung consonant sorting.

For IDN purposes, the sequence and the atomic character should be considered confusable.

Tai Laing Consonants

The proposed characters are listed as part of the Tai Laing alphabet. They are circled in Figure 4; all the other characters are already supported in the UCS. Notice that the labelling of the sa and sʰa characters is wrong when compared with the Pali-based shiksha. In addition, the shape of the sʰa belies its underlying encoding of U+AA6C MYANMAR KHAMTI SA, as can be seen in Figure 5.
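The IDN confusability of the Shwe Palaung consonants can be illustrated with a skeleton comparison, in which two labels are flagged when they reduce to the same string. The following minimal Python sketch assumes a hypothetical mapping table, `ATOMIC_TO_SEQUENCE`, built from the look-alike sequences given above; it is not taken from the Unicode confusables data or from any registry's rules. It also reproduces the data-entry problem: a medial wa typed after the atomic letter must land in the middle of the equivalent sequence.

```python
# Hypothetical skeleton table: each atomic consonant maps to the
# (base consonant + medial ya, medial ha) sequence it resembles.
ATOMIC_TO_SEQUENCE = {
    "\uAA7E": ("\u1001\u103B", "\u103E"),  # SHWE PALAUNG CHA ~ KHA + medial ya ... medial ha
    "\uAA7F": ("\u1006\u103B", "\u103E"),  # SHWE PALAUNG SHA ~ CHA + medial ya ... medial ha
}
MEDIAL_WA = "\u103D"


def skeleton(label: str) -> str:
    """Rewrite atomic Shwe Palaung consonants as their look-alike sequences."""
    out = []
    i = 0
    while i < len(label):
        ch = label[i]
        i += 1
        if ch not in ATOMIC_TO_SEQUENCE:
            out.append(ch)
            continue
        head, tail = ATOMIC_TO_SEQUENCE[ch]
        out.append(head)
        # A medial wa stored after the atomic letter belongs between
        # medial ya and medial ha in the equivalent sequence.
        if i < len(label) and label[i] == MEDIAL_WA:
            out.append(MEDIAL_WA)
            i += 1
        out.append(tail)
    return "".join(out)


def confusable(a: str, b: str) -> bool:
    """Treat two labels as confusable when their skeletons are identical."""
    return skeleton(a) == skeleton(b)


print(confusable("\uAA7E\u103D", "\u1001\u103B\u103D\u103E"))  # True
```

A real registry would combine a table like this with its other variant rules; the sketch covers only the two proposed characters and medial wa.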

Figure 4. Tai Laing modern alphabet

A9E7;MYANMAR NYA;Lo;0;L;;;;;N;;;;;
A9E8;MYANMAR FA;Lo;0;L;;;;;N;;;;;

Tai Laing Pali Consonants

As with most Myanmar-script-based writing systems, Tai Laing adds character support for the Pali language.

Figure 5. Shiksha showing Devanagari, Burmese, Tai Laing and Roman scripts.

Notice that the character for pha is the same as U+A9E4 MYANMAR SHAN BHA, and that likewise the character for bha is based on that shape.
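Each proposed character is given a property line in the semicolon-separated layout of the Unicode Character Database file UnicodeData.txt: 15 fields, of which field 2 is the general category, field 3 the canonical combining class, field 4 the bidirectional class and field 6 the decimal digit value. The following minimal Python sketch, assuming that documented layout, parses such lines and uses the decimal field of the proposed digits to read a Tai Laing numeral. It is not a complete UCD parser, and the names `CharProps`, `parse_line` and `digits_to_int` are illustrative only. The digit lines are generated from the same pattern as the A9F0–A9F9 entries given with the digits.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CharProps:
    code_point: int
    name: str
    general_category: str   # field 2, e.g. Lo, Mn, Nd
    combining_class: int    # field 3
    bidi_class: str         # field 4, e.g. L, NSM
    decimal: Optional[int]  # field 6, empty for non-digits


def parse_line(line: str) -> CharProps:
    """Parse one 15-field, semicolon-separated property line."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
    return CharProps(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
        decimal=int(fields[6]) if fields[6] else None,
    )


DIGIT_NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]

# A subset of the property lines in this proposal, plus all ten digits.
PROPOSED_LINES = [
    "AA7C;MYANMAR SIGN TONE-2;Mn;0;NSM;;;;;N;;;;;",
    "AA7E;MYANMAR SHWE PALAUNG CHA;Lo;0;L;;;;;N;;;;;",
    "A9E7;MYANMAR NYA;Lo;0;L;;;;;N;;;;;",
] + [
    f"{0xA9F0 + d:04X};MYANMAR {name};Nd;0;L;;{d};{d};{d};N;;;;;"
    for d, name in enumerate(DIGIT_NAMES)
]

TABLE: Dict[int, CharProps] = {p.code_point: p for p in map(parse_line, PROPOSED_LINES)}


def digits_to_int(text: str, table: Dict[int, CharProps]) -> int:
    """Convert a string of proposed Tai Laing digits to an integer."""
    value = 0
    for ch in text:
        props = table.get(ord(ch))
        if props is None or props.general_category != "Nd" or props.decimal is None:
            raise ValueError(f"U+{ord(ch):04X} is not a proposed decimal digit")
        value = value * 10 + props.decimal
    return value


print(digits_to_int("\uA9F2\uA9F0\uA9F1\uA9F1", TABLE))  # 2011
```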

A9E9;MYANMAR GA;Lo;0;L;;;;;N;;;;;
A9EA;MYANMAR GHA;Lo;0;L;;;;;N;;;;;
A9EB;MYANMAR JA;Lo;0;L;;;;;N;;;;;
A9EC;MYANMAR JHA;Lo;0;L;;;;;N;;;;;
A9ED;MYANMAR DDA;Lo;0;L;;;;;N;;;;;
A9EE;MYANMAR DDHA;Lo;0;L;;;;;N;;;;;
A9EF;MYANMAR NNA;Lo;0;L;;;;;N;;;;;
A9FA;MYANMAR LLA;Lo;0;L;;;;;N;;;;;
A9FB;MYANMAR DA;Lo;0;L;;;;;N;;;;;
A9FC;MYANMAR DHA;Lo;0;L;;;;;N;;;;;
A9FD;MYANMAR BA;Lo;0;L;;;;;N;;;;;
A9FE;MYANMAR BHA;Lo;0;L;;;;;N;;;;;

Tai Laing Digits

Tai Laing has its own set of digits, which are proposed here:

Figure 6. Tai Laing digits. The columns are: character name in Tai Laing, Tai Laing digit, Shan digit (from U+1090–U+1099), Arabic digit.

A9F0;MYANMAR ZERO;Nd;0;L;;0;0;0;N;;;;;
A9F1;MYANMAR ONE;Nd;0;L;;1;1;1;N;;;;;
A9F2;MYANMAR TWO;Nd;0;L;;2;2;2;N;;;;;
A9F3;MYANMAR THREE;Nd;0;L;;3;3;3;N;;;;;
A9F4;MYANMAR FOUR;Nd;0;L;;4;4;4;N;;;;;
A9F5;MYANMAR FIVE;Nd;0;L;;5;5;5;N;;;;;
A9F6;MYANMAR SIX;Nd;0;L;;6;6;6;N;;;;;
A9F7;MYANMAR SEVEN;Nd;0;L;;7;7;7;N;;;;;
A9F8;MYANMAR EIGHT;Nd;0;L;;8;8;8;N;;;;;
A9F9;MYANMAR NINE;Nd;0;L;;9;9;9;N;;;;;

Sort Order

The default sort order is integrated into the existing default sorting for the Myanmar script. Given that all Myanmar-based languages require complex sort tailoring, the precise values here can be somewhat arbitrary. The sort order information given here also includes the other characters from the Myanmar Extended-B block, as specified in N3906 (L2/10-345).

&1003 MYANMAR GHA < A9E0 MYANMAR SHAN GHA < A9EA MYANMAR GHA
&1006 MYANMAR CHA < A9E1 MYANMAR SHAN CHA
&AA62 MYANMAR KHAMTI CHA < AA7E MYANMAR SHWE PALAUNG CHA
&105B MYANMAR MON JHA < A9E2 MYANMAR SHAN JHA
&AA64 MYANMAR KHAMTI JHA < A9EC MYANMAR JHA
&1061 MYANMAR SGAW KAREN SHA < AA7F MYANMAR SHWE PALAUNG SHA
&AA65 MYANMAR KHAMTI NYA < A9E7 MYANMAR NYA
&106E MYANMAR EASTERN PWO KAREN NNA < A9E3 MYANMAR SHAN NNA < A9EE MYANMAR NNA
&108E MYANMAR RUMAI PALAUNG FA < A9E8 MYANMAR FA
&1018 MYANMAR BHA < A9E4 MYANMAR SHAN BHA < A9FE MYANMAR BHA
&1068 MYANMAR VOWEL SIGN WESTERN PWO KAREN UE < A9E5 MYANMAR SIGN SHAN SAW
&108D MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE << AA7C MYANMAR SIGN TONE-2
&1089 MYANMAR SIGN SHAN TONE-5 < AA7D MYANMAR SIGN TONE-5
&AA70 MYANMAR MODIFIER KHAMTI REDUPLICATION < A9E6 MYANMAR MODIFIER SHAN REDUPLICATION
&AA60 MYANMAR KHAMTI GA < A9E9 MYANMAR GA
&105B MYANMAR MON JA < A9EB MYANMAR JA
&AA68 MYANMAR KHAMTI DDA < A9ED MYANMAR DDA
&AA69 MYANMAR KHAMTI DDHA < A9EE MYANMAR DDHA
&100F MYANMAR NNA < A9EF MYANMAR NNA
&1020 MYANMAR LLA < A9FA MYANMAR LLA
&107B MYANMAR SHAN DA < A9FB MYANMAR DA
&1013 MYANMAR DHA < A9FC MYANMAR DHA
&107F MYANMAR SHAN BA < A9FD MYANMAR BA

MYANMAR characters are sorted following their corresponding MYANMAR SHAN characters.

Bibliography

Hosken, Martin. Representing Myanmar in Unicode (Unicode Technical Note 11, version 3).
ၸꩫ င ဝ SonNgaw ꧤ ꩫ လ တ လ င Phawnla TaiLaing
O Thuwa. Palaung-Burmese Dictionary (Namhfan, 2003).

Acknowledgements

Thanks go to Payap University Linguistics Institute, Chiang Mai, Thailand, under whose auspices this work is done.
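The Sort Order rules appear to follow the familiar tailoring notation in which "&X < Y" places Y immediately after X with a primary difference and "&X << Y" places it with a secondary difference. The following minimal Python sketch shows how such rules reorder a base sequence and yield two-level sort keys. It rests on simplifying assumptions: a tiny hypothetical base order stands in for the real default Myanmar ordering, each new character is inserted immediately after its anchor (or after the character placed just before it in the same rule), and only two levels are modelled. It is not the ICU or UCA algorithm, and all names are illustrative.

```python
from typing import Dict, List, Tuple

PRIMARY, SECONDARY = 1, 2  # strength of '<' and '<<' respectively

# (anchor, [(code point, strength), ...]) mirrors "&anchor < cp1 < cp2 ..."
Rule = Tuple[int, List[Tuple[int, int]]]


def apply_rules(base: List[int], rules: List[Rule]) -> List[Tuple[int, int]]:
    """Return the tailored order as (code point, strength to previous) pairs."""
    seq = [(cp, PRIMARY) for cp in base]
    for anchor, tail in rules:
        prev = anchor
        for cp, strength in tail:
            # Remove any earlier placement so the latest rule wins.
            seq = [entry for entry in seq if entry[0] != cp]
            # Insert right after the anchor or the previously placed character.
            idx = [entry[0] for entry in seq].index(prev)
            seq.insert(idx + 1, (cp, strength))
            prev = cp
    return seq


def collation_keys(seq: List[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Assign (primary, secondary) weights by walking the tailored order."""
    keys: Dict[int, Tuple[int, int]] = {}
    primary = secondary = 0
    for cp, strength in seq:
        if strength == PRIMARY:
            primary += 1
            secondary = 0
        else:
            secondary += 1
        keys[cp] = (primary, secondary)
    return keys


def sort_key(text: str, keys: Dict[int, Tuple[int, int]]):
    """Compare all primary weights first, then all secondary weights."""
    weights = [keys[ord(ch)] for ch in text]
    return tuple(p for p, _ in weights), tuple(s for _, s in weights)


# Hypothetical toy base order, not the real default Myanmar ordering:
# 1003 GHA, 1004 NGA, 108D SHAN COUNCIL EMPHATIC TONE, 108E RUMAI PALAUNG FA.
BASE = [0x1003, 0x1004, 0x108D, 0x108E]
RULES: List[Rule] = [
    (0x1003, [(0xA9E0, PRIMARY), (0xA9EA, PRIMARY)]),  # &1003 < A9E0 < A9EA
    (0x108D, [(0xAA7C, SECONDARY)]),                   # &108D << AA7C
]

SEQ = apply_rules(BASE, RULES)
KEYS = collation_keys(SEQ)
words = [chr(0x1004), chr(0xA9EA), chr(0x1003), chr(0xA9E0)]
print([f"{ord(w):04X}" for w in sorted(words, key=lambda w: sort_key(w, KEYS))])
# ['1003', 'A9E0', 'A9EA', '1004']
```

Because the proposal notes that Myanmar-based languages need complex tailoring anyway, a production system would feed rules like these to a full collation engine rather than use a model of this kind.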

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646 (1)

Please fill all the sections A, B and C below.
Please read Principles and Procedures Document (P & P) from http://www.dkuug.dk/jtc1/sc2/wg2/docs/principles.html for guidelines and details before filling this form.
Please ensure you are using the latest Form from http://www.dkuug.dk/jtc1/sc2/wg2/docs/summaryform.html
See also http://www.dkuug.dk/jtc1/sc2/wg2/docs/roadmaps.html for latest Roadmaps.

A. Administrative
1. Title: Lao Extensions
2. Requester's name: Martin Hosken
3. Requester type (Member body/liaison/individual contribution): Individual contribution
4. Submission date: 21/04/11
5. Requester's reference (if applicable):
6. Choose one of the following: This is a complete proposal: (or) More information will be provided later: X

B. Technical - General
1. Choose one of the following:
   a. This proposal is for a new script (set of characters):
      Proposed name of script:
   b. The proposal is for addition of character(s) to an existing block: X
      Name of the existing block: Myanmar Extended-A, Myanmar Extended-B
2. Number of characters in proposal: 28
3. Proposed category (select one from below - see section 2.2 of P&P document):
   A-Contemporary X
   B.1-Specialized (small collection)
   B.2-Specialized (large collection)
   C-Major extinct
   D-Attested extinct
   E-Minor extinct
   F-Archaic Hieroglyphic or Ideographic
   G-Obscure or questionable usage symbols
4. Is a repertoire including character names provided?
   a. If YES, are the names in accordance with the character naming guidelines in Annex L of P&P document?
   b. Are the character shapes attached in a legible form suitable for review?
5. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for publishing the standard? SIL
   If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used:
6. References:
   a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
   b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached?
7. Special encoding issues:
   Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
8. Additional Information:
   Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see http://www.unicode.org/public/unidata/ucd.html and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.

(1) Form number: N3102-F (Original 1994-10-14; Revised 1995-01, 1995-04, 1996-04, 1996-08, 1999-03, 2001-05, 2001-09, 2003-11, 2005-01, 2005-09, 2005-10, 2007-03)

C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before?
   If YES explain
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)? local experts
   If YES, with whom? see bibliography
   If YES, available relevant documents:
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? this document
   Reference:
4. The context of use for the proposed characters (type of use; common or rare): common
   Reference:
5. Are the proposed characters in current use by the user community? see bibliography
   If YES, where? Reference:
6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
   If YES, is a rationale provided? addition to existing BMP blocks
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
   If YES, is a rationale for its inclusion provided? this document
9. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
   If YES, is a rationale for its inclusion provided?
10. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
    If YES, is a rationale for its inclusion provided? this document
11. Does the proposal include use of combining characters and/or use of composite sequences?
    If YES, is a rationale for such use provided? this document
    Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
12. Does the proposal contain characters with any special properties such as control function or similar semantics?
    If YES, describe in detail (include attachment if necessary)
13. Does the proposal contain any Ideographic compatibility character(s)?
    If YES, is the equivalent corresponding unified ideographic character(s) identified?