MOVING TO A UNICODE-BASED LIBRARY SYSTEM: THE YESHIVA UNIVERSITY LIBRARY EXPERIENCE

By: Leah Adler

Proceedings of the 39th Annual Convention of the Association of Jewish Libraries (Brooklyn, NY June 20-23, 2004)

Description: When Yeshiva University Library moved from a non-Unicode automated library system to a Unicode-based system, it faced many issues, some of which were anticipated and some of which were not. The Hebrew alphabet and diacritics in the Roman alphabet posed special problems.

Leah Adler is Head Librarian of the Mendel Gottesman Library of Yeshiva University and also serves as YU's Systems Librarian. She was the

In January of 2003, the Yeshiva University Libraries migrated their VTLS Integrated Library System from the so-called Classic VTLS to the Unicode-based Virtua system. At last year's convention I reported on the move in general. In today's talk I want to concentrate on some Unicode-related aspects of the move.

On the Unicode home page (www.unicode.org) we find the following quote: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."

A Unicode-based library system provides unique characters not only for many different languages and scripts, but also for any technical symbol and sign you could possibly need in a written document. Let's look at the Microsoft Unicode Character Map.

(Internet Explorer -> Start -> Programs -> Accessories -> System Tools -> Character Map)

Here you see, besides the Latin characters, for example, the Question Mark, the Inverted Question Mark (used in Spanish-language documents), the Registered Sign and the Pound Sign, and on and on. Each character, as quoted before, has its unique number. We have stand-alone diacritics, like the Tilde, and precomposed combinations of letters with diacritical marks, like the Latin Small Letter N with Tilde.

Now think romanization of Hebrew: Besides regular Latin characters we use Latin characters combined with diacritical marks, like k with dot below, which represents the Hebrew letter Kof. In pre-Unicode systems we had to use multiple characters in order to represent letters with diacritical marks. For the romanized Kof we used a dot followed by a k, or a squiggly bracket followed by a dot followed by a k, or we left the dot out altogether, just using a plain k, because the combination seemed too unsightly.

In a Unicode-based system, a letter with a diacritical mark is represented by one character, like k with dot below. Therefore, a system which moves from non-Unicode to Unicode-based representation has to translate the multi-character combinations into single characters. A squiggly bracket followed by a dot followed by a k will become the single character ḳ (k with dot below). This is doable.

However, there is a caveat: In many cases Unicode allows for more than one way to represent a combined character. Let's again take the k with dot below as an example and let's look at the Character Map: (Look at Unicode Subrange Latin, halfway down). Here is the pre-combined k with dot below, number U+1E33. Another way to represent the k with dot below is
to take a regular k and combine it with a combining character, dot below, which is number U+0323. These two ks with dot below are both valid Unicode, but they are not the same character; they do not have the same Unicode number, and therefore they may not cluster in your database.

In order to demonstrate what I mean, I created two short records in our database, using my grandson's name in romanization. His name is Yehudah Brofsḳy. I entered his name once with the pre-composed k with dot below, and once with the post-composed k with dot below. The system does not cluster the two forms of the name.

So, talk to your system's vendor before he migrates your database to a Unicode-based system. Ask him which Unicode characters he intends to use, so you will know which ones to use in the future. Tell him to make sure that your entries in the migrated database will cluster with records you will be importing from the utilities, like OCLC and RLIN. Or even better: Talk to him about having the system save different Unicode numbers which represent characters of the same visual typography (as we saw in k with dot below) in the same form. This is called Normalization.

And another point: Discuss with your vendor the best way for you to enter Unicode characters after the migration. Ask him to provide you with an easy way to enter these characters. Copying and pasting them from the Character Map is not an easy way.

Let's move the discussion to the Hebrew character set (see Character Map, Unicode Subrange, Hebrew): At the top you see the טעמי המקרא, cantillation marks, or, in Unicode parlance, Hebrew Accents. These are followed by the vowel points and by the Hebrew letters. You also find Hebrew letters with נקודות, degeshim and other marks. Of special interest to us are the Yiddish digraphs Double Vav, Vav-Yod, and Double Yod, as well as the Geresh and the Gershayim.
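
For readers without the Character Map at hand, the five characters just singled out can be looked up programmatically. The following is a minimal sketch in Python, using only the standard `unicodedata` module; the helper name `describe` is an illustrative choice and not part of any library system.

```python
import unicodedata

# The three Yiddish digraphs, the Geresh and the Gershayim,
# written as escapes so the code points are visible.
SPECIAL_HEBREW = ["\u05F0", "\u05F1", "\u05F2", "\u05F3", "\u05F4"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in SPECIAL_HEBREW:
    print(describe(ch))
```

Each of the five prints as a single code point with its own official name, which is why a system can treat them differently from the ordinary letters and punctuation they resemble.
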
Of all the many Unicode
characters that VTLS used when they migrated our database to Virtua, these were the ones that posed the more serious problems. You have to remember that Yeshiva was Virtua's Hebrew guinea pig, so some pitfalls were to be expected. In my presentation last year I mentioned these characters and the problems associated with them, but I hope that you don't mind if I repeat some of it today. It will hopefully help some of you when you migrate to Unicode.

Yiddish digraphs: Yiddish digraphs as you see them in the Character Map comprise one Unicode character each. In Classic VTLS, Yeshiva's pre-Unicode library system, Yiddish digraphs, when imported from RLIN, did not display legibly and were therefore immediately spotted and manually converted into two separate letters each. In Unicode-based Virtua, Yiddish digraphs display beautifully and are indistinguishable from two separate letters. Good news? Not necessarily. I created a short record entitled ביי מיר. In the word ביי I used the Unicode character for Yiddish Double Yod. Let's search the title in our catalog the way a user would, using the Hebrew keyboard. No ביי מיר to be found. The system does not treat the Unicode character for Yiddish Double Yod as two separate Yods and therefore does not file the Double Yod under Yod but rather at the end of the Hebrew alphabet, which in our case is the end of the Bet sequence.

Geresh and Gershayim: These are the Hebrew Unicode equivalents of the apostrophe and the quotation mark, which are used in words like ר' or רמב"ם. Look what happened in our database after we downloaded responsa, or שו"ת, titles from RLIN: We have a split שו"ת title file! In the first file quotation marks are used in the word שו"ת, and in the second file Gershayim are used. Gershayim are not treated by Virtua as quotation marks, which causes the split file. After this presentation our staff will manually replace the Gershayim in each שו"ת heading with a quotation mark, and we will get rid of the split file.
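
The two filing problems just described can be reproduced outside any library system. The sketch below is in Python and assumes plain code-point ordering, which is only a stand-in for Virtua's actual filing rules (those are not documented here); the variable names are illustrative.

```python
BET, YOD, TAV, ALEF = "\u05D1", "\u05D9", "\u05EA", "\u05D0"
DOUBLE_YOD = "\u05F2"  # HEBREW LIGATURE YIDDISH DOUBLE YOD
GERSHAYIM = "\u05F4"   # HEBREW PUNCTUATION GERSHAYIM

# The same word, once with the digraph and once as typed on a Hebrew keyboard.
bay_digraph = BET + DOUBLE_YOD
bay_typed = BET + YOD + YOD

# The same abbreviation, once with Gershayim and once with a quotation mark.
shut_gershayim = "\u05E9\u05D5" + GERSHAYIM + "\u05EA"
shut_quote = "\u05E9\u05D5" + '"' + "\u05EA"


def code_points(text):
    """Show a string as a list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Identical on screen, different underneath: neither pair matches.
print(bay_digraph == bay_typed)
print(shut_gershayim == shut_quote)

# Assumption: headings are filed by raw code point. The digraph (U+05F2)
# lies beyond Tav (U+05EA), so it lands after every other Bet heading.
headings = [bay_digraph, BET + TAV, bay_typed, BET + ALEF]
filed = sorted(headings)
for heading in filed:
    print(code_points(heading))
```

Under that assumption the digraph form sorts last among the Bet headings, which matches the behavior observed in the catalog.
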

We at Yeshiva have a special program going: cataloging Ladino books. Ladino, like Yiddish, is written in Hebrew characters. Ladino script does not use digraphs, but it does use very many apostrophes embedded within words. In records imported from RLIN, the character used as an apostrophe is usually the Geresh character. Again, Virtua doesn't treat Geresh as an apostrophe. Let's search for the title ז'ואן. It is not there. But when we search at the end of the ז file, we find it. Once we replace the Geresh in ז'ואן with an apostrophe, the title will file in its appropriate place.

So, before you migrate to a Unicode-based system, think of all the special Unicode characters that you may have in your database and point them out to your vendor, so that he will take them into account when he prepares your system for Unicode. And should you discover problems after migration, as in our case, it's not too late. Reprogramming and reindexing can and should be done. VTLS is planning to reprogram Virtua in a way that will make the software treat Yiddish digraphs as if they were two separate letters, and the Geresh and Gershayim as if they were apostrophes and quotation marks. This is called Folding.

Let me read to you a paragraph from the Committee on Cataloging: Description and Access, Library of Congress Liaison Report to ALA/ALCTS/CCS/CC:DA, Annual Meeting, June 2002; submitted by Barbara B. Tillett, LC Liaison to ALA/ALCTS/CCS/CC:DA:

Normalization and folding rules. A group at LC headed by NDMSO is in the process of revising existing normalization and folding rules for Latin script based cataloging data, as well as drafting for the first time rules for other scripts including Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, and Korean.

Normalization involves insuring that various encodings of modified letters are consistent. [In Yeshiva's case: k with dot below should display, sort and index consistently regardless of its Unicode numerical value.] Folding involves replacing modified letters or special characters with unmodified or simplified forms for certain activities, such as indexing (for example, an "æ" ligature might be replaced with the normal letters "a" and "e".) [In Yeshiva's case: Yiddish digraphs will be replaced with two separate letters.] The folding process will not physically replace one character with another, but will give the special character the sorting and indexing properties of another, simple character.

The paragraph continues: LC plans to make its conclusions available to others for the Unicode-based versions of their software. Endeavor will use them for the new version of Voyager. LC's normalization and folding rules should be available via the Web later this year.

Unfortunately I don't know whether LC's plans materialized. At this convention we are lucky to have with us key people from the LC Hebraica cataloging team. I hope they can shed some light on the progress of this proposal.
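
As an illustration of the normalization and folding defined in the LC report quoted above, here is a minimal sketch in Python. It is not LC's rule set and not VTLS's implementation: the choice of normalization form NFC, the `FOLD` table and the function name `index_key` are all assumptions made for the example. Only `unicodedata.normalize` comes from the standard library.

```python
import unicodedata

# Hypothetical folding table: each special character is given the
# indexing value of simpler characters, as described in the talk.
FOLD = {
    "\u05F0": "\u05D5\u05D5",  # Yiddish Double Vav -> Vav + Vav
    "\u05F1": "\u05D5\u05D9",  # Yiddish Vav-Yod    -> Vav + Yod
    "\u05F2": "\u05D9\u05D9",  # Yiddish Double Yod -> Yod + Yod
    "\u05F3": "'",             # Geresh             -> apostrophe
    "\u05F4": '"',             # Gershayim          -> quotation mark
}


def index_key(text):
    """Build a key for sorting and searching; the stored text is untouched."""
    # Normalization: k + combining dot below (U+006B U+0323) and the
    # precomposed U+1E33 are brought to one consistent form.
    normalized = unicodedata.normalize("NFC", text)
    # Folding: substitute the simpler form for indexing purposes only.
    return "".join(FOLD.get(ch, ch) for ch in normalized)


precomposed = "Brofs\u1E33y"
postcomposed = "Brofsk\u0323y"
print(precomposed == postcomposed)                        # raw strings differ
print(index_key(precomposed) == index_key(postcomposed))  # keys cluster
```

Because the key is computed separately from the stored record, the record keeps whatever characters the cataloger or the utility supplied, in line with the report's statement that folding does not physically replace one character with another.
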