Parts of Speech


It is customary to count about 9 groups of words called "parts of speech": noun, adjective, pronoun, numeral, verb, adverb, preposition, conjunction, and interjection. This is only one possible division, however.

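Why is this useful?
- A basis for parsing
- Finding expressions: term identification/chunking
- Speech synthesis (TTS), where the tag decides how a word is pronounced: Hebrew רכבת can be read as rakevet "train" or rakhavta "you rode"; CONtent/conTENT, OBJect/objECT, DIScount/disCOUNT
- In Hebrew: automatic vocalization (nikud)
- A component in translation
- A component in search: "book a flight" vs. "book about flight"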

How are parts of speech defined?
Traditionally, the definition of a part of speech is based on the morphological properties of a word or on the words that appear next to it (distributional properties).
In principle, words of the same part of speech are also semantically similar, that is, they describe members of the same kinds of groups. For example:
- Nouns: people, places, things (sister, thought, table)
- Adjectives: properties, quantities (big, lazy)
- Adverbs: describe manner, place, time, quality (quickly)
- Verbs: events, occurrences, or states of being (write, eat, is)
There are also prepositions, conjunctions, and more...

The yinkish dripner blorked quastocally into the nindin with the pidibs.
yinkish - adj
dripner - noun
blorked - verb
quastocally - adverb
nindin - noun
pidibs - noun
We determine the part of speech of a word by the affixes that are attached to it and by the syntactic context (where in the sentence) it appears in.
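The idea of reading a part of speech off affixes and context can be sketched in a few lines of Python. This is only an illustration, not a real tagger: the suffix rules, the small closed-class word list, and the function name `guess_tags` are all assumptions made up for this example.

```python
import re

# Hypothetical closed-class lexicon: function words are simply looked up.
CLOSED_CLASS = {"the": "det", "into": "prep", "with": "prep"}

# Hypothetical suffix rules, checked in order (longer suffixes first).
SUFFIX_RULES = [
    ("ally", "adverb"),
    ("ly", "adverb"),
    ("ish", "adj"),
    ("ed", "verb"),
    ("s", "noun"),
]


def guess_tags(sentence):
    """Guess a coarse tag for each word from its affixes and left context."""
    words = re.findall(r"[a-z]+", sentence.lower())
    tagged = []
    for word in words:
        if word in CLOSED_CLASS:
            tag = CLOSED_CLASS[word]
        else:
            # Morphological cue: first matching suffix wins.
            tag = next((t for suf, t in SUFFIX_RULES if word.endswith(suf)), None)
            if tag is None:
                # Syntactic cue: a word right after a determiner or an
                # adjective is guessed to be a noun.
                prev = tagged[-1][1] if tagged else None
                tag = "noun" if prev in ("det", "adj") else "unknown"
        tagged.append((word, tag))
    return tagged


SENTENCE = "The yinkish dripner blorked quastocally into the nindin with the pidibs."
for word, tag in guess_tags(SENTENCE):
    print(f"{word:12s} {tag}")
```

On the nonsense sentence the suffix rules handle yinkish, blorked, quastocally and pidibs, while dripner and nindin are only recovered from their position after an adjective or a determiner.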

Open class vs. Closed class types
Closed class: small group, does not (usually) grow
- function words: determiners, prepositions, pronouns, ...
Open class: large group, and grows larger
- verbs, nouns, adjectives
- productive group: to google, to fax, googling

Nouns
- Take affixes: -s, 's, -ness, -ment, -er
- Occur with determiners (a, the, this, some)
- Can be the subject of a sentence
- Semantically: can be concrete (chair, train) or abstract (relationship), or action nouns, for example אכילה (eating), לאכול (to eat), eating

Types of Nouns
Proper nouns: David, Israel, Microsoft
- Aren't preceded by articles
- Capitalized (in English)
Common nouns:
- Count nouns: allow grammatical enumeration (book, books) and can be counted (one apple, 50 thoughts)
- Mass nouns: snow, salt, communism

Verbs
Words that refer to actions or processes
- Main verbs: draw, provide, differ
- Auxiliaries (usually considered closed class)
- Morphological inflection system: eat, eats, eating, eaten

Adjectives
Semantically, a group of expressions that describe properties or qualities, something like a one-place predicate.
Many languages include: colors (yellow, green), ages (young, old), and values (good, bad).
Some languages have no adjectives.

Adverbs
A rather mixed group...
Unfortunately, John walked home extremely slowly yesterday
- Directional: sideways, downhill
- Locative: home, here
- Degree: extremely, somewhat
- Manner: slowly, delicately
- Temporal: yesterday, Monday

Closed class
- Prepositions: on, under, over, near, by, at, from, to, with
- Determiners: a, an, the
- Pronouns: it, she, I
- Conjunctions: and, but, or, as, if, when
- Auxiliary verbs: can, may, should, are
- Particles: up, down, on, off, in, at, by
- Numerals: one, two, second, third

Prepositions and particles
Prepositions: on top, by then, with him
- Appear before a noun and indicate relations of time or place, but not only those.
Particles: go on, look up, turn down
- Appear after a verb and, with transitive verbs, can also appear after the object.
Preposition: The horse went off its track. / *The horse went its track off.
Particle: throw off sleep / throw sleep off

Articles (determiners)
a, an, the
- Appear at the beginning of a noun phrase
- Also: this chapter, that page
- Very frequent in texts

Conjunctions
Join two phrases, sentences, etc.
- or, and, but: join phrases of the same status
- Subordinating conjunctions: used to join embedded phrases
I thought that you might like some milk.
- "I thought": main clause
- "that you might like some milk": subordinate clause

And there are more...

Tagsets
Tagset: the set of possible tags for parts of speech (its size varies across applications and languages).
A tagset should include the information that is needed for the next steps in the process, and that people can annotate well.
- Brown corpus: 87 tags
- Penn Treebank: 45 tags
- Large: the 146-tag C7 tagset, used to tag the British National Corpus (BNC)
- Universal: about 12 tags


Penn Tagset
Noun (person, place or thing)
- Singular (NN): dog, fork
- Plural (NNS): dogs, forks
- Proper (NNP, NNPS): John, Springfields
- Personal pronoun (PRP): I, you, he, she, it
- Wh-pronoun (WP): who, what
Verb (actions and processes)
- Base, infinitive (VB): eat
- Past tense (VBD): ate
- Gerund (VBG): eating
- Past participle (VBN): eaten
- Non-3rd person singular present tense (VBP): eat
- 3rd person singular present tense (VBZ): eats
- Modal (MD): should, can
- To (TO): to (to eat)

Penn Tagset (cont.)
Adjective (modify nouns)
- Basic (JJ): red, tall
- Comparative (JJR): redder, taller
- Superlative (JJS): reddest, tallest
Adverb (modify verbs)
- Basic (RB): quickly
- Comparative (RBR): quicker
- Superlative (RBS): quickest
Preposition (IN): on, in, by, to, with
Determiner:
- Basic (DT): a, an, the
- WH-determiner (WDT): which, that
Coordinating conjunction (CC): and, but, or
Particle (RP): off (took off), up (put up)

Universal tagset
Can describe over 22 languages with the same set of tags.
Why do we want to do that? Language transfer, ease of use for developers.
The tags: Noun, Verb, Adv, Adj, Pron, Det, Adp, Num, Conj, Prt, Punc, X
- Adp: preposition or postposition
- X: other
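To make the relation between a fine-grained tagset and the universal tags concrete, here is a minimal sketch that collapses Penn Treebank tags into universal ones. The table below is an assumption for illustration: it covers only the Penn tags listed on the earlier slides, the names `PENN_TO_UNIVERSAL` and `to_universal` are invented for this example, and placing MD under VERB and TO under PRT follows one common convention rather than the only possible one.

```python
# Illustrative mapping from the Penn tags shown above to universal tags.
# This table is an assumption for the example, not an official resource.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "PRP": "PRON", "WP": "PRON",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "IN": "ADP",
    "DT": "DET", "WDT": "DET",
    "CC": "CONJ",
    "TO": "PRT", "RP": "PRT",
}


def to_universal(tagged_words):
    """Replace each Penn tag by its universal tag; unknown tags become X."""
    return [(word, PENN_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged_words]


penn_tagged = [("John", "NNP"), ("ate", "VBD"), ("quickly", "RB"), ("!", "??")]
print(to_universal(penn_tagged))
```

Many fine-grained distinctions (tense, number, degree) disappear in the mapping, which is the price of having one tag inventory across languages.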

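Part-Of-Speech Tagging
Tagging is the process of assigning a part of speech, or another lexical marker, to each word in a corpus.
Tagging is usually applied to punctuation marks as well (tokenization).
The input is a sequence of words and a tagset of the kind we saw.
The output is the best tag for each of the words.
The central problem is ambiguity:
אשה נעלה נעלה נעלה נעלה את הדלת בפני בעלה
(roughly: "A noble woman put on her shoe and locked the door in her husband's face"; the same written form נעלה is read here as an adjective, a verb, a noun, and a verb again)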

"around" can be a preposition, a particle, or an adverb:
- I bought it at the shop around/IN the corner.
- I never got around/RP to getting a car.
- A new Prius costs around/RB $25K.
"like" can be a verb or a preposition:
- time flies like an arrow
- fruit flies like a banana
("flies" can be a verb or a noun...)

The Distribution of Tags
Tags follow all the usual frequency-based distributional behavior.
- Most word types have only one part of speech.
- Of the rest, most have two.
- Things go pretty much as we'd expect from there on.
Of course, as usual, the most frequently occurring word types tend to have multiple tags. (As we'll see later in the semester, they also tend to have more meanings.)
Therefore, while it's easy to determine the correct tag for most word types, it isn't necessarily so easy to tag most texts.

Word Types in the Brown Corpus
Unambiguous (1 tag): 35340
Ambiguous (> 1 tag): 4100
  2 tags: 3760
  3 tags: 264
  4 tags: 61
  5 tags: 12
  6 tags: 2
  7 tags: 1 ("still")
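The Brown corpus counts above can be turned into proportions with a few lines of arithmetic. The sketch below uses only the numbers from the slide; the variable names are chosen for this example.

```python
# Number of word types in the Brown corpus by number of possible tags
# (figures copied from the slide).
types_by_tag_count = {1: 35340, 2: 3760, 3: 264, 4: 61, 5: 12, 6: 2, 7: 1}

total_types = sum(types_by_tag_count.values())
ambiguous_types = total_types - types_by_tag_count[1]
ambiguous_share = ambiguous_types / total_types

print(f"total word types:     {total_types}")
print(f"ambiguous word types: {ambiguous_types}")
print(f"share ambiguous:      {ambiguous_share:.1%}")
```

Only about one word type in ten is ambiguous, but this is a count over types, not tokens: because the ambiguous types include many of the most frequent words, the share of ambiguous tokens in running text is higher.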

State of the Art
- A dumb tagger that simply assigns the most common tag to each word achieves ~90% accuracy.
- The best approaches give ~96-97%.
- This still means that there will be, on average, one tagging error per sentence.
- Life is much more difficult if we do not have a lexicon and/or a training corpus, or if we use a tagger across domains and genres.
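The "dumb" baseline mentioned above is easy to write down. The following is a minimal sketch under stated assumptions: the three-sentence training set is a toy invented for this example (so it will not reproduce the ~90% figure), and the function names `train_baseline` and `tag` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy training data, invented for illustration only.
TRAIN = [
    [("time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")],
    [("fruit", "NN"), ("flies", "NNS"), ("like", "VBP"), ("a", "DT"), ("banana", "NN")],
    [("the", "DT"), ("plane", "NN"), ("flies", "VBZ"), ("like", "IN"), ("a", "DT"), ("bird", "NN")],
]


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per word, plus a default for unseen words."""
    tag_counts = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Tag each word independently, ignoring its context."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


lexicon, default_tag = train_baseline(TRAIN)
print(tag(["fruit", "flies", "like", "a", "kiwi"], lexicon, default_tag))
```

Because each word is tagged in isolation, "flies" and "like" always receive their majority tags (VBZ and IN here), so the baseline gets "fruit flies like a banana" wrong. Taggers that look at context are meant to fix exactly this kind of error.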

Taggers
Rule based
- Manual rules
- Transformation-based tagging (~learning)
Machine learned
- HMM (probabilistic, global)
- MaxEnt / SVM (discriminative, local)
- CRF (probabilistic, discriminative, global)
- Structured Perceptron (discriminative, global)
- others...

Supervised Learning Scheme
Labeled Examples -> Training Algorithm -> Classification Model
New Examples -> Classification Algorithm (using the Classification Model) -> Classifications

Next week: HMMs