Document-level context in deep recurrent neural networks

Institute of Computational Linguistics
Document-level context in deep recurrent neural networks
Kolloquium Talk 2017
Mathias Müller
10/30/17 KOLLO, Mathias Müller

On the menu today
- Establish that document-level context matters for neural machine translation (NMT)
- How to evaluate document-level improvements
- Proposed architecture to integrate arbitrary contexts (multi-context conditional GRU)
Also on the menu today: http://www.mensa.uzh.ch/de/menueplaene/mensa-uzh-binzmuehle/dienstag.html

Establishing that document-level context matters in NMT

Context matters (rather serious illustration)

Context matters (fabricated example)
Source: The sun is shining. It is bright.
Target: Die Sonne scheint. ___ ist hell.

Context matters (actual WMT examples)
Source: This organism has dual capability. It can grow with either phosphorous or arsenic.
Target: Dieser Organismus hat zwei Möglichkeiten. Er benötigt zum Wachsen entweder Phosphor oder Arsen.
(example taken from newstest2011.{de,en})

Context matters (actual WMT examples)
Sentence-level NMT solves the following task:
Source: This organism has dual capability. It can grow with either phosphorous or arsenic.
Target: Dieser Organismus hat zwei Möglichkeiten. ___ benötigt zum Wachsen entweder Phosphor oder Arsen.

Context matters (actual WMT examples)
Source: However, the European Central Bank (ECB) took an interest in it in a report on virtual currencies published in October. It describes bitcoin as "the most successful virtual currency, […].
Target: Dennoch hat die Europäische Zentralbank (EZB) in einem im Oktober veröffentlichten Bericht über virtuelle Währungen Interesse hierfür gezeigt. Sie beschreibt Bitcoin als "die virtuelle Währung mit dem größten Erfolg […].
(example taken from newstest2013.{de,en})

Context matters (actual WMT examples)
Source: However, the European Central Bank (ECB) took an interest in it in a report on virtual currencies published in October. It describes bitcoin as "the most successful virtual currency, […].
Target: Dennoch hat die Europäische Zentralbank (EZB) in einem im Oktober veröffentlichten Bericht über virtuelle Währungen Interesse hierfür gezeigt. ___ beschreibt Bitcoin als "die virtuelle Währung mit dem größten Erfolg […].

Context matters (actual WMT examples)

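Do we treat NMT models fairly?
Source: It describes bitcoin as "the most successful virtual currency".
Target: Es beschreibt den Bitcoin als "die erfolgreichste virtuelle Währung".
Seen in isolation, "Es" is a perfectly reasonable translation of "It"; the reference's "Sie" only follows from the previous sentence, where it refers back to die Europäische Zentralbank (EZB).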

How to evaluate document-level improvements

How to evaluate automatically?
- Metrics like BLEU are too coarse-grained.
- They also make it impossible to focus evaluation on specific linguistic phenomena.
Solutions:
- Use specialized metrics (Miculicich Werlen and Popescu-Belis, 2017)
- Design challenge sets for contrastive evaluation
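
To make "too coarse-grained" concrete, here is a minimal sketch of unsmoothed sentence-level BLEU in Python (a simplification of the usual corpus-level metric, not a reference implementation), applied to the Hong Kong sentence from the challenge set slide. The tokenization and the tense-error variant are hypothetical, chosen for illustration. A wrong pronoun and an unrelated one-word tense error receive exactly the same score, because BLEU only counts n-gram overlap and cannot tell which word went wrong.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (simplified sketch)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if matches == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty only applies when the hypothesis is shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "Hongkong bestimmt , obwohl es zu China gehört , seine Währungspolitik selbst .".split()
# Wrong pronoun (es -> er), the phenomenon we actually care about.
pronoun_error = "Hongkong bestimmt , obwohl er zu China gehört , seine Währungspolitik selbst .".split()
# Hypothetical unrelated error (gehört -> gehörte) at a different position.
tense_error = "Hongkong bestimmt , obwohl es zu China gehörte , seine Währungspolitik selbst .".split()

print(round(sentence_bleu(reference, pronoun_error), 3))  # 0.761
print(round(sentence_bleu(reference, tense_error), 3))    # 0.761
```

At corpus level a single pronoun error is diluted even further, which is why targeted challenge sets are attractive.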

Challenge set evaluation
Idea: take advantage of the fact that NMT systems are conditional language models, i.e. they can assign a probability to any given target sentence for a source sentence.
Contrastive evaluation by model scoring: the model should score the correct target higher than a contrastive variant that differs only in the phenomenon under test (here, the pronoun es vs. er).
Source: Despite the fact that it is a part of China, Hong Kong determines its currency policy separately.
Target: Hongkong bestimmt, obwohl es zu China gehört, seine Währungspolitik selbst.
Contrastive: Hongkong bestimmt, obwohl er zu China gehört, seine Währungspolitik selbst.
(example taken from newstest2009)
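
Contrastive evaluation by model scoring can be sketched as a small accuracy loop. This is a minimal Python sketch under stated assumptions: `contrastive_accuracy` and `toy_score` are hypothetical names, and `toy_score` is a stand-in that simply prefers "es"; a real scorer would be the NMT model's log-probability of the target given the source.

```python
from typing import Callable, List, Sequence, Tuple

# (source, reference, contrastive variants)
Example = Tuple[str, str, List[str]]
# Maps (source, target) to a model score such as log P(target | source).
Scorer = Callable[[str, str], float]


def contrastive_accuracy(examples: Sequence[Example], score: Scorer) -> float:
    """Fraction of examples where the reference strictly outscores every contrastive variant."""
    correct = 0
    for source, reference, contrastives in examples:
        ref_score = score(source, reference)
        if all(ref_score > score(source, c) for c in contrastives):
            correct += 1
    return correct / len(examples)


def toy_score(source: str, target: str) -> float:
    """Hypothetical stand-in for an NMT model: it simply prefers the neuter pronoun 'es'."""
    return -1.0 if " es " in target else -2.0


examples: List[Example] = [(
    "Despite the fact that it is a part of China, Hong Kong determines its currency policy separately.",
    "Hongkong bestimmt, obwohl es zu China gehört, seine Währungspolitik selbst.",
    ["Hongkong bestimmt, obwohl er zu China gehört, seine Währungspolitik selbst."],
)]

print(contrastive_accuracy(examples, toy_score))  # 1.0
```

The strict comparison counts ties as errors, so a model that cannot distinguish the variants gets no credit. Because only scoring is needed, the model never has to generate the rare correct variant itself.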

Challenge set evaluation
Previous experience with challenge sets:
- hand-selected, manually annotated examples to test pronoun translation (Guillou and Hardmeier, 2016)
- first application to NMT: LingEval97 (Sennrich, 2017)
- extension to words with several senses: ContraWSD (Rios et al., 2017)
And, very recently:
- a challenge set by Isabelle et al. (2017)
- a handcrafted set with ambiguous pronouns: Bawden (in preparation)

Contra Pronoun Challenge Set

Proposed architecture to integrate arbitrary contexts (multi-context conditional GRU)

Integrating document-level context
Into existing architectures:
- Nematus, an extremely successful tool (Sennrich et al., 2017)
- encoder-decoder network with soft attention (Bahdanau et al., 2014)
- encoder and decoder are recurrent neural networks (RNNs)
Rule out simple solutions:
- concatenating sentences is problematic because NMT quality degrades on long sequences (Koehn and Knowles, 2017)

What are other groups doing?
Known NMT solutions that use intersentential context:
- gated auxiliary context, or warm-start decoder initialization with a document summary (Wang et al., 2017)
- an additional encoder and attention network for the previous source sentence (Jean et al., 2017)
- concatenating the previous source sentence, marked with a prefix (Tiedemann and Scherrer, 2017)
- both source and target context (Miculicich Werlen et al., under review)
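
The concatenation approach is essentially a preprocessing step. The following is a minimal Python sketch, not Tiedemann and Scherrer's code: the prefix string `_` and the function `extend_with_previous` are hypothetical, and the input is assumed to be the whitespace-tokenized source sentences of one document, in order.

```python
from typing import List

# Hypothetical marker prepended to each context token; the exact marking
# used by Tiedemann and Scherrer (2017) may differ.
CONTEXT_PREFIX = "_"


def extend_with_previous(document: List[str], prefix: str = CONTEXT_PREFIX) -> List[str]:
    """Prepend the prefix-marked previous sentence to every sentence of a document.

    `document` is a list of whitespace-tokenized source sentences in order.
    The first sentence has no preceding context and is returned unchanged.
    """
    extended = []
    for i, sentence in enumerate(document):
        if i == 0:
            extended.append(sentence)
            continue
        # Mark context tokens so the model can tell them apart from the sentence to translate.
        context = " ".join(prefix + token for token in document[i - 1].split())
        extended.append(context + " " + sentence)
    return extended


doc = ["This organism has dual capability .",
       "It can grow with either phosphorous or arsenic ."]
for line in extend_with_previous(doc):
    print(line)
```

The second output line is "_This _organism _has _dual _capability _. It can grow with either phosphorous or arsenic .". The inputs grow with the amount of context, which is the sequence-length concern that rules out naive concatenation for larger contexts.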

Actual implementation
- builds on previous work: extends the conditional gated recurrent unit (cGRU), the RNN that Nematus uses as its decoder
- allows for arbitrary context sizes (past context only, obviously), on both the source and the target side
- 1 additional encoder for each context, 1 additional GRU unit with attention during the deep transition

Recurrent neural networks refresher

RNN variant: gated recurrent unit (GRU)
(Figure taken from Chung et al., 2014)
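
Since the figure itself is not reproduced here, a single GRU step can be written out with NumPy. This is a minimal sketch of the gating equations as given by Chung et al. (2014), with bias terms omitted for brevity; the parameter names `W_*`/`U_*`, the helper `init_gru`, and the random initialization are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h, rng):
    """Random weights: W_* act on the input (d_h, d_in), U_* on the previous state (d_h, d_h)."""
    names = [("W_z", d_in), ("U_z", d_h), ("W_r", d_in), ("U_r", d_h), ("W_h", d_in), ("U_h", d_h)]
    return {n: rng.normal(scale=0.1, size=(d_h, cols)) for n, cols in names}


def gru_step(x, h_prev, p):
    """One GRU step (biases omitted)."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)              # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)              # reset gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                    # interpolate old and candidate state


rng = np.random.default_rng(0)
params = init_gru(d_in=4, d_h=3, rng=rng)
h = np.zeros(3)
for x in rng.normal(size=(5, 4)):  # run over a sequence of 5 input vectors
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```

The update gate decides how much of the old state survives, and the reset gate decides how much of it feeds into the candidate; this gating is what lets information persist across many time steps.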

Notion of depth in RNNs
Generally three types of depth (Pascanu et al., 2013):
- stacked layers (each layer individually recurrent)
- deep transition (units not individually recurrent)
- deep output (units not individually recurrent)
In Nematus, the decoder is implemented as a cGRU with deep transition and deep output.
Crucially: attention over the source sentence vectors C is a deep transition step.

Conditional gated recurrent unit (cGRU)
Detailed formulas: https://github.com/nyu-dl/dl4mt-tutorial/blob/master/docs/cgru.pdf
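
The structure of one cGRU decoder step (first GRU on the previous target word, attention over the source annotations C, second GRU consuming the attention context) can be sketched in NumPy. This is a simplified sketch, not the Nematus implementation: biases, the deep output layer, dropout, and layer normalization are omitted, and the parameter names (`U_a`, `W_a`, `v_a`, `gru1`, `gru2`) and sizes are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h, rng):
    names = [("W_z", d_in), ("U_z", d_h), ("W_r", d_in), ("U_r", d_h), ("W_h", d_in), ("U_h", d_h)]
    return {n: rng.normal(scale=0.1, size=(d_h, cols)) for n, cols in names}


def gru_step(x, h_prev, p):
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde


def attention(C, s, p):
    """Additive attention: C holds annotations (T, d_c), s is the query state (d_h,)."""
    energies = np.tanh(C @ p["U_a"].T + p["W_a"] @ s) @ p["v_a"]  # one energy per position, (T,)
    alpha = np.exp(energies - energies.max())
    alpha /= alpha.sum()                                           # softmax over source positions
    return alpha @ C, alpha                                        # context vector (d_c,), weights (T,)


def cgru_step(y_prev, s_prev, C, p):
    s_inter = gru_step(y_prev, s_prev, p["gru1"])  # GRU_1 on the previous target embedding
    c, alpha = attention(C, s_inter, p["att"])     # attention is a deep transition step
    s = gru_step(c, s_inter, p["gru2"])            # GRU_2 consumes the context vector
    return s, alpha


rng = np.random.default_rng(0)
d_emb, d_h, d_c, d_att, T = 4, 3, 6, 5, 7
params = {
    "gru1": init_gru(d_emb, d_h, rng),
    "gru2": init_gru(d_c, d_h, rng),
    "att": {"U_a": rng.normal(scale=0.1, size=(d_att, d_c)),
            "W_a": rng.normal(scale=0.1, size=(d_att, d_h)),
            "v_a": rng.normal(scale=0.1, size=d_att)},
}
C = rng.normal(size=(T, d_c))  # stand-in for the encoder's source annotations
s, alpha = cgru_step(rng.normal(size=d_emb), np.zeros(d_h), C, params)
print(s.shape, alpha.shape)  # (3,) (7,)
```

Only the final state `s` is carried to the next time step; the intermediate state `s_inter` is not individually recurrent, which is what makes this a deep transition.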

Extension of cGRU for n contexts
Detailed formulas: https://github.com/bricksdont/ncgru/blob/master/ct.pdf
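
One possible reading of the multi-context extension (one additional encoder per context, and an additional attention + GRU transition in the deep transition) is sketched below. This is a hypothetical NumPy sketch, not the formulas in ct.pdf: it assumes one extra transition per context, applied after the regular cGRU transition, and uses random matrices as stand-ins for the context encoders' outputs. The names `multicontext_step`, `att_ctx`, and `gru_ctx` are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(d_in, d_h, rng):
    names = [("W_z", d_in), ("U_z", d_h), ("W_r", d_in), ("U_r", d_h), ("W_h", d_in), ("U_h", d_h)]
    return {n: rng.normal(scale=0.1, size=(d_h, cols)) for n, cols in names}


def gru_step(x, h_prev, p):
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde


def init_att(d_c, d_h, d_att, rng):
    return {"U_a": rng.normal(scale=0.1, size=(d_att, d_c)),
            "W_a": rng.normal(scale=0.1, size=(d_att, d_h)),
            "v_a": rng.normal(scale=0.1, size=d_att)}


def attention(C, s, p):
    energies = np.tanh(C @ p["U_a"].T + p["W_a"] @ s) @ p["v_a"]
    alpha = np.exp(energies - energies.max())
    alpha /= alpha.sum()
    return alpha @ C


def multicontext_step(y_prev, s_prev, C_src, contexts, p):
    """One decoder step: cGRU transition, then one attention + GRU transition per context."""
    s = gru_step(y_prev, s_prev, p["gru1"])                          # GRU_1 on previous target word
    s = gru_step(attention(C_src, s, p["att_src"]), s, p["gru_src"])  # attention + GRU_2, as in cGRU
    for k, C_k in enumerate(contexts):                                # hypothetical: one extra transition per context
        s = gru_step(attention(C_k, s, p["att_ctx"][k]), s, p["gru_ctx"][k])
    return s


rng = np.random.default_rng(0)
d_emb, d_h, d_c, d_att, n_ctx = 4, 3, 6, 5, 2
params = {
    "gru1": init_gru(d_emb, d_h, rng),
    "att_src": init_att(d_c, d_h, d_att, rng), "gru_src": init_gru(d_c, d_h, rng),
    "att_ctx": [init_att(d_c, d_h, d_att, rng) for _ in range(n_ctx)],
    "gru_ctx": [init_gru(d_c, d_h, rng) for _ in range(n_ctx)],
}
C_src = rng.normal(size=(7, d_c))                                 # current source sentence, 7 positions
contexts = [rng.normal(size=(9, d_c)), rng.normal(size=(5, d_c))]  # stand-ins for context encoder outputs
y_prev, s_prev = rng.normal(size=d_emb), np.zeros(d_h)
s_plain = multicontext_step(y_prev, s_prev, C_src, [], params)    # reduces to a plain cGRU step
s_ctx = multicontext_step(y_prev, s_prev, C_src, contexts, params)
print(s_plain.shape, s_ctx.shape)  # (3,) (3,)
```

Because each context only adds transitions inside a single time step, the number of contexts can grow without lengthening the sequence the decoder runs over, unlike concatenation; contexts of different lengths are handled by their own attention.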

Outlook
Experiments with this new architecture until the end of the year:
- small source context vs. an equally deep baseline
- target context seems to be useful (Bawden, in preparation)
- challenge set evaluation, focusing on pronouns
- use attention as an inspection tool (Kuncoro et al., 2016; Rikters et al., 2017)
Then, look for a more general solution, possibly outside of Nematus:
- investigate other kinds of networks: fully convolutional (Gehring et al., 2017) or self-attention ("transformer") models (Vaswani et al., 2017), both with positional embeddings

Thanks! Code currently here: https://gitlab.cl.uzh.ch/mt/nematus-context2