Deep Neural Networks [GBC] Chap. 6, 7, 8. CS 486/686 University of Waterloo Lecture 18: June 28, 2017

Outline
- Deep Neural Networks
  - Gradient vanishing
  - Rectified linear units
  - Overfitting
  - Dropout
- Breakthroughs
  - Acoustic modeling in speech recognition
  - Image recognition

CS486/686 Lecture Slides (c) 2017 P. Poupart

Deep Neural Network
- Definition: neural network with many hidden layers
- Advantage: high expressivity
- Challenges:
  - How should we train a deep neural network?
  - How can we avoid overfitting?

Expressivity
- Neural networks with one hidden layer of sigmoid/hyperbolic tangent units can approximate arbitrarily closely neural networks with several layers of sigmoid/hyperbolic tangent units.
- However, as we increase the number of layers, the number of units needed may decrease exponentially (with the number of layers).

Example: Parity Function
- Single layer of hidden nodes
[Figure: network computing the parity of its inputs with a single layer of hidden nodes]

Example: Parity Function
- Multiple layers of hidden nodes
[Figure: network computing parity with multiple layers of hidden nodes, each stage labelled "2 odd subsets"]
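The contrast between the two parity constructions can be sketched in code. This is a minimal illustration, assuming hidden units that each detect one exact input pattern; the function names are hypothetical and do not come from the slides. Under that assumption the shallow version needs one hidden unit per odd-sized subset of the n inputs (2^(n-1) units), while the deep version chains two-input XOR stages that need only 2 units each.

```python
from itertools import combinations


def odd_subsets(n):
    """All subsets of {0, ..., n-1} with an odd number of elements."""
    return [s for k in range(1, n + 1, 2) for s in combinations(range(n), k)]


def shallow_parity(bits):
    """One hidden layer: each hidden unit fires iff exactly its subset is on.

    Returns (parity, number of hidden units used).
    """
    n = len(bits)
    hidden = [
        all(bits[i] == (1 if i in s else 0) for i in range(n))
        for s in odd_subsets(n)
    ]
    return int(any(hidden)), len(hidden)


def deep_parity(bits):
    """Chain of 2-input XOR stages, each with 2 hidden units (patterns 01 and 10).

    Returns (parity, number of hidden units used).
    """
    acc, units = bits[0], 0
    for b in bits[1:]:
        hidden = [acc == 0 and b == 1, acc == 1 and b == 0]
        acc = int(any(hidden))
        units += len(hidden)
    return acc, units
```

For 10 inputs this sketch uses 512 hidden units in the shallow form and 18 in the deep form, which is the exponential gap the expressivity slide refers to.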

The power of depth (practice)
- Challenge: how to train deep NNs?

Speech
- 2006 (Hinton): first effective algorithm for deep NNs
  - Layerwise training of Stacked Restricted Boltzmann Machines (SRBMs)
- 2009: breakthrough in acoustic modeling
  - Replace Gaussian Mixture Models by SRBMs
  - Improved speech recognition at Google, Microsoft, IBM
- 2013-today: recurrent neural nets (LSTM)
  - Google error rate: 23% (2013) to 8% (2015)
  - Microsoft error rate: 5.9% (Oct 17, 2016), same as human performance

Image Classification
- ImageNet Large Scale Visual Recognition Challenge
[Figure: bar chart of classification error (%) against network depth, moving from features + SVMs to deep convolutional neural nets. Error values shown: 28.2, 25.8, 16.4, 11.7, 7.3, 6.7, 3.57, 5.1, 3.07. Depth labels shown: 5, 8, 19, 22, 152.]

Vanishing Gradients
- Deep neural networks of sigmoid and hyperbolic tangent units often suffer from vanishing gradients.
[Figure: network with layers labelled "small gradient", "medium gradient", "large gradient"]

Sigmoid and hyperbolic tangent units
- Derivative is never greater than 1 (sigmoid: at most 1/4; hyperbolic tangent: equal to 1 only at the origin)
[Figure: plots of the sigmoid and hyperbolic tangent functions]
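The bounds on the two derivatives follow from their standard closed forms, written out here as a short derivation. The symbol a for the input of a unit is an assumption of this note, not notation taken from the slides.

```latex
\begin{align*}
\sigma(a) &= \frac{1}{1+e^{-a}}, &
\sigma'(a) &= \sigma(a)\,\bigl(1-\sigma(a)\bigr) \le \tfrac{1}{4}, \\
\tanh(a) &= \frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}, &
\tanh'(a) &= 1-\tanh^{2}(a) \le 1 .
\end{align*}
```

Both maxima are reached at a = 0, where the sigmoid equals 1/2 and the hyperbolic tangent equals 0. Everywhere else the derivatives are strictly smaller.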

Simple Example
- Common weight initialization in (-1, 1)
- Sigmoid function and its derivative are always less than 1
- This leads to vanishing gradients: backpropagation multiplies one weight and one derivative per layer, so the product shrinks with depth.
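The effect can be reproduced numerically. The sketch below assumes the simplest possible setting, a chain with one sigmoid unit per layer and weights drawn uniformly from (-1, 1); the function name `chain_gradients` and the choice of input are illustrative only. It reports the magnitude of the gradient with respect to each unit's input, which loses at least a factor of 4 per layer on the way back.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def chain_gradients(x, weights):
    """Backpropagate through y = s(w_L * s(... s(w_1 * x))), one unit per layer.

    Returns |dy/da_k| for k = 1..L, where a_k is the input to the k-th sigmoid.
    """
    # Forward pass: store every pre-activation a_k.
    pre, h = [], x
    for w in weights:
        a = w * h
        pre.append(a)
        h = sigmoid(a)

    # Backward pass: delta_L = s'(a_L); delta_k = delta_{k+1} * w_{k+1} * s'(a_k).
    grads = [0.0] * len(weights)
    delta = 1.0
    for k in reversed(range(len(weights))):
        s = sigmoid(pre[k])
        delta *= s * (1.0 - s)
        grads[k] = abs(delta)
        delta *= weights[k]  # propagate to the previous layer's output
    return grads


random.seed(0)
demo_weights = [random.uniform(-1, 1) for _ in range(8)]
for layer, g in enumerate(chain_gradients(0.5, demo_weights), start=1):
    print(f"layer {layer}: |dy/da| = {g:.3e}")
```

The printed magnitudes grow from the first layer to the last, which matches the small, medium, and large gradient labels on the earlier slide.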

Avoiding Vanishing Gradients
- Two popular solutions:
  - Pre-training
  - Rectified linear units and maxout units

Rectified Linear Units (ReLU)
- Rectified linear: h(a) = max(0, a)
  - Gradient is 0 or 1
  - Sparse computation
- Soft version (softplus): h(a) = log(1 + e^a)
- Warning: softplus does not prevent gradient vanishing (gradient < 1)
[Figure: plot comparing the softplus and rectified linear functions]
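The difference between the two activations shows up directly in their gradients. The following is a small sketch using the standard definitions of the rectified linear and softplus functions; the function names are chosen for illustration.

```python
import math


def relu(a):
    return max(0.0, a)


def relu_grad(a):
    # Gradient is exactly 0 or 1 (taking 0 at a = 0 by convention).
    return 1.0 if a > 0 else 0.0


def softplus(a):
    # log(1 + e^a), written so that exp never overflows for large |a|.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def softplus_grad(a):
    # The derivative of softplus is the sigmoid, whose value lies below 1.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)
```

Because the ReLU gradient is exactly 1 wherever the unit is active, it does not shrink the backpropagated signal on those paths, whereas every softplus factor is below 1.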

Maxout Units
- Generalization of rectified linear units: the output is the maximum over several linear units
[Figure: a max unit fed by several identity units]
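A maxout unit can be sketched in a few lines. The representation below, a list of (weights, bias) pairs with one pair per linear piece, is an assumption made for illustration. The second function shows one way to recover a rectified linear unit as the special case with two pieces, one of which is the constant 0.

```python
def maxout(x, pieces):
    """Maxout unit: the maximum of several linear (identity-activation) units.

    `pieces` is a list of (weights, bias) pairs, one per linear unit.
    """
    return max(sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in pieces)


def relu_as_maxout(x, ws, b):
    """ReLU of w.x + b as a maxout over two pieces: the linear unit and 0."""
    zero_piece = ([0.0] * len(ws), 0.0)
    return maxout(x, [(ws, b), zero_piece])
```

With pieces `([1.0], 0.0)` and `([-1.0], 0.0)` the same unit computes the absolute value, which a single rectified linear unit cannot do.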

Overfitting
- High expressivity increases the risk of overfitting
  - The number of parameters is often larger than the amount of data
- Solutions:
  - Regularization
  - Dropout
  - Data augmentation

Dropout
- Idea: randomly drop some units from the network when training
- Training: at each iteration of gradient descent
  - Each hidden unit is dropped with probability 0.5
  - Each input unit is dropped with probability 0.2
- Prediction (testing):
  - Multiply the output of each unit by one minus its drop probability
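The training and prediction rules above can be sketched for a single layer as follows. The function name and the list-based representation of a layer's outputs are assumptions for illustration. Many libraries instead rescale during training ("inverted dropout"); this sketch follows the formulation on the slide.

```python
import random


def dropout_layer(values, drop_prob, training, rng=random):
    """Apply dropout to one layer's unit outputs."""
    if training:
        # Each unit is dropped (set to 0) independently with probability drop_prob.
        return [0.0 if rng.random() < drop_prob else v for v in values]
    # Prediction: scale each output by the probability that the unit was kept.
    return [v * (1.0 - drop_prob) for v in values]
```

A hidden layer would be called with `drop_prob=0.5` and an input layer with `drop_prob=0.2`, using `training=True` during gradient descent and `training=False` when predicting.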

Intuition
- Dropout can be viewed as an approximate form of ensemble learning
- In each training iteration, a different subnetwork is trained
- At test time, these subnetworks are merged by averaging their weights
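The ensemble view can be checked exactly in one special case. The sketch below assumes a single linear unit with dropout applied to its inputs, where averaging the outputs of all subnetworks (weighted by the probability of each dropout mask) coincides with one pass through the scaled network. For deep nonlinear networks the match is only approximate. The function names are illustrative.

```python
from itertools import product


def ensemble_average(x, w, keep_prob):
    """Average a linear unit's output over every dropout mask, weighted by mask probability."""
    total = 0.0
    for mask in product([0, 1], repeat=len(x)):
        prob = 1.0
        for m in mask:
            prob *= keep_prob if m else 1.0 - keep_prob
        total += prob * sum(wi * xi * m for wi, xi, m in zip(w, x, mask))
    return total


def scaled_output(x, w, keep_prob):
    """Single pass with every input scaled by its keep probability."""
    return sum(wi * xi * keep_prob for wi, xi in zip(w, x))
```

The first function enumerates 2^n subnetworks while the second needs one pass, which is why the scaling rule is used at prediction time.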

Robustness
- In sexual reproduction, half of the genes of two individuals are dropped and the remaining genes are merged to produce a new individual.
- Genes are forced to evolve independently so that most combinations yield functional individuals.
- Similarly, units in a neural net are forced to capture features that are largely independent of other units.

Applications of Deep Neural Networks
- Speech recognition
- Image recognition
- Machine translation
- Control
- Any application of shallow neural networks

Acoustic Modeling in Speech Recognition

Image Recognition
- Convolutional neural network
  - With rectified linear units and dropout
  - Data augmentation for transformation invariance
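Data augmentation can be sketched as generating transformed copies of each training image while keeping its label. The slide does not say which transformations were used, so the horizontal flip and random crop below are assumptions chosen as common examples, and images are represented as plain lists of rows for simplicity.

```python
import random


def horizontal_flip(image):
    """Mirror an image given as a list of rows."""
    return [row[::-1] for row in image]


def random_crop(image, size, rng=random):
    """Cut a random size x size window, i.e. a random translation of the content."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, size, rng=random):
    """Produce one randomly transformed copy; the label is left unchanged."""
    crop = random_crop(image, size, rng)
    return horizontal_flip(crop) if rng.random() < 0.5 else crop
```

Training on such copies encourages the network to give the same answer regardless of small shifts or mirroring of the input.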

ImageNet Breakthrough
- Results: ILSVRC-2012
- From Krizhevsky, Sutskever, Hinton

ImageNet Breakthrough
- From Krizhevsky, Sutskever, Hinton