ECE 5424: Introduction to Machine Learning

Topics:
- SVM: Multi-class SVMs
- Neural Networks: Multi-layer Perceptron
Readings: Barber 17.5, Murphy 16.5
Stefan Lee, Virginia Tech

HW2 Graded. Mean: 63/61 = 103%. Max: 76. Min: 20. (C) Dhruv Batra

Administrivia
- HW3 due: Nov 7th, 11:55 PM. You will implement primal & dual SVMs.
- Kaggle competition: Higgs Boson signal vs. background classification

Recap of Last Time

Linear classifiers: Which line is better? $w \cdot x = \sum_j w^{(j)} x^{(j)}$

Dual SVM derivation (1): the linearly separable case. Slide Credit: Carlos Guestrin

Dual SVM formulation: the linearly separable case. Slide Credit: Carlos Guestrin

Dual SVM formulation: the non-separable case. Slide Credit: Carlos Guestrin
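
The equations on the dual SVM slides did not survive in this transcript. For reference, the standard textbook form of the soft-margin SVM and its dual is sketched below. The notation ($\alpha_j$, $\xi_j$, $C$, $b$) follows common convention and is an assumption here, not a reproduction of the slide.

```latex
% Primal, non-separable case (standard textbook form)
\min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad y_j (w \cdot x_j + b) \ge 1 - \xi_j, \quad \xi_j \ge 0 \quad \forall j

% Dual
\max_{\alpha} \ \sum_j \alpha_j - \frac{1}{2} \sum_{j,k} \alpha_j \alpha_k \, y_j y_k \, (x_j \cdot x_k)
\quad \text{s.t.} \quad \sum_j \alpha_j y_j = 0, \quad 0 \le \alpha_j \le C \quad \forall j

% Weights recovered from the dual solution
w = \sum_j \alpha_j y_j x_j
```

In the linearly separable case the slack variables $\xi_j$ are dropped and the constraint on the dual variables reduces to $\alpha_j \ge 0$.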

Why did we learn about the dual SVM?
- Builds character!
- Exposes structure about the problem
- There are some quadratic programming algorithms that can solve the dual faster than the primal
- The kernel trick!!!
Slide Credit: Carlos Guestrin

Dual SVM interpretation: Sparsity. Slide Credit: Carlos Guestrin

Dual formulation only depends on dot-products, not on w!

Common kernels
- Polynomials of degree d
- Polynomials of degree up to d
- Gaussian kernel / Radial Basis Function
- Sigmoid
Slide Credit: Carlos Guestrin
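
The kernel formulas themselves are missing from this transcript. The sketch below uses the usual textbook definitions of the four kernels named on the slide; the parameter names (`d`, `sigma`, `eta`, `nu`) and their defaults are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v, d):
    """Polynomial kernel of degree exactly d: (u . v)^d."""
    return dot(u, v) ** d

def poly_kernel_up_to(u, v, d):
    """Polynomial kernel with all terms up to degree d: (u . v + 1)^d."""
    return (dot(u, v) + 1.0) ** d

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian / RBF kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, eta=1.0, nu=0.0):
    """Sigmoid kernel: tanh(eta * (u . v) + nu)."""
    return math.tanh(eta * dot(u, v) + nu)

u = [1.0, 2.0]
v = [3.0, 0.5]
k_poly = poly_kernel(u, v, d=2)             # (1*3 + 2*0.5)^2
k_poly_up = poly_kernel_up_to(u, v, d=2)    # (1*3 + 2*0.5 + 1)^2
k_rbf_same = gaussian_kernel(u, u)          # identical points
```

Each function only ever touches its inputs through a dot product or a distance, which is why it can be substituted for $x_j \cdot x_k$ in a formulation that depends on the data only through dot products.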

Plan for Today
- SVMs: Multi-class
- Neural Networks

What about multiple classes? Slide Credit: Carlos Guestrin

One against All (Rest). Learn N classifiers: y1 vs. not y1, y2 vs. not y2, y3 vs. not y3. Slide Credit: Carlos Guestrin
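
A minimal sketch of the one-against-all scheme, assuming a plain perceptron as the binary learner (the slide does not say which binary classifier is used, and an SVM would be the natural choice in this lecture). The toy data and function names are hypothetical.

```python
def train_perceptron(X, y, epochs=200):
    """Binary perceptron; y holds +1/-1 labels. Returns weights with the bias last."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            xa = list(x) + [1.0]  # append constant feature for the bias
            score = sum(wi * xi for wi, xi in zip(w, xa))
            if label * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + label * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def train_one_vs_rest(X, labels, classes):
    """Learn one 'class c vs. not c' classifier per class (N classifiers in total)."""
    return {c: train_perceptron(X, [1 if l == c else -1 for l in labels])
            for c in classes}

def predict_one_vs_rest(models, x):
    """Predict the class whose classifier gives the largest score."""
    xa = list(x) + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xa)))

# Toy data: three well-separated clusters (hypothetical)
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = ["y1"] * 3 + ["y2"] * 3 + ["y3"] * 3
models = train_one_vs_rest(X, labels, ["y1", "y2", "y3"])
train_predictions = [predict_one_vs_rest(models, x) for x in X]
```

Taking the arg max over scores is one common way to resolve inputs that several classifiers claim, or that none claims.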

One against One. Learn N-choose-2 classifiers: y1 vs. y2, y1 vs. y3, y2 vs. y3. Slide Credit: Carlos Guestrin
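
A minimal sketch of one-against-one prediction by majority vote. The three pairwise classifiers here are hand-written threshold rules on a 1-D input, standing in for learned classifiers; they and the thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

classes = ["y1", "y2", "y3"]
pairs = list(combinations(classes, 2))  # one classifier per unordered pair

# Hypothetical stand-ins for learned pairwise classifiers on a 1-D input
pairwise = {
    ("y1", "y2"): lambda x: "y1" if x < 2.5 else "y2",
    ("y1", "y3"): lambda x: "y1" if x < 5.0 else "y3",
    ("y2", "y3"): lambda x: "y2" if x < 7.5 else "y3",
}

def predict_one_vs_one(x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(pairwise[pair](x) for pair in pairs)
    return votes.most_common(1)[0][0]

n_classifiers = len(pairs)
predictions = [predict_one_vs_one(x) for x in (1.0, 5.2, 9.0)]
```

With N classes this needs N(N-1)/2 classifiers, and in general the votes can tie, so some inputs may have no clear winner.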

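Problems. [Figure: decision regions R1, R2, R3 for classes C1, C2, C3. One-against-all panel: C1 vs. not C1 and C2 vs. not C2. One-against-one panel: C1 vs. C2, C1 vs. C3, and C2 vs. C3. In each panel a region marked "?" is left without an unambiguous class.] Image Credit: Kevin Murphy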

Learn 1 classifier: Multiclass SVM. Simultaneously learn 3 sets of weights. Slide Credit: Carlos Guestrin
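
The optimization problem for this slide is missing from the transcript. One common way to write a multiclass SVM that learns all weight vectors jointly is sketched below; the notation ($w^{(y)}$, $b^{(y)}$, $\xi_j$, $C$) is an assumption, and the slide may have used a different but equivalent form.

```latex
% One weight vector w^{(y)} and bias b^{(y)} per class y, learned jointly
\min_{w,\, b,\, \xi} \ \sum_{y} \|w^{(y)}\|^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
w^{(y_j)} \cdot x_j + b^{(y_j)} \ \ge\ w^{(y')} \cdot x_j + b^{(y')} + 1 - \xi_j
\quad \forall j,\ \forall y' \ne y_j, \qquad \xi_j \ge 0

% Prediction
\hat{y} = \arg\max_{y} \ w^{(y)} \cdot x + b^{(y)}
```

The constraint asks the score of the correct class to exceed the score of every other class by a margin of 1, up to slack $\xi_j$.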

Addressing non-linearly separable data: Option 1, non-linear features
- Choose non-linear features, e.g.:
  - Typical linear features: $w_0 + \sum_i w_i x_i$
  - Example of non-linear features: degree-2 polynomials, $w_0 + \sum_i w_i x_i + \sum_{ij} w_{ij} x_i x_j$
- Classifier $h_w(x)$ is still linear in the parameters $w$
  - As easy to learn
  - Data can become linearly separable in a higher-dimensional feature space
  - Express via kernels
Slide Credit: Carlos Guestrin
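
As an illustration of Option 1, the sketch below expands 2-D inputs into degree-2 polynomial features and classifies XOR, which no linear classifier on the raw inputs can separate. The weights are picked by hand for this example rather than learned, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

def degree2_features(x):
    """Linear terms followed by all products x_i * x_j with i <= j."""
    index_pairs = combinations_with_replacement(range(len(x)), 2)
    return list(x) + [x[i] * x[j] for i, j in index_pairs]

def linear_classifier(features, w, w0):
    """Threshold a score that is linear in the parameters w."""
    score = w0 + sum(wi * fi for wi, fi in zip(w, features))
    return 1 if score > 0 else 0

# Feature order for 2-D input: [x1, x2, x1*x1, x1*x2, x2*x2]
w0 = -0.5
w = [1.0, 1.0, 0.0, -2.0, 0.0]  # hand-picked: x1 + x2 - 2*x1*x2 - 0.5

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [linear_classifier(degree2_features(x), w, w0) for x in xor_inputs]
```

The classifier is still a thresholded weighted sum; only the feature map is non-linear in the inputs.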

Addressing non-linearly separable data: Option 2, non-linear classifier
- Choose a classifier $h_w(x)$ that is non-linear in the parameters $w$, e.g., decision trees, neural networks
- More general than linear classifiers
- But can often be harder to learn (non-convex optimization required)
- Often very useful (can outperform linear classifiers)
- In a way, both ideas are related
Slide Credit: Carlos Guestrin

New Topic: Neural Networks

Synonyms
- Neural Networks
- Artificial Neural Network (ANN)
- Feed-forward Networks
- Multilayer Perceptrons (MLP)
- Types of ANN: Convolutional Nets, Autoencoders, Recurrent Neural Nets
- [Back with a new name]: Deep Nets / Deep Learning

Biological Neuron

Artificial Neuron
- Perceptron (with step function)
- Logistic Regression (with sigmoid)
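
A minimal sketch of a single artificial neuron: a weighted sum of the inputs passed through a response function. With a step function it behaves like a perceptron, with a sigmoid like logistic regression. The example weights are hypothetical.

```python
import math

def step(a):
    """Hard threshold used by the perceptron."""
    return 1.0 if a >= 0 else 0.0

def sigmoid(a):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron(x, w, w0, activation):
    """Compute activation(w0 + sum_i w_i * x_i)."""
    a = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return activation(a)

w, w0 = [1.0, 1.0], -1.5  # hypothetical weights: fires only when both inputs are 1

hard_on = neuron([1, 1], w, w0, step)      # pre-activation 0.5
hard_off = neuron([1, 0], w, w0, step)     # pre-activation -0.5
soft_on = neuron([1, 1], w, w0, sigmoid)   # a value between 0.5 and 1
```

Both variants share the same linear pre-activation, so both place their decision boundary on the same hyperplane.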

Sigmoid. [Figure: three plots of the sigmoid response for (w0 = 2, w1 = 1), (w0 = 0, w1 = 1), and (w0 = 0, w1 = 0.5); horizontal axis from -6 to 6, vertical axis from 0 to 1.] Slide Credit: Carlos Guestrin
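
The effect of the two parameters can be checked numerically. The sketch below assumes the plotted function is sigmoid(w0 + w1 * x), which matches the parameter names on the slide but is not stated in the transcript.

```python
import math

def sigmoid_unit(x, w0, w1):
    """Response of a 1-D sigmoid unit: 1 / (1 + exp(-(w0 + w1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

settings = [(2.0, 1.0), (0.0, 1.0), (0.0, 0.5)]  # (w0, w1) values from the slide

for w0, w1 in settings:
    midpoint = -w0 / w1            # input at which the output equals 0.5
    values = [round(sigmoid_unit(x, w0, w1), 3) for x in (-6, -2, 0, 2, 6)]
    print(f"w0={w0}, w1={w1}: midpoint at x={midpoint}, outputs {values}")
```

Under that assumption, w0 shifts the curve left or right (the output crosses 0.5 at x = -w0/w1), while a smaller w1 makes the transition more gradual.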

Many possible response functions: Linear, Sigmoid, Exponential, Gaussian

Limitation: a single neuron still gives only a linear decision boundary. What to do? Idea: stack a bunch of them together!

Hidden layer. 1-hidden-layer feed-forward network: on board.
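
The board derivation is not part of this transcript. As a stand-in, here is a minimal sketch of the forward pass of a 1-hidden-layer feed-forward network, with hand-picked weights (an assumption for illustration, not learned) that compute XOR, a function no single neuron can represent.

```python
def step(a):
    """Hard-threshold response function."""
    return 1.0 if a >= 0 else 0.0

def layer(x, W, b, activation):
    """One fully connected layer: activation(W x + b), one output per row of W."""
    return [activation(bi + sum(wij * xj for wij, xj in zip(row, x)))
            for row, bi in zip(W, b)]

# Hidden layer: unit 1 acts like OR, unit 2 acts like AND (hand-picked weights)
W_hidden = [[1.0, 1.0], [1.0, 1.0]]
b_hidden = [-0.5, -1.5]
# Output layer: fires when OR is on and AND is off
W_out = [[1.0, -1.0]]
b_out = [-0.5]

def forward(x):
    """Input -> hidden layer -> output unit."""
    h = layer(x, W_hidden, b_hidden, step)
    return layer(h, W_out, b_out, step)[0]

outputs = [forward(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each hidden unit draws one linear boundary; the output unit combines them into a region that is not linearly separable in the original inputs.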

Neural Nets
- Best performers on OCR: http://yann.lecun.com/exdb/lenet/index.html
- NetTalk, a text-to-speech system from 1987: http://youtu.be/txmafho6diy?t=45m15s
- Rick Rashid speaks Mandarin: http://youtu.be/nu-nlqqfckg?t=7m30s

Universal Function Approximators. Theorem: a 3-layer network with linear outputs can uniformly approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units [Funahashi 89].
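
The theorem can be made plausible with a hand-built construction. The sketch below is an illustration only, not the proof in [Funahashi 89] and not a trained network: each hidden sigmoid unit contributes one smoothed "stair step", and a linear output sums them. The target function, the `sharpness` parameter, and all names are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_network(f, a, b, n_hidden, sharpness=20.0):
    """Return g(x) = bias + sum_k v_k * sigmoid(s * (x - m_k)) approximating f on [a, b]."""
    h = (b - a) / n_hidden
    s = sharpness / h                                      # steeper units for narrower cells
    centers = [a + (k + 0.5) * h for k in range(n_hidden)]
    out_weights = [f(a + (k + 1) * h) - f(a + k * h) for k in range(n_hidden)]
    bias = f(a)

    def g(x):
        return bias + sum(v * sigmoid(s * (x - m)) for v, m in zip(out_weights, centers))
    return g

def max_error(f, g, a, b, n_points=2000):
    """Largest absolute error on an evenly spaced grid over [a, b]."""
    xs = [a + (b - a) * i / n_points for i in range(n_points + 1)]
    return max(abs(f(x) - g(x)) for x in xs)

def target(x):
    return math.sin(2.0 * math.pi * x)

errors = {n: max_error(target, build_network(target, 0.0, 1.0, n), 0.0, 1.0)
          for n in (10, 100)}
print(errors)
```

The maximum error shrinks as hidden units are added, which is the behaviour the theorem describes: accuracy is bought with more hidden units.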

Neural Networks Demo: http://playground.tensorflow.org/