ECE 5984: Introduction to Machine Learning


Topics:
- SVM
- Multi-class SVMs
- Neural Networks
- Multi-layer Perceptron
Readings: Barber 17.5, Murphy 16.5
Dhruv Batra, Virginia Tech

HW2 Graded
- Mean: 66/61 = 108%
- Min: 47
- Max: 75
(C) Dhruv Batra

Administrativia
- HW3
  - Due: in 2 weeks
  - You will implement primal & dual SVMs
  - Kaggle competition: Higgs Boson Signal vs Background classification
  - https://inclass.kaggle.com/c/2015-spring-vt-ece-machinelearning-hw3
  - https://www.kaggle.com/c/higgs-boson

Administrativia
- Project Mid-Sem Spotlight Presentations
  - Friday: 5-7pm, Whittemore 654
  - 5 slides (recommended)
  - 4 minute time (STRICT) + 1-2 min Q&A
  - Tell the class what you're working on
  - Any results yet?
  - Problems faced?
  - Upload slides on Scholar

Recap of Last Time

Linear classifiers
- Which line is better?
- $w \cdot x = \sum_j w^{(j)} x^{(j)}$
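
The sum over components on this slide is an ordinary dot product. A minimal sketch, assuming NumPy; the weights and input are hypothetical values chosen only for illustration, and classifying by the sign of the score (with no bias term) is an assumption rather than something stated on the slide.

```python
import numpy as np

# Hypothetical weights and input, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 4.0])

# w.x written out as the explicit sum over components j.
score_sum = sum(w[j] * x[j] for j in range(len(w)))

# The same quantity as a single dot product.
score_dot = float(np.dot(w, x))

# Assumed decision rule: predict +1 if the score is non-negative, else -1.
label = 1 if score_dot >= 0 else -1
print(score_sum, score_dot, label)
```

Many different w can separate the same training set, which is why the question "which line is better?" needs a criterion such as the margin.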

Dual SVM derivation (1): the linearly separable case
Slide Credit: Carlos Guestrin

Dual SVM formulation: the linearly separable case
Slide Credit: Carlos Guestrin

Dual SVM formulation: the non-separable case
Slide Credit: Carlos Guestrin

Why did we learn about the dual SVM?
- Builds character!
- Exposes structure about the problem
- There are some quadratic programming algorithms that can solve the dual faster than the primal
- The kernel trick!!!
Slide Credit: Carlos Guestrin

Dual SVM interpretation: Sparsity
[Figure: hyperplanes $w \cdot x + b = +1$, $w \cdot x + b = 0$, $w \cdot x + b = -1$; margin $2\gamma$]
Slide Credit: Carlos Guestrin

Dual formulation only depends on dot-products, not on w!
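
To see why dot products suffice, recall the standard SVM identity $w = \sum_i \alpha_i y_i x_i$: the score $w \cdot x + b$ can then be computed from the products $x_i \cdot x$ alone. A minimal sketch, assuming NumPy; the data, the `alpha` values, and `b` are hypothetical and were not obtained by solving the quadratic program.

```python
import numpy as np

# Hypothetical training points, labels, and dual variables (illustration only).
X = np.array([[1.0, 2.0], [2.0, 0.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.5])   # satisfies sum_i alpha_i * y_i = 0
b = 0.1
x_new = np.array([0.5, -0.5])

# Primal view: build w explicitly, then take one dot product.
w = (alpha * y) @ X
primal_score = w @ x_new + b

# Dual view: never form w, only use dot products x_i . x_new.
dual_score = np.sum(alpha * y * (X @ x_new)) + b

print(primal_score, dual_score)
```

Points with `alpha` equal to zero drop out of the dual sum, which is the sparsity the previous slide refers to. Replacing `X @ x_new` with kernel evaluations gives the kernel trick.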

Common kernels
- Polynomials of degree d
- Polynomials of degree up to d
- Gaussian kernel / Radial Basis Function
- Sigmoid
Slide Credit: Carlos Guestrin
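
The formulas for these kernels did not survive in the transcript. The sketch below uses the common textbook forms, assuming NumPy; the function names and the parameter names `d`, `sigma`, `eta`, and `nu` are assumptions, and the slide's own parametrization may differ.

```python
import numpy as np

def poly_kernel(u, v, d):
    # Polynomial of degree exactly d: (u . v)^d
    return np.dot(u, v) ** d

def poly_up_to_kernel(u, v, d):
    # Polynomial of degree up to d: (u . v + 1)^d
    return (np.dot(u, v) + 1.0) ** d

def gaussian_kernel(u, v, sigma):
    # Gaussian / RBF: exp(-||u - v||^2 / (2 sigma^2))
    diff = u - v
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def sigmoid_kernel(u, v, eta, nu):
    # Sigmoid: tanh(eta * u . v + nu)
    return np.tanh(eta * np.dot(u, v) + nu)

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])
print(poly_kernel(u, v, 2), poly_up_to_kernel(u, v, 2))
print(gaussian_kernel(u, v, 1.0), sigmoid_kernel(u, v, 0.1, 0.0))
```

Each function takes two input vectors and returns a scalar, so any of them can stand in for the dot product in the dual formulation.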

Plan for Today
- SVMs: Multi-class
- Neural Networks

What about multiple classes?
Slide Credit: Carlos Guestrin

One against All (Rest)
- Learn N classifiers: y1 vs. not y1, y2 vs. not y2, y3 vs. not y3
Slide Credit: Carlos Guestrin
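
One way to read the one-against-all scheme as code: build one binary labeling per class, train one classifier on each, then combine the N scores. A minimal sketch, assuming NumPy; the helper names and the weights `W`, `b` are hypothetical, and picking the class with the highest score is a common choice assumed here rather than something stated on the slide.

```python
import numpy as np

def one_vs_rest_labels(y, num_classes):
    # Row k is the binary problem "class k vs. not class k":
    # +1 where y == k, -1 everywhere else.
    return np.where(y[None, :] == np.arange(num_classes)[:, None], 1, -1)

def predict_one_vs_rest(W, b, x):
    # W has one weight vector per class (one row per binary classifier).
    scores = W @ x + b
    return int(np.argmax(scores))

y_demo = np.array([0, 2, 1, 0])
labels = one_vs_rest_labels(y_demo, 3)

# Hypothetical weights standing in for three trained binary classifiers.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
pred = predict_one_vs_rest(W, b, np.array([0.0, 2.0]))
print(labels)
print(pred)
```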

One against One
- Learn N-choose-2 classifiers: y1 vs. y2, y1 vs. y3, y2 vs. y3
Slide Credit: Carlos Guestrin
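
The pairwise scheme can be sketched with the standard library alone. The class names follow the slide; the `vote` helper and the pairwise winners are hypothetical, and majority voting is an assumed way of combining the pairwise decisions.

```python
from collections import Counter
from itertools import combinations
from math import comb

classes = ["y1", "y2", "y3"]

# One binary classifier per unordered pair of classes.
pairs = list(combinations(classes, 2))
num_classifiers = comb(len(classes), 2)

def vote(pairwise_winners):
    # pairwise_winners maps each pair to the class that its classifier picked.
    counts = Counter(pairwise_winners.values())
    return counts.most_common(1)[0][0]

# Hypothetical outcomes of the three pairwise classifiers for one input.
winners = {("y1", "y2"): "y1", ("y1", "y3"): "y1", ("y2", "y3"): "y3"}
print(pairs, num_classifiers, vote(winners))
```

If every class wins exactly one pairwise contest, the vote is tied and the input has no clear label under this scheme.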

Problems
[Figure: decision regions R1, R2, R3 for classes C1, C2, C3 under one-against-all and one-against-one, with ambiguous regions marked "?"]
Image Credit: Kevin Murphy

Learn 1 classifier: Multiclass SVM
- Simultaneously learn 3 sets of weights
Slide Credit: Carlos Guestrin
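
The slide's equations were not preserved in the transcript. The sketch below assumes a commonly used multiclass margin formulation: the score of the correct class should beat every other class score by at least 1. The function name and the weights are hypothetical, and this should be read as an illustration of jointly scored weight sets rather than as the slide's exact objective.

```python
import numpy as np

def multiclass_hinge(W, b, x, y):
    # One row of W (and one entry of b) per class; y is the true class index.
    scores = W @ x + b
    # Margin violation of each class relative to the true class.
    margins = scores - scores[y] + 1.0
    margins[y] = 0.0
    # Loss is the largest violation, or 0 if all margins are satisfied.
    return float(np.max(np.maximum(margins, 0.0)))

# Hypothetical weights for 3 classes in 2 dimensions.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])

print(multiclass_hinge(W, b, x, 0))  # true class scores highest by a margin
print(multiclass_hinge(W, b, x, 1))  # true class is outscored
```

Because all three weight vectors appear in the same loss, they are learned together instead of as independent binary problems.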

Not linearly separable data
- Some datasets are not linearly separable!
- http://www.eee.metu.edu.tr/~alatan/courses/demo/AppletSVM.html

Addressing non-linearly separable data: Option 1, non-linear features
- Choose non-linear features, e.g.,
  - Typical linear features: $w_0 + \sum_i w_i x_i$
  - Example of non-linear features: degree 2 polynomials, $w_0 + \sum_i w_i x_i + \sum_{ij} w_{ij} x_i x_j$
- Classifier $h_w(x)$ still linear in parameters $w$
  - As easy to learn
  - Data is linearly separable in higher dimensional spaces
  - Express via kernels
Slide Credit: Carlos Guestrin
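
A small illustration of Option 1, assuming NumPy: four XOR-style points cannot be separated by a line in the original two coordinates, but a classifier that is linear in degree 2 features separates them. The feature ordering in `degree2_features` and the hand-picked weight vector are assumptions made for the example, not learned values.

```python
import numpy as np

# XOR-style data: label +1 when both coordinates have the same sign.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([1, -1, -1, 1])

def degree2_features(x):
    # [1, x1, x2, x1^2, x1*x2, x2^2]: all monomials up to degree 2.
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

# Hand-picked weights: only the cross term x1*x2 is used.
w = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])

# The classifier is still a dot product, i.e. linear in w.
preds = np.sign([w @ degree2_features(x) for x in X])
print(preds)
```

The model stays linear in `w`, so the same learning machinery applies; only the input representation changed.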

Addressing non-linearly separable data: Option 2, non-linear classifier
- Choose a classifier $h_w(x)$ that is non-linear in parameters $w$, e.g., decision trees, neural networks
  - More general than linear classifiers
- But, can often be harder to learn (non-convex optimization required)
- Often very useful (outperforms linear classifiers)
- In a way, both ideas are related
Slide Credit: Carlos Guestrin

New Topic: Neural Networks

Synonyms
- Neural Networks
- Artificial Neural Network (ANN)
- Feed-forward Networks
- Multilayer Perceptrons (MLP)
- Types of ANN
  - Convolutional Nets
  - Autoencoders
  - Recurrent Neural Nets
- [Back with a new name]: Deep Nets / Deep Learning

Biological Neuron

Artificial Neuron
- Perceptron (with step function)
- Logistic Regression (with sigmoid)
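
Both units compute the same weighted sum and differ only in the response function applied to it. A minimal sketch, assuming NumPy; the function names, the 0/1 output convention for the step function, and the example weights are assumptions.

```python
import numpy as np

def perceptron(x, w, w0):
    # Step response: output 1 if the weighted sum is non-negative, else 0.
    return 1 if w0 + np.dot(w, x) >= 0 else 0

def logistic_neuron(x, w, w0):
    # Sigmoid response: a smooth value in (0, 1) instead of a hard threshold.
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))

# Hypothetical input and weights; the weighted sum here is exactly 0.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(perceptron(x, w, 0.0), logistic_neuron(x, w, 0.0))
```

The sigmoid version is differentiable in the weights, which is what gradient-based training relies on.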

Sigmoid
[Figure: three sigmoid plots over inputs from -6 to 6 with outputs from 0 to 1, for $w_0 = 2, w_1 = 1$; $w_0 = 0, w_1 = 1$; and $w_0 = 0, w_1 = 0.5$]
Slide Credit: Carlos Guestrin
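
The three parameter settings from the plots can be evaluated directly. This sketch assumes the unit has the form $1 / (1 + e^{-(w_0 + w_1 x)})$ with a scalar input, which matches the parameter names on the slide; the function name and sample grid are assumptions.

```python
import numpy as np

def sigmoid_unit(x, w0, w1):
    # Sigmoid of the affine input w0 + w1 * x.
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

# Sample points -6, -4, ..., 6, matching the plotted input range.
xs = np.linspace(-6.0, 6.0, 7)

# The three (w0, w1) settings named on the slide.
settings = [(2.0, 1.0), (0.0, 1.0), (0.0, 0.5)]
curves = {s: sigmoid_unit(xs, *s) for s in settings}

for s, values in curves.items():
    print(s, np.round(values, 3))
```

The output crosses 0.5 where $w_0 + w_1 x = 0$, so $w_0$ shifts the curve left or right, while a smaller $w_1$ makes the transition more gradual.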

Many possible response functions
- Linear
- Sigmoid
- Exponential
- Gaussian

Limitation
- A single neuron still gives a linear decision boundary
- What to do?
- Idea: Stack a bunch of them together!

Hidden layer
- 1-hidden layer (or 3-layer network): On board
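
The board derivation is not part of the transcript. As a stand-in, here is a minimal sketch of a forward pass through a network with one hidden layer, assuming NumPy and step-function units. The weights are set by hand (hypothetical, not learned) so that the network computes XOR, a function that no single linear-boundary neuron can represent.

```python
import numpy as np

def step(z):
    # Step response applied elementwise: 1.0 where z >= 0, else 0.0.
    return (z >= 0).astype(float)

# Hidden layer: unit 0 fires for "x1 OR x2", unit 1 fires for "x1 AND x2".
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output unit: fires for "OR and not AND", which is XOR.
w2 = np.array([1.0, -1.0])
b2 = -0.5

def forward(x):
    h = step(W1 @ x + b1)                       # hidden activations
    return 1.0 if w2 @ h + b2 >= 0 else 0.0     # output activation

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(np.array(x, dtype=float)))
```

Each hidden unit draws its own line in the input space, and the output unit combines those half-planes into a region that a single line cannot describe.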

Neural Nets
- Best performers on OCR
  - http://yann.lecun.com/exdb/lenet/index.html
- NetTalk
  - Text to Speech system from 1987
  - http://youtu.be/txmafho6diy?t=45m15s
- Rick Rashid speaks Mandarin
  - http://youtu.be/nu-nlqqfckg?t=7m30s

Universal Function Approximators
- Theorem: A 3-layer network with linear outputs can uniformly approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units [Funahashi 89]
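
A rough way to build intuition for the theorem in one dimension, assuming NumPy: place steep sigmoid hidden units along the interval and let a linear output sum them into a staircase that tracks the target. This is a hand-built illustration, not the argument of [Funahashi 89] and not a trained network; `staircase_network`, the target function, and the steepness setting are all assumptions.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function written via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def staircase_network(f, num_hidden, steepness_per_unit=40.0):
    # One hidden sigmoid unit per grid cell on [0, 1]; linear output layer.
    knots = np.linspace(0.0, 1.0, num_hidden + 1)
    centers = 0.5 * (knots[:-1] + knots[1:])   # where each unit switches on
    out_weights = np.diff(f(knots))            # jump contributed by each unit
    bias = f(knots[0])
    slope = steepness_per_unit * num_hidden    # steeper units for finer grids

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = sigmoid(slope * (x[..., None] - centers))
        return bias + hidden @ out_weights

    return net

def target(x):
    return np.sin(2.0 * np.pi * x)

xs = np.linspace(0.0, 1.0, 2001)
err_10 = np.max(np.abs(staircase_network(target, 10)(xs) - target(xs)))
err_200 = np.max(np.abs(staircase_network(target, 200)(xs) - target(xs)))
print(err_10, err_200)
```

The maximum error shrinks as hidden units are added, which is the "given enough hidden units" part of the statement. The theorem itself says nothing about how to find such weights by training.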

Neural Networks Demo
- http://neuron.eng.wayne.edu/bpfunctionapprox/bpfunctionapprox.html