ECE 5424: Introduction to Machine Learning

Topics: (Finish) Model selection; Error decomposition; Bias-Variance Tradeoff; Classification: Naïve Bayes
Readings: Barber 17.1, 17.2, 10.1-10.3
Stefan Lee, Virginia Tech

Administrativia
HW2: due Wed 09/28, 11:55pm. Implement linear regression, Naïve Bayes, Logistic Regression.
Project Proposal: due tomorrow by 11:55pm!!
(C) Dhruv Batra

Recap of last time

Regression

Slide Credit: Greg Shakhnarovich

What you need to know
Linear Regression
  Model
  Least Squares Objective
  Connections to Max Likelihood with Gaussian Conditional
  Robust regression with Laplacian Likelihood
  Ridge Regression with priors
  Polynomial and General Additive Regression

Plan for Today
(Finish) Model Selection
  Overfitting vs Underfitting
  Bias-Variance trade-off, aka Modeling error vs Estimation error tradeoff
Naïve Bayes

New Topic: Model Selection and Error Decomposition

Model Selection
How do we pick the right model class?
Similar questions:
  How do I pick magic hyper-parameters?
  How do I do feature selection?

Errors
Expected Loss/Error
Training Loss/Error
Validation Loss/Error
Test Loss/Error
Reporting Training Error (instead of Test) is CHEATING
Optimizing parameters on Test Error is CHEATING
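The train/validation/test discipline above can be sketched in a few lines. This is a minimal illustration, not the course's code: the 60/20/20 split, the 1-D ridge model through the origin, and the names `split_indices`, `fit_ridge_1d`, and `select_lambda` are all assumptions made for the example. The hyper-parameter is chosen on the validation split only, and the test split is evaluated once at the end.

```python
import random

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut it into disjoint train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge through the origin: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the predictor x -> w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_lambda(xs, ys, lambdas, seed=0):
    """Pick lambda on the validation split; evaluate the test split once, at the end."""
    tr, va, te = split_indices(len(xs), seed=seed)
    pick = lambda idx: ([xs[i] for i in idx], [ys[i] for i in idx])
    (xtr, ytr), (xva, yva), (xte, yte) = pick(tr), pick(va), pick(te)
    best = min(lambdas, key=lambda lam: mse(fit_ridge_1d(xtr, ytr, lam), xva, yva))
    w = fit_ridge_1d(xtr, ytr, best)
    return {"lambda": best, "train": mse(w, xtr, ytr),
            "val": mse(w, xva, yva), "test": mse(w, xte, yte)}

# Hypothetical data: y = 2x + Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
report = select_lambda(xs, ys, [0.0, 1.0, 100.0])
print(report)
```

Only the "test" entry of the report is an honest estimate of expected error, because the test indices were never used for fitting or for choosing lambda.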

Overfitting
A learning algorithm overfits the training data if it outputs a solution w when there exists another solution w' such that: $\mathrm{error}_{train}(w) < \mathrm{error}_{train}(w')$ and $\mathrm{error}_{true}(w') < \mathrm{error}_{true}(w)$
Slide Credit: Carlos Guestrin

Error Decomposition (diagram slides; surviving labels: Reality, Higher-Order Potentials)

Error Decomposition
Approximation/Modeling Error: You approximated reality with model
Estimation Error: You tried to learn model with finite data
Optimization Error: You were lazy and couldn't/didn't optimize to completion
Bayes Error: Reality just sucks (i.e., there is a lower bound on error for all models, usually non-zero)

Bias-Variance Tradeoff
Bias: difference between what you expect to learn and truth
  Measures how well you expect to represent true solution
  Decreases with more complex model
Variance: difference between what you expect to learn and what you learn from a particular dataset
  Measures how sensitive learner is to specific dataset
  Increases with more complex model
Slide Credit: Carlos Guestrin
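For squared loss, the two quantities above are usually written as terms of a three-part decomposition of expected error at a point x. The notation here is an assumption, not taken from the slides: $h_D$ is the predictor learned from dataset $D$, $\bar h(x) = \mathbb{E}_D[h_D(x)]$ is the average predictor over datasets, and $f(x) = \mathbb{E}[y \mid x]$ is the best possible prediction.

```latex
\mathbb{E}_{D,\,y}\!\left[(h_D(x) - y)^2\right]
  = \underbrace{\left(\bar h(x) - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[(h_D(x) - \bar h(x))^2\right]}_{\text{variance}}
  + \underbrace{\mathbb{E}_y\!\left[(y - f(x))^2\right]}_{\text{noise}}
```

The noise term does not depend on the learner, which is why it plays the role of the Bayes error for squared loss.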

Bias-Variance Tradeoff: Matlab demo

Bias-Variance Tradeoff
Choice of hypothesis class introduces learning bias
  More complex class → less bias
  More complex class → more variance
Slide Credit: Carlos Guestrin
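The claim that a more complex class trades bias for variance can be checked with a small Monte Carlo experiment. The setup below is an assumed toy problem, not the lecture's Matlab demo: the target is sin(pi x) on [-1, 1], each training set has only two noiseless points, and the two hypothesis classes are constants (`fit_constant`) and lines (`fit_line`). All names are hypothetical.

```python
import math
import random

def fit_constant(xs, ys):
    """Degree-0 model: predict the mean of the training targets."""
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_line(xs, ys):
    """Degree-1 model: ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def bias_variance(fit, truth, n_points=2, n_datasets=2000, seed=0):
    """Monte Carlo estimate of squared bias and variance, averaged over a grid on [-1, 1]."""
    rng = random.Random(seed)
    grid = [-1 + 2 * i / 50 for i in range(51)]
    preds = []  # preds[d][g]: prediction at grid point g of the model trained on dataset d
    for _ in range(n_datasets):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        h = fit(xs, [truth(x) for x in xs])
        preds.append([h(x) for x in grid])
    mean_pred = [sum(p[g] for p in preds) / n_datasets for g in range(len(grid))]
    bias2 = sum((mean_pred[g] - truth(x)) ** 2 for g, x in enumerate(grid)) / len(grid)
    var = sum((p[g] - mean_pred[g]) ** 2
              for p in preds for g in range(len(grid))) / (n_datasets * len(grid))
    return bias2, var

truth = lambda x: math.sin(math.pi * x)
b0, v0 = bias_variance(fit_constant, truth)
b1, v1 = bias_variance(fit_line, truth)
print(f"constant: bias^2={b0:.2f} variance={v0:.2f}")
print(f"line:     bias^2={b1:.2f} variance={v1:.2f}")
```

In this setup the line has lower squared bias but higher variance than the constant, so with this little data the simpler class ends up with lower total error.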

Learning Curves
Error vs size of dataset
On board:
  High-bias curves
  High-variance curves
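Since the curves themselves were drawn on the board, here is a rough sketch of how one could compute a learning curve numerically. Everything in it is an assumption for illustration: the data source `sample` (y = 1 + 2x plus Gaussian noise), the least-squares line as the learner, and the choice of training-set sizes.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared error of the line (a, b) on the given points."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(n, rng, noise=0.5):
    """Hypothetical data source: y = 1 + 2x + Gaussian noise, x uniform on [-1, 1]."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [1 + 2 * x + rng.gauss(0, noise) for x in xs]

def learning_curve(sizes, n_trials=200, n_test=200, seed=0):
    """Average train and test MSE for each training-set size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        tr_err = te_err = 0.0
        for _ in range(n_trials):
            xtr, ytr = sample(n, rng)
            xte, yte = sample(n_test, rng)
            model = fit_line(xtr, ytr)
            tr_err += mse(model, xtr, ytr)
            te_err += mse(model, xte, yte)
        curve.append((n, tr_err / n_trials, te_err / n_trials))
    return curve

curve = learning_curve([3, 10, 30, 100])
for n, tr, te in curve:
    print(f"n={n:4d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

Training error rises and test error falls as n grows, and both approach the noise level. A large remaining gap between the two curves points to high variance, while two curves that meet at a high error level point to high bias.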

Debugging Machine Learning
My algorithm doesn't work: High test error
What should I do?
  More training data
  Smaller set of features
  Larger set of features
  Lower regularization
  Higher regularization

What you need to know
Generalization Error Decomposition
  Approximation, estimation, optimization, Bayes error
  For squared losses, bias-variance tradeoff
Errors
  Difference between train & test error & expected error
  Cross-validation (and cross-val error)
  NEVER EVER learn on test data
Overfitting vs Underfitting
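Cross-validation error, listed above, can be sketched as follows. This is a generic k-fold routine under assumed names (`k_fold_indices`, `cross_val_error`); the mean predictor `fit_mean` and squared loss `sq_loss` are placeholders for whatever learner and loss are actually being evaluated.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(xs, ys, fit, loss, k=5, seed=0):
    """Average held-out loss over k folds; each fold is held out exactly once."""
    errs = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs.append(sum(loss(model, xs[i], ys[i]) for i in held_out) / len(held_out))
    return sum(errs) / k

# Placeholder learner and loss: predict the training mean, score with squared error.
fit_mean = lambda xs, ys: sum(ys) / len(ys)
sq_loss = lambda model, x, y: (model - y) ** 2

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [3.0 + rng.gauss(0, 1.0) for _ in xs]
print(cross_val_error(xs, ys, fit_mean, sq_loss, k=5))
```

The model is refit from scratch for every fold, so no held-out point ever influences the model that is scored on it.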

New Topic: Naïve Bayes (your first probabilistic classifier)
x → Classification → y (discrete)

Classification
Learn: h: X → Y, where X is the features and Y is the target classes
Suppose you know P(Y | X) exactly, how should you classify?
Bayes classifier: $h_{Bayes}(x) = \arg\max_y P(Y = y \mid X = x)$
Why?
Slide Credit: Carlos Guestrin

Optimal classification
Theorem: Bayes classifier $h_{Bayes}$ is optimal! That is, $\mathrm{error}_{true}(h_{Bayes}) \le \mathrm{error}_{true}(h)$ for every classifier $h$.
Proof:
Slide Credit: Carlos Guestrin
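The proof on the slide did not survive extraction. One standard argument, sketched here under the assumption that error means 0/1 misclassification probability, compares any classifier h pointwise with the classifier that picks the most probable class:

```latex
\begin{align*}
\mathrm{error}_{\text{true}}(h)
  &= P\big(h(X) \neq Y\big)
   = \mathbb{E}_X\big[\,1 - P(Y = h(X) \mid X)\,\big] \\
  &\geq \mathbb{E}_X\big[\,1 - \max_y P(Y = y \mid X)\,\big]
   = \mathrm{error}_{\text{true}}(h_{\text{Bayes}})
\end{align*}
```

The inequality holds for every x because no choice of label can have posterior probability above the maximum, and the Bayes classifier attains that maximum by definition.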

Generative vs. Discriminative
Generative Approach
  Estimate p(x | y) and p(y)
  Use Bayes Rule to predict y
Discriminative Approach
  Estimate p(y | x) directly, OR
  Learn discriminant function h(x)

Generative vs. Discriminative
Generative Approach
  Assume some functional form for P(X | Y), P(Y)
  Estimate p(X | Y) and p(Y)
  Use Bayes Rule to calculate P(Y | X = x)
  Indirect computation of P(Y | X) through Bayes rule
  But, can generate a sample: $P(X) = \sum_y P(y)\, P(X \mid y)$
Discriminative Approach
  Estimate p(y | x) directly, OR
  Learn discriminant function h(x)
  Direct, but cannot obtain a sample of the data, because P(X) is not available
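The generative route can be made concrete with a tiny discrete example. The probability tables below are invented for illustration, and the function names `marginal`, `posterior`, and `sample_xy` are hypothetical. The sketch shows both uses of a generative model: computing P(Y | X = x) through Bayes rule, and sampling (x, y) pairs.

```python
import random

def marginal(prior, likelihood, x):
    """P(X = x) = sum over y of P(y) * P(x | y)."""
    return sum(prior[y] * likelihood[y][x] for y in prior)

def posterior(prior, likelihood, x):
    """P(Y = y | X = x) for every y, via Bayes rule."""
    px = marginal(prior, likelihood, x)
    return {y: prior[y] * likelihood[y][x] / px for y in prior}

def sample_xy(prior, likelihood, rng):
    """Generate one (x, y) pair: draw y from P(Y), then x from P(X | Y = y)."""
    y = rng.choices(list(prior), weights=list(prior.values()))[0]
    x = rng.choices(list(likelihood[y]), weights=list(likelihood[y].values()))[0]
    return x, y

# Made-up tables: two classes, one binary feature.
prior = {"spam": 0.5, "ham": 0.5}
likelihood = {
    "spam": {"offer": 0.75, "no_offer": 0.25},
    "ham": {"offer": 0.25, "no_offer": 0.75},
}

post = posterior(prior, likelihood, "offer")
print(post)
print(sample_xy(prior, likelihood, random.Random(0)))
```

A discriminative model would store only the posterior table, which is enough to classify but not enough to evaluate `marginal` or to run `sample_xy`.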

Generative vs. Discriminative
Generative: Today: Naïve Bayes
Discriminative: Next: Logistic Regression
NB & LR related to each other.

How hard is it to learn the optimal classifier?
Categorical Data: How do we represent these? How many parameters?
  Class-Prior, P(Y): Suppose Y is composed of k classes
  Likelihood, P(X | Y): Suppose X is composed of d binary features
Complex model → High variance with limited data!!!
Slide Credit: Carlos Guestrin

Independence to the rescue
Slide Credit: Sam Roweis

The Naïve Bayes assumption
Naïve Bayes assumption: Features are independent given class: $P(X_1, X_2 \mid Y) = P(X_1 \mid Y)\, P(X_2 \mid Y)$
More generally: $P(X_1, \ldots, X_d \mid Y) = \prod_{i=1}^{d} P(X_i \mid Y)$
How many parameters now? Suppose X is composed of d binary features
Slide Credit: Carlos Guestrin
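The parameter-counting question can be answered with simple arithmetic. The counts below follow the usual convention of counting free parameters (each distribution's probabilities sum to 1, so one entry is determined by the rest); the function names are hypothetical.

```python
def n_params_full(k, d):
    """Free parameters without any independence assumption.

    Prior over k classes: k - 1.
    Full table P(X | Y) over d binary features: 2**d - 1 per class.
    """
    return (k - 1) + k * (2 ** d - 1)

def n_params_naive_bayes(k, d):
    """Free parameters under the Naive Bayes assumption.

    Prior over k classes: k - 1.
    One Bernoulli parameter P(X_i = 1 | Y = y) per feature and class: k * d.
    """
    return (k - 1) + k * d

for d in (3, 10, 30):
    print(d, n_params_full(2, d), n_params_naive_bayes(2, d))
```

The full table grows exponentially in d while the Naïve Bayes count grows linearly, which is the sense in which conditional independence cuts variance when data is limited.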

The Naïve Bayes Classifier
Given:
  Class-Prior P(Y)
  d conditionally independent features X given the class Y
  For each X_i, we have likelihood P(X_i | Y)
Decision rule: $h_{NB}(x) = \arg\max_y P(y) \prod_{i=1}^{d} P(x_i \mid y)$
If assumption holds, NB is optimal classifier!
Slide Credit: Carlos Guestrin
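A minimal sketch of the classifier for binary features follows. It is one possible implementation, not the course's reference solution. Two choices are assumptions of the sketch rather than anything stated on the slides: parameters are estimated by counting with add-alpha (Laplace) smoothing, and the decision rule is evaluated in log space to avoid underflow. The function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(Y) and P(X_i = 1 | Y) from binary feature vectors.

    alpha is add-alpha (Laplace) smoothing, an assumption of this sketch.
    """
    n, d = len(X), len(X[0])
    class_counts = Counter(y)
    ones = defaultdict(lambda: [0] * d)  # ones[c][i] = number of class-c rows with X_i = 1
    for xi, yi in zip(X, y):
        for i, v in enumerate(xi):
            ones[yi][i] += v
    prior = {c: class_counts[c] / n for c in class_counts}
    theta = {c: [(ones[c][i] + alpha) / (class_counts[c] + 2 * alpha) for i in range(d)]
             for c in class_counts}
    return prior, theta

def predict(prior, theta, x):
    """Decision rule: argmax over y of log P(y) + sum_i log P(x_i | y)."""
    def log_joint(c):
        return math.log(prior[c]) + sum(
            math.log(p if v else 1.0 - p) for p, v in zip(theta[c], x))
    return max(prior, key=log_joint)

# Toy data: class "a" tends to switch on feature 0, class "b" feature 2.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = ["a", "a", "b", "b"]
prior, theta = train_naive_bayes(X, y)
print(predict(prior, theta, [1, 1, 0]))
print(predict(prior, theta, [0, 0, 1]))
```

Taking logs turns the product in the decision rule into a sum without changing the argmax, and smoothing keeps every estimated probability strictly between 0 and 1 so the logarithms are always defined.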