ECE 5424: Introduction to Machine Learning


ECE 5424: Introduction to Machine Learning
Topics: Probability Review
Readings: Barber 8.1, 8.2
Stefan Lee
Virginia Tech

Project
Groups of 1-3 (we prefer teams of 2)
Deliverables:
- Project proposal (NIPS format): 2 pages, due Sept 21
- Midway presentations (in class)
- Final report: webpage with results
(C) Dhruv Batra 2

Administrative
- HW1: due Wed 09/14, 11:55pm
  https://inclass.kaggle.com/c/vt-ece-introduction-to-machinelearning-hw-1
- Project proposal: due Wed 09/21, 11:55pm, <= 2 pages, NIPS format
(C) Dhruv Batra 3

Proposal
2 pages (NIPS format): https://nips.cc/conferences/2015/paperinformation/stylefiles
Necessary information:
- Project title
- Project idea: this should be approximately two paragraphs.
- Data set details: ideally an existing dataset. No data-collection projects.
- Software: which libraries will you use? What will you write?
- Papers to read: include 1-3 relevant papers. You will probably want to read at least one of them before submitting your proposal.
- Teammate: will you have a teammate? If so, what's the breakdown of labor? Maximum team size is 3 students.
- Mid-semester milestone: what will you complete by the project milestone due date? Experimental results of some kind are expected here.
(C) Dhruv Batra 4

Project Rules
- Must be about machine learning
- Must involve real data: use your own data or take data from the class website
- Can apply ML to your own research, but the work must be done this semester
- OK to combine with other class projects:
  - Must declare to both course instructors
  - Must have explicit permission from BOTH instructors
  - Must have a sufficient ML component
- Using libraries: no need to implement all algorithms; OK to use standard SVM, MRF, Decision-Tree, etc. libraries
- More thought + effort => more credit
(C) Dhruv Batra 5

Project
Main categories:
- Application/Survey: compare a set of existing algorithms on a new application domain of your interest
- Formulation/Development: formulate a new model or algorithm for a new or old problem
- Theory: theoretically analyze an existing algorithm
Support:
- List of ideas and pointers to datasets/algorithms/code: https://filebox.ece.vt.edu/~f16ece5424/project.html
- We will mentor teams and give feedback.
(C) Dhruv Batra 6

Procedural View
Training Stage:
- Raw Data → x (Feature Extraction)
- Training Data { (x,y) } → f (Learning)
Testing Stage:
- Raw Data → x (Feature Extraction)
- Test Data x → f(x) (Apply function, Evaluate error)
(C) Dhruv Batra 7
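As a rough illustration of this two-stage view, the following is a minimal Python/NumPy sketch; extract_features, the 1-nearest-neighbor learner, and the toy signals are hypothetical stand-ins chosen for the example, not part of the course code.

```python
import numpy as np

# Feature extraction: raw data -> x (here, just the mean and std of each raw signal)
def extract_features(raw):
    return np.stack([raw.mean(axis=1), raw.std(axis=1)], axis=1)

# Learning: training data {(x, y)} -> f (a 1-nearest-neighbor classifier as a toy choice)
def learn(X_train, y_train):
    def f(x):
        return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]
    return f

# Toy raw data: two classes of noisy 20-dimensional signals
rng = np.random.default_rng(0)
raw_train = np.concatenate([rng.normal(0, 1, (50, 20)), rng.normal(3, 1, (50, 20))])
y_train = np.array([0] * 50 + [1] * 50)
raw_test = np.concatenate([rng.normal(0, 1, (20, 20)), rng.normal(3, 1, (20, 20))])
y_test = np.array([0] * 20 + [1] * 20)

# Training stage: feature extraction, then learning
X_train = extract_features(raw_train)
f = learn(X_train, y_train)

# Testing stage: feature extraction, apply f, evaluate error
X_test = extract_features(raw_test)
y_pred = np.array([f(x) for x in X_test])
print("test error:", np.mean(y_pred != y_test))
```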

Statistical Estimation View
Probabilities to the rescue:
- x and y are random variables
- D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} ~ P(X,Y)
- IID: Independent and Identically Distributed
  - Both training & testing data are sampled IID from P(X,Y)
  - Learn on the training set
  - Have some hope of generalizing to the test set
(C) Dhruv Batra 8

Plan for Today
Review of Probability
- Discrete vs Continuous Random Variables
- PMFs vs PDFs
- Joint vs Marginal vs Conditional Distributions
- Bayes Rule and Priors
- Expectation, Entropy, KL-Divergence
(C) Dhruv Batra 9

Probability
The world is a very uncertain place.
30 years of Artificial Intelligence and Database research danced around this fact.
And then a few AI researchers decided to use some ideas from the eighteenth century.
(C) Dhruv Batra Slide Credit: Andrew Moore 10

Probability
A is a non-deterministic event
- Can think of A as a boolean-valued variable
Examples:
- A = your next patient has cancer
- A = Donald Trump wins the 2016 Presidential Election
(C) Dhruv Batra 11

Interpreting Probabilities
What does P(A) mean?
Frequentist View:
- P(A) = lim_{N→∞} #(A is true) / N
- the limiting frequency of a repeating non-deterministic event
Bayesian View:
- P(A) is your belief about A
Market Design View:
- P(A) tells you how much you would bet
(C) Dhruv Batra 12
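A quick simulation of the frequentist reading, as a hedged sketch (the event and its probability 0.3 are made up for the example): the empirical frequency #(A is true)/N stabilizes around P(A) as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p_A = 0.3  # hypothetical true probability of event A

# Empirical frequency #(A is true)/N for increasing N
for N in [10, 100, 10_000, 1_000_000]:
    A_true = rng.random(N) < p_A      # each trial: A is true with probability p_A
    print(N, A_true.mean())           # approaches p_A as N grows
```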

[Figure not captured]
(C) Dhruv Batra Image Credit: Intrade / NPR 13

The Axioms of Probability
(C) Dhruv Batra Slide Credit: Andrew Moore 14

Axioms of Probability
- 0 <= P(A) <= 1
- P(empty set) = 0
- P(everything) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
(C) Dhruv Batra 15

Interpreting the Axioms
- 0 <= P(A) <= 1
- P(empty set) = 0
- P(everything) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: the event space of all possible worlds has area 1; P(A) is the area of the oval of worlds in which A is true; the remaining worlds are those in which A is false]
(C) Dhruv Batra Image Credit: Andrew Moore 16

Interpreting the Axioms
- 0 <= P(A) <= 1
- P(empty set) = 0
- P(everything) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
(C) Dhruv Batra Image Credit: Andrew Moore 17

Interpreting the Axioms
- 0 <= P(A) <= 1
- P(empty set) = 0
- P(everything) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
The area of A can't get any bigger than 1, and an area of 1 would mean all worlds have A true.
(C) Dhruv Batra Image Credit: Andrew Moore 18

Interpreting the Axioms
- 0 <= P(A) <= 1
- P(empty set) = 0
- P(everything) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: P(A or B) is the area covered by A and B together; P(A and B) is their overlap]
Simple addition and subtraction
(C) Dhruv Batra Image Credit: Andrew Moore 19
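To make the last axiom concrete, here is a small Python check of P(A or B) = P(A) + P(B) - P(A and B) on a uniform sample space; the die and the two events are hypothetical choices for the example.

```python
from fractions import Fraction

# Uniform sample space: a fair six-sided die
worlds = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), len(worlds))

A = {2, 4, 6}   # hypothetical event: "roll is even"
B = {4, 5, 6}   # hypothetical event: "roll is at least 4"

# Inclusion-exclusion: 2/3 == 1/2 + 1/2 - 1/3
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))  # 2/3
```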

Concepts
- Sample Space: space of events
- Random Variables: mapping from events to numbers
  - Discrete vs Continuous
  - Probability Mass vs Density
(C) Dhruv Batra 20

Discrete Random Variables
- X : a discrete random variable
- 𝒳 or Val(X) : sample space of possible outcomes, which may be finite or countably infinite
- x ∈ 𝒳 : outcome of a sample of the discrete random variable
- p(X = x) : probability distribution (probability mass function)
- p(x) : shorthand used when there is no ambiguity
- 0 ≤ p(x) ≤ 1 for all x ∈ 𝒳, and Σ_{x ∈ 𝒳} p(x) = 1
- Example: 𝒳 = {1, 2, 3, 4}, with a uniform distribution vs a degenerate distribution
(C) Dhruv Batra Slide Credit: Erik Sudderth 21
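A small Python sketch of the slide's example: a uniform PMF and a degenerate PMF over 𝒳 = {1, 2, 3, 4}, checking that each is nonnegative and sums to 1 (putting the degenerate mass on the outcome 2 is an arbitrary choice).

```python
# PMFs over the sample space {1, 2, 3, 4}
support = [1, 2, 3, 4]
uniform = {x: 0.25 for x in support}                        # every outcome equally likely
degenerate = {x: 1.0 if x == 2 else 0.0 for x in support}   # all mass on a single outcome

for name, pmf in [("uniform", uniform), ("degenerate", degenerate)]:
    assert all(0.0 <= p <= 1.0 for p in pmf.values())       # 0 <= p(x) <= 1
    assert abs(sum(pmf.values()) - 1.0) < 1e-12             # sums to 1
    print(name, pmf)
```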

Continuous Random Variables On board (C) Dhruv Batra 22

Concepts
- Expectation
- Variance
(C) Dhruv Batra 23
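Since the working for this slide was done on the board, the standard definitions for a discrete random variable are given here for reference:

```latex
\mathbb{E}[X] = \sum_{x \in \mathcal{X}} x \, p(x),
\qquad
\mathrm{Var}(X) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = \mathbb{E}[X^2] - \mathbb{E}[X]^2
```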

Most Important Concepts
- Marginal distributions / Marginalization
- Conditional distributions / Chain Rule
- Bayes Rule
(C) Dhruv Batra 24

Joint Distribution
[Figure over variables y and z not captured]
(C) Dhruv Batra 25

Marginalization
- Events: P(A) = P(A and B) + P(A and not B)
- Random variables: P(X = x) = Σ_y P(X = x, Y = y)
(C) Dhruv Batra 26

Marginal Distributions
p(x, y) = Σ_{z ∈ 𝒵} p(x, y, z)
p(x) = Σ_{y ∈ 𝒴} p(x, y)
(C) Dhruv Batra Slide Credit: Erik Sudderth 27
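A minimal NumPy sketch of these sums (the 2 x 3 x 4 joint table is random, purely for illustration): marginalizing just means summing the joint table over the axes you drop.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up joint distribution p(x, y, z) over a 2 x 3 x 4 discrete space
p_xyz = rng.random((2, 3, 4))
p_xyz /= p_xyz.sum()          # normalize so the table sums to 1

p_xy = p_xyz.sum(axis=2)      # p(x, y) = sum_z p(x, y, z)
p_x = p_xy.sum(axis=1)        # p(x)    = sum_y p(x, y)

print(p_x, p_x.sum())         # the marginal still sums to 1
```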

Conditional Probabilities
P(Y=y | X=x)
What do you believe about Y=y if I tell you X=x?
P(Donald Trump wins the 2016 election)?
What if I tell you:
- He has the Republican nomination
- His Twitter history
- The complete DVD set of The Apprentice
(C) Dhruv Batra 28

Conditional Probabilities
P(A | B) = in the worlds where B is true, the fraction where A is also true
Example:
- H: Have a headache
- F: Coming down with flu
- P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
(C) Dhruv Batra 29
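Plugging the slide's numbers into the definition shows that having both a headache and the flu is still unlikely, even though the conditional probability is 1/2:

```latex
P(H \wedge F) = P(H \mid F)\, P(F) = \tfrac{1}{2} \cdot \tfrac{1}{40} = \tfrac{1}{80}
```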

Conditional Distributions
p(x, y | Z = z) = p(x, y, z) / p(z)
(C) Dhruv Batra Slide Credit: Erik Sudderth 30

Conditional Probabilities
- Definition
- Corollary: Chain Rule
(C) Dhruv Batra 31
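The equations on this slide did not survive extraction; for reference, the standard definition and the chain rule it implies are:

```latex
P(A \mid B) = \frac{P(A \wedge B)}{P(B)},
\qquad
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})
```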

Independent Random Variables
X ⊥ Y  ⟺  p(x, y) = p(x) p(y) for all x ∈ 𝒳, y ∈ 𝒴
(C) Dhruv Batra Slide Credit: Erik Sudderth 32

Marginal Independence
- Sets of variables X, Y
- X is independent of Y
- Shorthand: P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y) if and only if P(X=x, Y=y) = P(X=x) P(Y=y) for all x ∈ Val(X), y ∈ Val(Y)
(C) Dhruv Batra 33

Conditional Independence
- Sets of variables X, Y, Z
- X is independent of Y given Z
- Shorthand: P ⊨ (X ⊥ Y | Z)
- For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
- Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
(C) Dhruv Batra 34
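A small NumPy sketch of the marginal-independence proposition (both joint tables are made up for the example): a joint built as the outer product of its marginals factorizes, while a generic random table almost surely does not.

```python
import numpy as np

def is_independent(p_xy, tol=1e-9):
    # Check p(x, y) = p(x) p(y) entrywise for a joint table
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return np.allclose(p_xy, np.outer(p_x, p_y), atol=tol)

rng = np.random.default_rng(0)

# Independent case: joint constructed as an outer product of two marginals
p_x = np.array([0.2, 0.8])
p_y = np.array([0.5, 0.3, 0.2])
print(is_independent(np.outer(p_x, p_y)))   # True

# Generic case: a random normalized table is (almost surely) not independent
q_xy = rng.random((2, 3))
q_xy /= q_xy.sum()
print(is_independent(q_xy))                 # False
```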

Concept: Bayes Rule
Simple yet fundamental
P(B | A) = P(A ∧ B) / P(A) = P(A | B) P(B) / P(A)
This is Bayes Rule.
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
(C) Dhruv Batra Image Credit: Andrew Moore 35

Bayes Rule
Simple yet profound
Using Bayes Rule doesn't make your analysis Bayesian!
Concepts:
- Likelihood: how much does a certain hypothesis explain the data?
- Prior: what do you believe before seeing any data?
- Posterior: what do we believe after seeing the data?
(C) Dhruv Batra 36
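As a concrete use of the rule, the headache/flu numbers from the earlier slide give a posterior probability of flu given a headache of 1/8; a short Python check:

```python
from fractions import Fraction

# Numbers from the headache/flu example
P_H = Fraction(1, 10)         # prior probability of a headache
P_F = Fraction(1, 40)         # prior probability of flu
P_H_given_F = Fraction(1, 2)  # likelihood: headache given flu

# Bayes rule: P(F | H) = P(H | F) P(F) / P(H)
P_F_given_H = P_H_given_F * P_F / P_H
print(P_F_given_H)            # 1/8
```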

Entropy (C) Dhruv Batra Slide Credit: Sam Roweis 37
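The body of this slide was not captured in the transcript; for reference, the standard definition of the entropy of a discrete random variable is:

```latex
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)
```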

KL-Divergence / Relative Entropy (C) Dhruv Batra Slide Credit: Sam Roweis 38
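Likewise, the standard definition of the KL divergence (relative entropy) between two distributions p and q over the same discrete space is:

```latex
\mathrm{KL}(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}
```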

KL-Divergence / Relative Entropy
[Figures not captured]
(C) Dhruv Batra Image Credit: Wikipedia 39-40

End of Prob. Review
Start of Estimation
(C) Dhruv Batra 41