Agnostic KWIK learning and efficient approximate reinforcement learning


Agnostic KWIK learning and efficient approximate reinforcement learning

István Szita and Csaba Szepesvári
Department of Computing Science, University of Alberta
Annual Conference on Learning Theory (COLT), 2011

Szityu & Szepi (UofA) Agnostic KWIK learning COLT 11 1 / 23

Outline
1. Basic concepts: efficient reinforcement learning; the "knows what it knows" (KWIK) framework
2. Agnostic KWIK learning: definitions; results for several problem classes
3. Summary

Reinforcement learning
- Maximize long-term reward, but the environment is unknown.
- The agent needs to explore, but exploration is costly.

Efficient RL algorithms
- make a bounded number of non-optimal steps (alternative definitions exist),
- balance exploration and exploitation,
- exist for many environment classes (e.g. MDPs).

The Rmax construction: a general scheme for efficient RL
- keep track of known areas (KWIK learner),
- assume that unknown areas have maximum reward,
- plan an optimal path within the known area,
- collect new experience when leaving the known area.

The "knows what it knows" (KWIK) framework [Li, Walsh, Littman, 2008]
- The adversary picks a concept.
- Repeat: the adversary picks a query x.
  - If the learner passes, the adversary gives noisy feedback and the learner updates itself.
  - If the learner predicts, the prediction has to be accurate; otherwise the learner fails.
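
The protocol above can be sketched as a small simulation. This is a minimal, illustrative loop (the driver function and the memorizing toy learner are made up for this example, not from the paper): the learner either predicts or passes, and only a pass earns feedback.

```python
# Minimal sketch of the KWIK protocol (illustrative names, not from the paper).
PASS = None  # sentinel meaning "I don't know"

class MemorizeLearner:
    """Toy deterministic KWIK learner: passes on unseen inputs, then memorizes."""
    def __init__(self):
        self.seen = {}

    def predict(self, x):
        return self.seen.get(x, PASS)

    def learn(self, x, y):
        self.seen[x] = y

def run_kwik(learner, queries, target, noise=lambda: 0.0, eps=0.5):
    """Drive the protocol: count passes, fail on an inaccurate prediction."""
    passes = 0
    for x in queries:
        y_hat = learner.predict(x)
        if y_hat is PASS:
            passes += 1
            learner.learn(x, target(x) + noise())  # feedback only on a pass
        elif abs(y_hat - target(x)) > eps:
            raise RuntimeError("inaccurate prediction: the learner has failed")
    return passes
```

On a query sequence with repeats, the memorizing learner passes once per distinct input and predicts exactly afterwards, matching the "accurate or pass" contract.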

The Rmax construction with a KWIK learner

KWIK-Rmax(MDPLearner, Planner):
  MDPLearner.initialize(...)
  Planner.initialize(...)
  Observe s_1
  for t := 1, 2, ... do
    a_t := Planner.plan(Opt(MDPLearner), s_t)
    Execute a_t and observe s_{t+1}, r_t
    if MDPLearner.predict(s_t, a_t) = ⊥ then
      MDPLearner.learn((s_t, a_t), (δ_{s_{t+1}}, r_t))

Optimistic wrapper:
Opt(MDPLearner).predict(s, a):
  if MDPLearner.predict(s, a) = ⊥ then
    return (δ_s(·), (1 − γ)V_max)
  else
    return MDPLearner.predict(s, a)
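
The optimistic wrapper on the slide can be sketched as follows. The interface is assumed for illustration (a learner's predict(s, a) returns a pair of next-state distribution and reward, or None when it passes); the stub learner and the constants V_MAX and GAMMA are made up.

```python
# Sketch of the optimistic wrapper Opt(MDPLearner); interface is illustrative.
V_MAX = 10.0   # assumed bound on the optimal value
GAMMA = 0.9    # assumed discount factor

class StubLearner:
    """Toy MDP learner that knows only the (s, a) pairs it was told about."""
    def __init__(self):
        self.known = {}

    def predict(self, s, a):
        return self.known.get((s, a))  # None encodes a pass

    def learn(self, s, a, next_dist, reward):
        self.known[(s, a)] = (next_dist, reward)

def opt_predict(mdp_learner, s, a):
    """Unknown (s, a) pairs look maximally rewarding and self-looping,
    so a planner using this model is drawn toward unexplored areas."""
    pred = mdp_learner.predict(s, a)
    if pred is None:  # the learner passed: (s, a) is still unknown
        return ({s: 1.0}, (1 - GAMMA) * V_MAX)  # delta_s self-loop, reward (1-γ)V_max
    return pred
```

Once the learner has seen a pair, the wrapper becomes transparent and returns the learned model's prediction unchanged.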

The KWIK-Rmax theorem [Li, Walsh, Littman, 2008]
Let G be a class of environment models (e.g. the class of MDPs, factored MDPs, or linear MDPs). If we have
- an efficient KWIK learner for the class G, and
- a near-optimal planner for models in G,
then the KWIK-Rmax algorithm constructed from these is an efficient reinforcement learner on G.
But what if the environment is not contained in the class G?

The need for agnostic learning
In reinforcement learning, we often meet situations where
- the environment is only almost a factored MDP, but is modeled as an FMDP,
- state abstraction (e.g., aggregation) is used, but the MDP is incompressible,
- function approximation is used.
In such cases, we should not assume that we know the class G of the environment. We should be agnostic!
Agnostic = no knowledge of where the adversary chooses its concept from.

Agnostic KWIK learning
- The agent does not know the problem class G.
- It chooses its hypotheses from another class H.
- We assume that an upper bound D on their distance is known:
    D(G, H) := sup_{(X, Y, g, Z, ‖·‖) ∈ G} inf_{h ∈ H} ‖g − h‖
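
The sup-inf distance above can be illustrated concretely on a toy instance with finite function classes over a 3-point domain and the sup-norm (all functions and values below are made up for the example):

```python
# Toy illustration of D(G, H) = sup_{g in G} inf_{h in H} ||g - h||.
X = [0, 1, 2]  # finite input domain

def sup_norm_dist(f, g):
    """Sup-norm distance between two functions on the finite domain X."""
    return max(abs(f(x) - g(x)) for x in X)

G = [lambda x: x, lambda x: 2 * x]   # concepts the adversary may pick
H = [lambda x: x + 1]                # the learner's hypothesis class

# Worst case over adversary's choices of the best achievable approximation:
D = max(min(sup_norm_dist(g, h) for h in H) for g in G)  # D = 1 here
```

Whatever concept the adversary picks from G, some hypothesis in H is within D of it; this D is exactly the quantity the accuracy guarantees are stated against.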

Agnostic KWIK learning: prediction accuracy
- We cannot guarantee ε accuracy (of course).
- Interestingly, we cannot even guarantee D + ε.
- We require accuracy rD + ε, where r ≥ 1 is the competitiveness factor.

Problems and problem classes

Definition (Problem). A problem is a 5-tuple G = (X, Y, g, Z, ‖·‖), where
- X is the set of inputs,
- Y ⊆ R^d is a measurable set of possible responses,
- g : X → Y is the target concept,
- Z : X → P(Y) is the (zero-mean) noise distribution,
- ‖·‖ : R^d → R_+ is a semi-norm on R^d.

Definition (Problem class). A problem class G is a set of problems.

Agnostic KWIK learner
- D > 0: approximation error bound
- r ≥ 1: competitiveness factor
- ε ≥ 0: accuracy slack
- δ ≥ 0: confidence parameter
A learning agent is agnostic KWIK for (ε, δ, r, D) if, outside of an event of probability at most δ, it holds that
- when it predicts, the error is at most rD + ε, and
- the number of passes is bounded.
Complexity: the number of passes, a function f(ε, δ, D, r).

Agnostic KWIK-Rmax theorem
Fix ε > 0, r ≥ 1, 0 < δ ≤ 1/2. If we have
- an (rD + ε)-accurate agnostic KWIK learner with complexity bound B(δ), and
- an e_planner-accurate planner,
then with probability at least 1 − 2δ, the KWIK-Rmax algorithm makes
    O( (V_max L / ((1 − γ)(rD + ε))) · (B(δ) + log(L/δ)) )
mistakes larger than 5(rD + ε)/(1 − γ) + e_planner, where L is the (rD + ε)-horizon time,
    L = O( (1 − γ)^{-1} log( V_max (1 − γ) / (rD + ε) ) ).
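
The horizon time L in the bound can be computed numerically under one common definition: the smallest L with γ^L · V_max ≤ rD + ε, i.e. the point past which discounted rewards no longer matter at the rD + ε scale (the constants below are made up for illustration):

```python
import math

# Illustrative (rD + eps)-horizon time: smallest L with gamma^L * V_max <= rD + eps.
gamma, V_max, rD_eps = 0.9, 10.0, 0.5  # assumed example values
L = math.ceil(math.log(V_max / rD_eps) / math.log(1 / gamma))
```

With these values L comes out in the tens of steps, matching the O((1 − γ)^{-1} log(...)) scaling in the theorem.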

The agnostic KWIK-Rmax theorem justifies the agnostic KWIK framework! ... but what can we agnostic-KWIK learn?

Finite hypothesis class H, deterministic case
The learner is given D and the hypotheses f_1, ..., f_|H|; it does not know the true concept g.
- For each query x, see if there is a prediction y such that ‖y − f_i(x)‖ ≤ D for all i.
- If yes, then y is a good prediction! (2D-accurate)
- If not, then we have to pass and receive g(x). Then ‖g(x) − f_i(x)‖ > D for at least one f_i, so we can exclude it.

Finite hypothesis class H, deterministic case
The previous algorithm
- passes at most |H| − 1 times (each "don't know" excludes at least one hypothesis),
- gives 2D-accurate predictions (r = 2, ε = 0).
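
The finite deterministic learner can be sketched as runnable code. This is a simplified illustration for scalar-valued hypotheses with the absolute-value norm (the class name and interface are made up): a prediction y with |y − f_i(x)| ≤ D for every surviving f_i exists exactly when the spread max_i f_i(x) − min_i f_i(x) is at most 2D.

```python
# Sketch of the finite-deterministic agnostic KWIK learner (scalar case).
class AgnosticKwikFinite:
    def __init__(self, hypotheses, D):
        self.H = list(hypotheses)  # surviving hypotheses
        self.D = D

    def predict(self, x):
        vals = [f(x) for f in self.H]
        lo, hi = min(vals), max(vals)
        if hi - lo <= 2 * self.D:
            return (lo + hi) / 2   # within D of every surviving hypothesis
        return None                # pass: "I don't know"

    def learn(self, x, y_true):
        # y_true = g(x); some f* stays within D of g everywhere, so any f
        # with |y_true - f(x)| > D cannot be that f* and can be excluded.
        self.H = [f for f in self.H if abs(y_true - f(x)) <= self.D]
```

Each pass shrinks the surviving set by at least one hypothesis, giving the |H| − 1 bound, and any prediction is within D of the survivor that is itself within D of g, hence 2D-accurate.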

A sample run of the agnostic KWIK learner
[Figure omitted from the transcript]

Finite hypothesis class H, noisy problems
The solution is not trivial: we cannot exclude a hypothesis by a single sample; we need to take averages.
- If the average of (y_t − f(x_t)) is small, f may still be bad (the adversary selects over- and underestimated places alternately).
- If the average of (y_t − f(x_t)) is large, f is definitely bad, but the adversary can prevent us from seeing such a case (for every 1000 small-error x_t it gives one large-error one).

Finite hypothesis class H, noisy problems
Key idea: if f_1 and f_2 differ by more than 2D on some region, then the sample average in that region is much closer to one of them; the other one can be excluded.
[Figure: f_1, f_2, and samples x between them; omitted from the transcript]

Finite hypothesis class H, noisy problems
Algorithm:
- Keep a bag of samples for each pair (f_i, f_j).
- For each query x, see if there is a prediction y such that ‖y − f_i(x)‖ < D + ε/2 for all i.
  - If yes, then y is a good prediction! ((2D + ε)-accurate)
  - If not, then we have to pass and receive y' = g(x) + noise. Then f_i(x) and f_j(x) are far apart for at least one pair f_i, f_j; add (x, y') to the corresponding bag.
- Once m samples have gathered in a bag, calculate the sample average; one hypothesis can be excluded.
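
The bag-based idea can be sketched as a simplified learner. This is an illustrative stand-in, not the paper's algorithm: the bag size m and the exclusion rule (drop the hypothesis whose average residual is larger) replace the paper's carefully tuned thresholds, and hypotheses are scalar-valued.

```python
# Simplified sketch of the noisy-case learner with per-pair sample bags.
from collections import defaultdict

class AgnosticKwikNoisy:
    def __init__(self, hypotheses, D, eps, m=50):
        self.H = list(hypotheses)          # surviving hypotheses
        self.D, self.eps, self.m = D, eps, m
        self.bags = defaultdict(list)      # (f_i, f_j) -> list of (x, y) samples

    def predict(self, x):
        vals = [f(x) for f in self.H]
        lo, hi = min(vals), max(vals)
        if hi - lo <= 2 * self.D + self.eps:
            return (lo + hi) / 2           # a y close to every survivor exists
        return None                        # pass

    def learn(self, x, y):
        # Store the noisy feedback in every bag of a currently-disagreeing pair.
        for i in range(len(self.H)):
            for j in range(i + 1, len(self.H)):
                fi, fj = self.H[i], self.H[j]
                if abs(fi(x) - fj(x)) > 2 * self.D + self.eps:
                    self.bags[(fi, fj)].append((x, y))
        self._prune()

    def _prune(self):
        # A full bag's sample average sits much closer to one hypothesis;
        # the other one is excluded (simplified exclusion rule).
        for (fi, fj), bag in list(self.bags.items()):
            if len(bag) >= self.m and fi in self.H and fj in self.H:
                err_i = sum(y - fi(x) for x, y in bag) / len(bag)
                err_j = sum(y - fj(x) for x, y in bag) / len(bag)
                self.H.remove(fi if abs(err_i) > abs(err_j) else fj)
                del self.bags[(fi, fj)]
```

With N hypotheses there are O(N^2) bags of size m each, which is where the N^2/ε^2-type complexity in the table below comes from.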

Table of learning complexities

Hypothesis class             Approx.    Agnostic KWIK                           KWIK
Finite, deterministic        2D         N − 1                                   N − 1
Finite, noisy                2D + ε     O((N^2/ε^2) log(N/δ))                   O((N/ε^2) log(N/δ))
d-dim linear, deterministic  2D + ε     O(d! (1/ε + 1)^d), Ω(2^d)               d + 1
d-dim linear, noisy          2D + ε     O((1/ε^{2d+2}) log(1/(δ ε^d)))          O((d^3/ε^4) log(1/(δε)))

Summary
Agnostic KWIK learning...
- is a new online learning framework,
- can be applied to efficient reinforcement learning with non-exact models,
- is generally much harder than ordinary KWIK;
- proofs and examples are in the paper.
Open problems:
- an agnostic KWIK learner for transition probabilities (essential for agnostic learning of MDPs),
- how to do agnostic RL more efficiently, without agnostic KWIK (agnostic KWIK is too restrictive).