[Compstats] AI and CompStats seminar: Nov 12 @ 11:30 (DC2306C)

Pascal Poupart ppoupart at cs.uwaterloo.ca
Wed Nov 3 18:22:49 EDT 2010


Hi,

Mohammad Ghavamzadeh will be visiting us next Friday (Nov 12). He will
give a talk on classification-based techniques for reinforcement
learning (see below). He is an expert in reinforcement learning. Let
me know if you would like to meet him.

cheers,
Pascal

------------

Title: Analysis of Classification-based Policy Iteration Algorithms

Speaker: Mohammad Ghavamzadeh (INRIA, France)
Date: Friday Nov 12
Location: DC2306C (AI seminar room)
Time: 11:30 am

---

Abstract:

We present a variant of the classification-based approach to policy 
iteration that uses a cost-sensitive loss function weighting each 
classification mistake by its actual regret, i.e., the difference 
between the action-value of the greedy action and of the action chosen 
by the classifier. For this algorithm, we provide a full finite-sample 
analysis. Our results state a performance bound in terms of the number 
of policy improvement steps, the number of rollouts used in each 
iteration, the capacity of the considered policy space (classifier), and 
a capacity measure which indicates how well the policy space can 
approximate policies that are greedy w.r.t. any of its members. The 
analysis reveals a tradeoff between the estimation and approximation 
errors in this classification-based policy iteration setting. 
Furthermore, it confirms the intuition that classification-based policy 
iteration algorithms compare favorably with value-function-based 
approaches when good policies are easier to represent and learn than 
their corresponding value functions. We also study the 
consistency of the algorithm when there exists a sequence of policy 
spaces with increasing capacity.
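
A minimal sketch of the regret-weighted loss described above, in Python. It assumes the rollout estimates of the action values are already available as a table q_hat[i][a] for each sampled state and action, and that the policy space is a small finite set that can be searched exhaustively. The function names (regret_weighted_loss, improve_policy), the policies and the toy numbers are hypothetical illustrations, not the speaker's implementation. A real method would train a classifier rather than enumerate policies.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a policy maps a state index to an action index.
Policy = Callable[[int], int]


def regret_weighted_loss(q_hat: Sequence[Sequence[float]],
                         states: Sequence[int],
                         policy: Policy) -> float:
    """Average regret of `policy` over the sampled states.

    q_hat[i][a] is assumed to be a rollout estimate of Q(states[i], a).
    A mistake costs max_a q_hat[i][a] - q_hat[i][policy(states[i])], so the
    greedy action costs 0 and a near-greedy action costs little.
    """
    total = 0.0
    for i, s in enumerate(states):
        row = q_hat[i]
        total += max(row) - row[policy(s)]
    return total / len(states)


def improve_policy(q_hat: Sequence[Sequence[float]],
                   states: Sequence[int],
                   policy_space: List[Policy]) -> Policy:
    """One improvement step: return the policy in a finite (hypothetical)
    policy space with the smallest empirical regret-weighted loss."""
    return min(policy_space,
               key=lambda pi: regret_weighted_loss(q_hat, states, pi))


# Toy example with two sampled states and two actions (made-up numbers).
states = [0, 1]
q_hat = [[1.0, 0.5],   # action 0 is greedy in state 0
         [0.2, 0.9]]   # action 1 is greedy in state 1


def always_0(s: int) -> int:
    return 0


def always_1(s: int) -> int:
    return 1


best = improve_policy(q_hat, states, [always_0, always_1])
print(best.__name__, regret_weighted_loss(q_hat, states, best))
# prints: always_1 0.25
```

Both fixed policies make exactly one mistake, so a plain 0/1 classification loss cannot tell them apart. Weighting each mistake by its regret prefers always_1, whose single mistake loses less estimated value (0.5 versus 0.7).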

---

Bio:

Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from 
the University of Massachusetts Amherst in 2005. He was a postdoctoral 
fellow at the Department of Computing Science at the University of 
Alberta from 2005 to 2008. Since 2008 he has been a researcher at INRIA 
Lille - Nord Europe, team SequeL. His research interests lie primarily 
in Artificial Intelligence and Machine Learning, with emphasis on 
decision making under uncertainty using principled mathematical tools 
from probability theory, decision theory, and statistics. His current 
research is mostly focused on using recent advances in statistical 
machine learning to develop more efficient reinforcement learning 
algorithms.

---


-- 
------------------------
Pascal Poupart
Associate Professor
David R. Cheriton School of Computer Science
University of Waterloo
200 University Avenue West
Waterloo, Ontario
Canada N2L 3G1
------------------------
Web: http://www.cs.uwaterloo.ca/~ppoupart
Email: ppoupart at cs.uwaterloo.ca 
Telephone: 1-519-888-4567x36239 
Fax: 1-519-885-1208
------------------------
