[Compstats] AI and CompStats seminar: Nov 12 @ 11:30 (DC2306C)

Pascal Poupart ppoupart at cs.uwaterloo.ca
Fri Nov 12 11:08:39 EST 2010


Quick reminder about Mohammad's talk today at 11:30 in DC2306C.
Pascal

Pascal Poupart wrote:
> Hi,
>
> Mohammad Ghavamzadeh will be visiting us next Friday (Nov 12).  He 
> will give a talk on classification-based  techniques for reinforcement 
> learning (see below).  He is an expert in reinforcement learning.  Let 
> me know if you would like to meet him.
>
> cheers,
> Pascal
>
> ------------
>
> Title: Analysis of Classification-based Policy Iteration Algorithms
>
> Speaker: Mohammad Ghavamzadeh (INRIA, France)
> Date: Friday Nov 12
> Location: DC2306C (AI seminar room)
> Time: 11:30 am
>
> ---
>
> Abstract:
>
> We present a variant of the classification-based approach to policy 
> iteration which uses a cost-sensitive loss function weighing each 
> classification mistake by its actual regret, i.e., the difference 
> between the action-value of the greedy action and of the action chosen 
> by the classifier. For this algorithm, we provide a full finite-sample 
> analysis. Our results state a performance bound in terms of the number 
> of policy improvement steps, the number of rollouts used in each 
> iteration, the capacity of the considered policy space (classifier), 
> and a capacity measure which indicates how well the policy space can 
> approximate policies that are greedy w.r.t. any of its members. The 
> analysis reveals a tradeoff between the estimation and approximation 
> errors in this classification-based policy iteration setting. 
> Furthermore, it confirms the intuition that classification-based 
> policy iteration algorithms can be favorably compared to value 
> function based approaches when the good policies are easier to be 
> represented and learned than their corresponding value functions. We 
> also study the consistency of the algorithm when there exists a 
> sequence of policy spaces with increasing capacity.
>
> ---
>
> Bio:
>
> Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from 
> the University of Massachusetts Amherst in 2005. He was a postdoctoral 
> fellow at the Department of Computing Science at the University of 
> Alberta from 2005 to 2008. Since 2008 he has been a researcher at 
> INRIA Lille - Nord Europe, team SequeL. His research interests lie 
> primarily in Artificial Intelligence and Machine Learning, with 
> emphasis on decision making under uncertainty using principled 
> mathematical tools from probability theory, decision theory, and 
> statistics. His current research is mostly focused on using recent 
> advances in statistical machine learning to develop more efficient 
> reinforcement learning algorithms.
>
> ---
>
>

-- 
------------------------
Pascal Poupart
Associate Professor
David R. Cheriton School of Computer Science
University of Waterloo
200 University Avenue West
Waterloo, Ontario
Canada N2L 3G1
------------------------
Web: http://www.cs.uwaterloo.ca/~ppoupart
Email: ppoupart at cs.uwaterloo.ca 
Telephone: 1-519-888-4567x36239 
Fax: 1-519-885-1208
------------------------



More information about the Compstats mailing list