Reinforcement Learning in Continuous Time: Exploratory Formulation and Policy Evaluation


Astronomy-Mathematics Building, Room 202


2023-06-07 (Wednesday) 14:00 - 15:00


Reinforcement learning (RL) is an active subarea of machine learning and has been developed predominantly for Markov decision processes (MDPs) in discrete time with discrete state and action spaces. In this talk, I will report on recent developments in RL theory in continuous time and state space, with possibly continuous action space, focusing on the exploratory formulation and policy evaluation. The theory employs stochastic relaxed control to obtain explicit optimal stochastic policies for exploration, which are instrumental for function parameterization in learning, and derives martingale conditions that lead naturally to policy evaluation algorithms.