Abstract
Dynamic decisions are pivotal to economic policy making. We show how existing evidence from randomized controlled trials can be utilized to guide personalized decisions in challenging dynamic environments with constraints such as limited budgets or queues. Recent developments in reinforcement learning make it possible to solve many realistically complex settings for the first time. We allow for restricted policy functions and prove that their regret decays at rate n^{-1/2}, the same as in the static case. We illustrate our methods with an application to job training. The approach scales to a wide range of important problems faced by policy makers.