AboutI am a research scientist at DeepMind in London, working mostly on algorithms for sequential decision making.
For the last few years I've spent a lot of time working on bandits. Csaba and I are in the process of completing a book on bandits to be published by Cambridge University Press. You can download the book for free here.
You can contact me at firstname.lastname@example.org.
- July 2019: Csaba and I have a followup on our COLT paper on partial monitoring. The main improvement is that now we have an algorithm that matches in the non-constructive upper bound in that paper. Unlike many previous algorithms for partial monitoring, the new algorithm is simple and efficient. The key idea is to use exponential weights, but rather than play the proposed distribution the algorithm solves a (convex) optimisation problem to find an exploration distribution for which the regret is not much larger and the estimation variance is well controlled. That such a distribution exists is the main difficulty and follows from a construction in our COLT paper and minimax duality. Preprint is available here.
- June 2019: Some people asked that I share the poster on information-theoretic methods for partial monitoring. You can download it at here (9MB). The poster was drawn on an iPad using GoodNotes for the handwriting and Graphic for the figures. The process was more time consuming than I hoped - you have been warned!
- April 2019: The partial monitoring/minimax/information theory paper was accepted at COLT. Looking forward to seeing you in Pheonix! The work with Shuai Li and Csaba on learning to rank in a linear model was accepted to ICML. Finally, how to make a bandit algorithm explore? Just add random 'garbage' to the history. The paper on how this works with Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh and myself was also accepted at ICML.
- February 2019: Csaba and I are back at adversarial partial monitoring. This time using minimax duality and Bayesian regret analysis. The paper features a minimax theorem for finite-action partial monitoring and a generalisation of Russo and Van Roy's information-theoretic analysis for Bayesian regret. These are applied to a variety of problems, including finite-armed bandits, cops and robbers and finite-action partial monitoring. Preprint is here.