About
I am a research scientist at DeepMind in London, working mostly on algorithms for sequential decision making.For the last few years I've spent a lot of time working on bandits. Csaba and I recently completed a new book on bandits that is published by Cambridge University Press. You can download the online version of the book for free here. I also have a set of notes on bandit convex optimisation, which is available here. Please let me know if you find any mistakes or have suggestions for improvement.
You can contact me at firstname.lastname@gmail.com.
News
- January 2024: Research continues. Recently completed a set of notes on bandit convex optimisation: download.
- September 2022: Two years since an update. Best to check my scholar page for latest progress on partial monitoring, convex bandits, information-directed sampling and more.
- September 2020: The convex bandit paper was accepted at the journal Mathematical Statistics and Learning. Andras Gyorgy and I have been working on connections between the information-theoretic arguments used in that paper and the much-loved approach to adversarial bandits based on mirror descent or follow the regularised leader. Our results show that the stability of mirror descent for some exploration distribution and estimators is upper bounded by the information ratio. This reverses and generalises the arguments in this paper co-authored with Julian Zimmert showing that the information ratio is upper bounded by the stability in certain linear settings.
-
May 2020: Two partial monitoring papers accepted at COLT.
Also, here is preprint on bandit convex optimisation, which improves the dimension-dependence
in the information-theoretic rate from d^9.5 to
d^3d^(2.5). - March 2020: An update has been awhile coming and there's been lots of work I'm exciting about. With Johannes Kirschner and Andreas Krause, I worked on a linear version of partial monitoring. We prove a new classification theorem for finite actions sets, study the contextual case and exploit curvature for infinite action sets. All with the same algorithm and no tuning parameters. Congratulations to Botao Hao for an accepted paper at AISTATS studying the asymptotics of linear contextual bandits. I also finally returned to RL with colleagues Csaba and Gellert to study misspecified linear models in bandit and RL settings. There are lots of other works besides this, including a new algorithm for tree search, the bsuite RL proving ground, and model selection for contextual bandits.
- July 2019: Csaba and I have a followup on our COLT paper on partial monitoring. The main improvement is that now we have an algorithm that matches in the non-constructive upper bound in that paper. Unlike many previous algorithms for partial monitoring, the new algorithm is simple and efficient. The key idea is to use exponential weights, but rather than play the proposed distribution the algorithm solves a (convex) optimisation problem to find an exploration distribution for which the regret is not much larger and the estimation variance is well controlled. That such a distribution exists is the main difficulty and follows from a construction in our COLT paper and minimax duality. Preprint is available here.
- June 2019: Some people asked that I share the poster on information-theoretic methods for partial monitoring. You can download it at here (9MB). The poster was drawn on an iPad using GoodNotes for the handwriting and Graphic for the figures. The process was more time consuming than I hoped - you have been warned!
- April 2019: The partial monitoring/minimax/information theory paper was accepted at COLT. Looking forward to seeing you in Pheonix! The work with Shuai Li and Csaba on learning to rank in a linear model was accepted to ICML. Finally, how to make a bandit algorithm explore? Just add random 'garbage' to the history. The paper on how this works with Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh and myself was also accepted at ICML.
- February 2019: Csaba and I are back at adversarial partial monitoring. This time using minimax duality and Bayesian regret analysis. The paper features a minimax theorem for finite-action partial monitoring and a generalisation of Russo and Van Roy's information-theoretic analysis for Bayesian regret. These are applied to a variety of problems, including finite-armed bandits, cops and robbers and finite-action partial monitoring. Preprint is here.