\( \newcommand{\argmax}{\operatorname{arg\,max}\limits} \) \( \newcommand{\P}[1]{\mathbf{P} \left\{ #1\right\}} \) \( \newcommand{\E}{\mathbf{E}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\set}[1]{\left\{#1\right\}} \) \( \newcommand{\floor}[1]{\left \lfloor {#1} \right\rfloor} \) \( \newcommand{\ceil}[1]{\left \lceil {#1} \right\rceil} \) \( \newcommand{\logp}{\log_{+}\!} \) \( \let\epsilon\varepsilon\)

About me

I’m a research scientist at DeepMind in London.

My work is mostly on the theory of sequential decision making, including multi-armed bandits and reinforcement learning.

You can contact me at firstname.lastname@gmail.com.


  • September 2017: A paper with Christoph Dann and Emma Brunskill was accepted for a spotlight at NIPS. There are a lot of things in this paper, but a highlight is a strengthening of the PAC-RL framework for episodic RL that allows one to prove PAC and regret guarantees simultaneously for the same algorithm. If you’re interested in these kinds of results, then the recent ICML paper by my colleagues at DeepMind explores very similar ideas. Finally, the aforementioned scale-free bandit paper was also accepted at NIPS.
  • August 2017: Laurent Orseau, Shane Legg and I recently revisited the prediction with expert advice framework with linear mixture predictors and the log loss. The main observation is that the regret of the classical efficient algorithm, "exponentiated gradient", can be very poor if the gradients get large. We show that these issues do not occur for the Prod algorithm in this setting, which moreover has some tantalizing "pseudo-Bayesian" interpretations that we do not yet fully understand. The paper will appear at ALT this year.
  • April 2017: Happy to have started at DeepMind in London.
  • March 2017: I don’t know of any bandit algorithms for non-parametric noise models that don’t depend on advance knowledge of the scale of the noise (e.g., a subgaussian constant, the variance, or the support). By generalising the ideas in Seb’s paper on bandits with heavy tails, it’s possible to derive an algorithm that only needs advance knowledge of a bound on the kurtosis of the noise, which roughly measures the propensity for outliers and is scale and shift invariant. You can download the preprint here. In other news, the AISTATS paper got an oral. Csaba will be there to give a great talk!
  • February 2017: The paper with Csaba on the asymptotic regret for linear bandits was accepted at AISTATS.
  • October 2016: Csaba and I have just released a new paper on finite-armed linear bandits on the arXiv. The main result is a characterisation of the optimal achievable asymptotic regret in the sense of Lai & Robbins, including an (extraordinarily naive) algorithm that attains the optimal rate. An interesting corollary of the analysis is that strategies based on optimism or Thompson sampling can be arbitrarily far from optimal, even in very simple cases.
  • September 2016: Csaba Szepesvari and I are writing a series of blog posts on bandit algorithms. The posts coincide with a course on bandits this semester at the UofA, and the same material will be used in a course at IU next semester. Eventually we plan to use the material (and the aid of the crowd) to write a book. The first post is up and you can expect about one new post per week.
  • September 2016: Goodbye UofA. Hello IU! I am in Bloomington now.
  • August 2016: I will be going to NIPS (four papers accepted).
  • June 2016: Finnian Lattimore (who happens to be my sister), Mark Reid and I have just released a new bandit paper on the arXiv. The idea is to show that causal information can sometimes be exploited when minimising the simple regret (pure exploration).
  • June 2016: The paper with Jan, Marcus, Laurent and me won best student paper at UAI. Congratulations Jan!
  • June 2016: Bandit strategies based on distinct phases of exploration followed by exploitation are pretty natural. Aurélien Garivier, Emilie Kaufmann and I show that such approaches are doomed to sub-optimality in the illustrative two-armed Gaussian bandit with known horizon.
  • May 2016: There are missing lower bounds for adversarial finite-armed bandits. Sébastien Gerchinovitz and I have now added some (preprint). Specifically: high probability bounds, first-order bounds (in terms of the loss of the best arm) and second-order bounds (in terms of the quadratic variation). There are also some impossibility results. For example, the presence of an arm that is optimal in every round cannot help, and neither can losses that lie in a small range. The latter results are in contrast to the full-information setting where these things lead to smaller regret.
  • May 2016: There is a new near-near-optimal algorithm for finite-armed subgaussian bandits. See the preprint or a brief description.
  • May 2016: Together with Jan Leike, Laurent Orseau and Marcus Hutter, I have a paper on asymptotic results for a version of Thompson sampling in general reinforcement learning environments accepted at UAI (http://arxiv.org/abs/1602.07905).
  • April 2016: I have papers accepted at COLT (regret analysis of the Gittins index, http://arxiv.org/abs/1511.06014) and ICML (on “conservative” bandits to guarantee revenue while exploring, http://arxiv.org/abs/1602.04282).
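The kurtosis appearing in the March 2017 item above is easy to play with numerically. Here is a minimal sketch (the function name and the sample data are my own, purely for illustration, and have nothing to do with the paper itself):

```python
import numpy as np

def kurtosis(x):
    # Kurtosis: E[(X - mu)^4] / Var(X)^2, a scale- and shift-invariant
    # measure of the propensity for outliers (a Gaussian has kurtosis 3).
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    var = (centred ** 2).mean()
    return (centred ** 4).mean() / var ** 2

samples = np.array([0.1, -0.4, 1.3, 0.2, -2.0, 0.5])
# Invariance: rescaling and shifting the data leaves the kurtosis unchanged,
# which is why a bound on it can be known in advance even when the scale is not.
print(np.isclose(kurtosis(samples), kurtosis(3.0 * samples + 7.0)))  # True
```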
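The explore-then-commit strategies discussed in the June 2016 item on phased exploration can be simulated in a few lines. This is an illustrative sketch only, for the two-armed Gaussian bandit; the arm means, exploration length and horizon below are arbitrary choices of mine, not values from the paper:

```python
import numpy as np

def explore_then_commit(means, m, n, rng):
    # Pull each of the two unit-variance Gaussian arms m times, then commit
    # to the arm with the larger empirical mean for the remaining n - 2m rounds.
    rewards = [rng.normal(means[a], 1.0, size=m) for a in (0, 1)]
    best = int(rewards[1].mean() > rewards[0].mean())
    pulls = np.array([m, m], dtype=float)
    pulls[best] += n - 2 * m
    # Expected regret given the (random) arm the policy committed to.
    gap = max(means) - np.array(means)
    return float(pulls @ gap)

rng = np.random.default_rng(0)
regret = explore_then_commit([0.0, 0.1], m=100, n=10_000, rng=rng)
```

Averaging this over many seeds exhibits the tension the paper makes precise: exploring too little risks committing to the wrong arm for the whole remaining horizon, while exploring more wastes pulls on the inferior arm, and no fixed split matches a fully sequential strategy.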