I’m a research scientist at DeepMind in London.
You can contact me at email@example.com.
- September 2018: After a long time my work on bandit algorithms that are simultaneously minimax optimal, asymptotically optimal and never worse than UCB is published in JMLR. Thanks to the many reviewers that helped with this project! Although it does not exactly deprecate the two previous tech reports, this version is much more polished.
- August 2018: Here are the slides from my talk on bandits and exploration at the RL summer school. Huge thanks to all the organisers of this event.
- July 2018: Csaba and I have released the first draft of the bandit book. You can download the pdf here. Of course we would love your feedback!
- June 2018: Branislav, Shuai, Csaba and I have a new paper on stochastic ranking. The setup generalizes many previous stochastic models and the guarantees are both simpler to prove and stronger than comparably generic results.
- May 2018: While finishing the last few chapters of the book Csaba and I have been working on simplifying and improving the algorithms for adversarial partial monitoring. We prove a variety of new results, the most notable of which is a new algorithm and analysis that completes the classification of all finite adversarial partial monitoring games. Missing from the literature was the issue of dealing with so-called easy ‘degenerate’ games that include some actions that can never be uniquely optimal, but may provide crucial information. Besides this we make some headway towards understanding the dependence of the regret on quantities beyond the horizon. The preprint is now available on arxiv.
- February 2018: Csaba and I gave a tutorial on bandits at AAAI. Here are the slides (part1, part2).
- December 2017: Together with Joel Veness, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern and Peter Toth we have released a preliminary paper on a architecture based on a network of geometric mixers, all trained on the target. The long paper includes preliminary empirical results and theory, but there is a lot of potential to investigate different aspects of this model and various extensions. We hope you try it out. Also, I am enjoying the sun at NIPS.
- September 2017: A paper with Christoph Dann and Emma Brunskill was accepted for a spotlight at NIPS. There are a lot of things in this paper, but a highlight is a strengthening of the PAC-RL framework for episodic RL that allows one to prove PAC and regret guarantees simultaneously for the same algorithm. If you’re interested in these kinds of results, then the recent ICML paper by my colleagues at DeepMind explores very similar ideas. Finally, the aforementioned scale-free bandit paper was also accepted at NIPS.
- August 2017: Laurent Orseau, Shane Legg and I recently re-visited the prediction with experts advice framework with linear mixture predictors and the log loss. The main observation is that the regret of the classical efficient algorithm “exponential gradients” can be very poor if the gradients get large. We show that these issues do not occur for the Prod algorithm in this setting, which besides has some tantalizing “pseudo-Bayesian” interpretations that we do not fully understand yet. The paper will appear at ALT this year.
- April 2017: Happy to have started at DeepMind in London.
- March 2017: I don’t know of any bandit algorithms for non-parametric noise models that don’t depend on advance knowledge of the scale of the noise (eg., subgaussian constant or variance or support). By generalising the ideas in Seb’s paper on bandits with heavy tails, it’s possible to derive an algorithm that only needs advance knowledge of a bound on the kurtosis of the noise, which roughly measure the propensity for outliers and is scale and shift invariant. You can download the preprint here. In other news, the AISTATS paper got an oral. Csaba will be there to give a great talk!
- February 2017: The paper with Csaba on the asymptotic regret for linear bandits was accepted at AISTATS.
- October 2016: Csaba and I have just released a new paper on finite-armed linear bandits to the arxiv. The main result is a characterisation of the optimal achievable asymptotic regret in the sense of Lai & Robbins, including an (extraordinarily naive) algorithm that attains the optimal rate. An interesting corollary of the analysis is that strategies based on optimism or Thompson sampling can be arbitrarily far from optimal, even in very simple cases.
- September 2016: Csaba Szepesvari and I are writing a series of blog posts on bandit algorithms. The posts coincide with a course on bandits this semester at the UofA, and the same material will be used in a course at IU next semester. Eventually we plan to use the material (and the aid of the crowd) to write a book. The first post is up and you can expect about one new post per week.
- September 2016: Goodbye UoA. Hello IU! I am in Bloomington now.
- August 2016: I will be going to NIPS (four papers accepted).
- June 2016: Finnian Lattimore (who happens to be my sister), Mark Reid and I have just released a new bandit paper to the arxiv. The idea is to show that causal information can sometimes be exploited when minimising the simple regret (pure exploration).
- June 2016: The paper with Jan, Marcus, Laurent and I won best student paper at UAI. Congratulations Jan!
- June 2016: Bandit strategies based on distinct phases of exploration followed by exploitation are pretty natural. Aurélien Garivier, Emilie Kaufmann and I show that such approaches are doomed to sub-optimality in the illustrative two-armed Gaussian bandit with known horizon.
- May 2016: There are missing lower bounds for adversarial finite-armed bandits. Sébastien Gerchinovitz and I have now added some (preprint). Specifically: high probability bounds, first-order bounds (in terms of the loss of the best arm) and second-order bounds (in terms of the quadratic variation). There are also some impossibility results. For example, the presence of an arm that is optimal in every round cannot help, and neither can losses that lie in a small range. The latter results are in contrast to the full-information setting where these things lead to smaller regret.
- May 2016: There is a new near-near-optimal algorithm for finite-armed subgaussian bandits. See the preprint or a brief description.
- May 2016: Together with Jan Leike, Laurent Orseau and Marcus Hutter, I have a paper on asymptotic results for a version of Thompson sampling in general reinforcement learning environments accepted at UAI (http://arxiv.org/abs/1602.07905).
- April 2016: I have papers accepted at COLT (regret analysis of the Gittins index, http://arxiv.org/abs/1511.06014) and ICML (on “conservative” bandits to guarantee revenue while exploring, http://arxiv.org/abs/1602.04282).