\( \newcommand{\argmax}{\operatorname{arg\,max}\limits} \) \( \newcommand{\P}[1]{\mathbf{P} \left\{ #1\right\}} \) \( \newcommand{\E}{\mathbf{E}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\set}[1]{\left\{#1\right\}} \) \( \newcommand{\floor}[1]{\left \lfloor {#1} \right\rfloor} \) \( \newcommand{\ceil}[1]{\left \lceil {#1} \right\rceil} \) \( \newcommand{\logp}{\log_{+}\!} \) \( \let\epsilon\varepsilon\)

Code

I wrote a lightweight and efficient C++ library for multi-armed bandits. For now it focuses on the simplest setting, with Gaussian or Bernoulli rewards. The algorithms currently implemented are UCB, Thompson sampling, MOSS, OCUCB, exact Bayesian (Gaussian rewards, two arms only), the Gittins index (Gaussian rewards), Conservative UCB and Unbalanced MOSS.

See http://github.com/tor/libbandit
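To give a flavour of the simplest algorithm on that list, here is a minimal sketch of UCB on a K-armed bandit with unit-variance Gaussian rewards. It is not libbandit's interface. The function name `run_ucb`, its parameters and the particular index (empirical mean plus \(\sqrt{2\log t / T_i(t)}\), one common form of the UCB bonus) are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical sketch (not libbandit's API): UCB with unit-variance Gaussian
// rewards. Returns how many times each arm was pulled over the horizon.
std::vector<std::size_t> run_ucb(const std::vector<double>& means,
                                 std::size_t horizon, unsigned seed) {
  const std::size_t k = means.size();
  assert(k > 0 && horizon >= k);  // need at least one pull per arm

  std::mt19937 rng(seed);
  std::normal_distribution<double> noise(0.0, 1.0);

  std::vector<std::size_t> pulls(k, 0);  // T_i(t): number of pulls of arm i
  std::vector<double> sums(k, 0.0);      // sum of observed rewards of arm i

  for (std::size_t t = 0; t < horizon; ++t) {
    std::size_t arm = 0;
    if (t < k) {
      arm = t;  // play every arm once so each empirical mean is defined
    } else {
      // Choose the arm maximising mean_i + sqrt(2 log(t + 1) / T_i).
      double best = -std::numeric_limits<double>::infinity();
      const double log_t = std::log(static_cast<double>(t + 1));
      for (std::size_t i = 0; i < k; ++i) {
        const double n_i = static_cast<double>(pulls[i]);
        const double index = sums[i] / n_i + std::sqrt(2.0 * log_t / n_i);
        if (index > best) {
          best = index;
          arm = i;
        }
      }
    }
    sums[arm] += means[arm] + noise(rng);  // observe a Gaussian reward
    ++pulls[arm];
  }
  return pulls;
}
```

For example, `run_ucb({0.0, 1.0}, 10000, 42)` should spend the large majority of its 10000 pulls on the second arm, since the confidence bonus of the worse arm shrinks only logarithmically while its empirical mean stays near 0.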