Implements the Neural + LinUCB bandit algorithm.
Applies LinUCB on top of an encoding network. Since LinUCB is a linear method, the encoding network is used to capture the non-linear relationship between the context features and the expected rewards. The encoding network may or may not be pre-trained; if it is not, the agent can optionally train it using epsilon-greedy exploration.
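The idea of running LinUCB on encoded features can be sketched in plain NumPy. This is an illustration only, not the TF-Agents implementation: the fixed random `tanh` layer stands in for the encoding network, and every name here (`encode`, `ucb_scores`, `update`, `ALPHA`) is hypothetical. The optional epsilon-greedy training of the encoder is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

CONTEXT_DIM, ENCODING_DIM, NUM_ACTIONS = 4, 3, 2
ALPHA = 1.0  # exploration strength; an assumed value for illustration

# Stand-in for the encoding network: one fixed, untrained tanh layer.
W = rng.normal(size=(CONTEXT_DIM, ENCODING_DIM))


def encode(context):
    """Map raw context features to a non-linear encoding."""
    return np.tanh(context @ W)


# Per-action LinUCB statistics, computed over the encoding:
# A[a] = I + sum of z z^T, b[a] = sum of reward * z.
A = np.stack([np.eye(ENCODING_DIM) for _ in range(NUM_ACTIONS)])
b = np.zeros((NUM_ACTIONS, ENCODING_DIM))


def ucb_scores(context):
    """Return the upper confidence bound of the reward for each action."""
    z = encode(context)
    scores = np.empty(NUM_ACTIONS)
    for a in range(NUM_ACTIONS):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                 # ridge-regression estimate
        bonus = np.sqrt(z @ A_inv @ z)       # confidence width
        scores[a] = theta @ z + ALPHA * bonus
    return scores


def update(context, action, reward):
    """Fold one observed (context, action, reward) into the statistics."""
    z = encode(context)
    A[action] += np.outer(z, z)
    b[action] += reward * z


context = rng.normal(size=CONTEXT_DIM)
scores_before = ucb_scores(context)
action = int(np.argmax(scores_before))
update(context, action, reward=1.0)
scores_after = ucb_scores(context)
```

Only the linear statistics `A` and `b` change in `update`; the encoder is held fixed, which is what lets a linear method handle a non-linear reward function.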
Reference:
Carlos Riquelme, George Tucker, Jasper Snoek, "Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling", ICLR 2018.
Classes
class NeuralLinUCBAgent: An agent implementing the LinUCB algorithm on top of a neural network.
class NeuralLinUCBVariableCollection: A collection of variables used by NeuralLinUCBAgent.
Other Members | |
---|---|
`absolute_import` | Instance of `__future__._Feature` |
`division` | Instance of `__future__._Feature` |
`print_function` | Instance of `__future__._Feature` |