Creates a reward distribution for the mushroom environment.
tf_agents.bandits.environments.dataset_utilities.mushroom_reward_distribution(
r_noeat, r_eat_safe, r_eat_poison_bad, r_eat_poison_good, prob_poison_bad
)
Args
r_noeat | (float) Reward value for not eating the mushroom.
r_eat_safe | (float) Reward value for eating an edible mushroom.
r_eat_poison_bad | (float) Reward value for eating a poisonous mushroom and getting poisoned.
r_eat_poison_good | (float) Reward value for eating a poisonous mushroom and surviving.
prob_poison_bad | (float) Probability, in [0, 1], of getting poisoned after eating a poisonous mushroom.
Returns
A reward distribution table, an instance of tfd.Distribution.
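The reward semantics implied by the five arguments can be sketched in plain Python, without calling TF-Agents. This is only an illustration under stated assumptions: the helper names `expected_rewards` and `sample_reward`, the string labels for mushroom kind and action, and the numeric values are hypothetical, and the sketch says nothing about the internal layout of the returned tfd.Distribution.

```python
import random


def expected_rewards(r_noeat, r_eat_safe, r_eat_poison_bad,
                     r_eat_poison_good, prob_poison_bad):
    """Expected reward for each (mushroom kind, action) pair.

    Hypothetical helper for illustration; not part of TF-Agents.
    """
    # Eating a poisonous mushroom is the only stochastic outcome:
    # poisoned with probability prob_poison_bad, otherwise the eater survives.
    eat_poisonous = (prob_poison_bad * r_eat_poison_bad
                     + (1.0 - prob_poison_bad) * r_eat_poison_good)
    return {
        ("poisonous", "no_eat"): r_noeat,
        ("poisonous", "eat"): eat_poisonous,
        ("edible", "no_eat"): r_noeat,
        ("edible", "eat"): r_eat_safe,
    }


def sample_reward(kind, action, params, rng=random):
    """Draws one reward for a given mushroom kind and action (hypothetical)."""
    if action == "no_eat":
        return params["r_noeat"]
    if kind == "edible":
        return params["r_eat_safe"]
    # Poisonous mushroom eaten: a Bernoulli draw decides the outcome.
    if rng.random() < params["prob_poison_bad"]:
        return params["r_eat_poison_bad"]
    return params["r_eat_poison_good"]


# Illustrative values only (assumed, not taken from this page).
params = dict(r_noeat=0.0, r_eat_safe=5.0, r_eat_poison_bad=-35.0,
              r_eat_poison_good=5.0, prob_poison_bad=0.5)
table = expected_rewards(**params)
```

With these assumed values, eating a poisonous mushroom has expected reward 0.5 * (-35.0) + 0.5 * 5.0 = -15.0, while the other three entries are deterministic.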