Base class for Bandit Python environments.
Inherits From: PyEnvironment
tf_agents.bandits.environments.bandit_py_environment.BanditPyEnvironment(
    observation_spec: tf_agents.typing.types.NestedArray,
    action_spec: tf_agents.typing.types.NestedArray,
    reward_spec: Optional[types.NestedArray] = None,
    name: Optional[Text] = None
)
Every bandit Python environment should derive from this class. Subclasses need to implement the _observe() and _apply_action() methods.
Usage:
To receive the first observation, the environment's reset() function should be called. To take an action, use the step(action) function. The time step returned by step(action) will contain the reward and the next observation.
Args:
    handle_auto_reset: When True, the base class will handle auto-reset of the Environment.
Methods
action_spec
action_spec() -> tf_agents.typing.types.NestedArraySpec

Defines the actions that should be provided to step(). May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:
    An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
close
close() -> None
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
  # Use env.
current_time_step
current_time_step() -> tf_agents.trajectories.TimeStep
Returns the current timestep.
discount_spec
discount_spec() -> tf_agents.typing.types.NestedArraySpec

Defines the discounts that are returned by step().

Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.

Returns:
    An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
get_info
get_info() -> tf_agents.typing.types.NestedArray
Returns the environment info returned on the last step.
Returns:
    Info returned by the last call to step(). None by default.

Raises:
    NotImplementedError: If the environment does not use info.
get_state
get_state() -> Any
Returns the state of the environment.

The state contains everything required to restore the environment to the current configuration. This can contain, for example:

- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).

Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.

Note that the returned state handle should not be modified by the environment later on; ensuring this (e.g. using copy.deepcopy) is the responsibility of the environment.

Returns:
    state: The current state of the environment.
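The copy requirement above can be illustrated without the framework; CounterEnv here is a hypothetical stand-in whose entire state is one counter, not a tf_agents class:

```python
import copy


class CounterEnv:
    """Hypothetical environment whose whole state is a step counter."""

    def __init__(self):
        self._num_steps = 0

    def step(self):
        self._num_steps += 1

    def get_state(self):
        # Return a deep copy so later environment mutations cannot leak
        # into a state handle the caller is still holding.
        return copy.deepcopy({'num_steps': self._num_steps})

    def set_state(self, state):
        self._num_steps = state['num_steps']


env = CounterEnv()
env.step()
snapshot = env.get_state()  # opaque token for "after 1 step"
env.step()
env.step()
env.set_state(snapshot)     # roll back to the saved point
```

The caller treats the snapshot purely as a token to hand back to set_state(); only the environment knows its layout.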
observation_spec
observation_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.

Returns:
    An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
render
render(
mode: Text = 'rgb_array'
) -> Optional[types.NestedArray]
Renders the environment.
Args:
    mode: One of ['rgb_array', 'human']. Renders to a numpy array, or brings up a window where the environment can be visualized.

Returns:
    An ndarray of shape [width, height, 3] denoting an RGB image if mode is 'rgb_array'. Otherwise returns nothing and renders directly to a display window.

Raises:
    NotImplementedError: If the environment does not support rendering.
reset
reset() -> tf_agents.trajectories.TimeStep
Starts a new sequence and returns the first TimeStep of this sequence.

Returns:
    A TimeStep namedtuple containing:
        step_type: A StepType of FIRST.
        reward: 0.0, indicating the reward.
        discount: 1.0, indicating the discount.
        observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
reward_spec
reward_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the rewards that are returned by step().

Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.

Returns:
    An ArraySpec, or a nested dict, list or tuple of ArraySpecs.
seed
seed(
seed: tf_agents.typing.types.Seed
) -> Any
Seeds the environment.
Args:
    seed: Value to use as seed for the environment.
set_state
set_state(
state: Any
) -> None
Restores the environment to a given state.

See the definition of state in the documentation for get_state().

Args:
    state: A state to restore the environment to.
should_reset
should_reset(
current_time_step: tf_agents.trajectories.TimeStep
) -> bool
Whether the Environment should reset given the current timestep.

By default it only resets when all time_steps are LAST.

Args:
    current_time_step: The current TimeStep.

Returns:
    A bool indicating whether the Environment should reset or not.
step
step(
action: tf_agents.typing.types.NestedArray
) -> tf_agents.trajectories.TimeStep
Updates the environment according to the action and returns a TimeStep.

If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.

This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.

If should_reset(current_time_step) is True, then this method will reset by itself. In this case action will be ignored.
Args:
    action: A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec().

Returns:
    A TimeStep namedtuple containing:
        step_type: A StepType value.
        reward: A NumPy array, reward value for this timestep.
        discount: A NumPy array, discount in the range [0, 1].
        observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
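The auto-reset rules above can be sketched without the framework. StepType and TinyBandit here are illustrative stand-ins for the step()/reset() contract, not tf_agents code:

```python
import enum


class StepType(enum.IntEnum):
    FIRST = 0
    MID = 1
    LAST = 2


class TinyBandit:
    """Illustrative one-step-episode environment following the step() contract."""

    def __init__(self):
        self._current_step_type = None  # reset() not called yet

    def reset(self):
        self._current_step_type = StepType.FIRST
        return (StepType.FIRST, 0.0)  # (step_type, reward)

    def step(self, action):
        # Before the first reset(), or right after a LAST step, step()
        # starts a new sequence and the action is ignored.
        if self._current_step_type in (None, StepType.LAST):
            return self.reset()
        # In a bandit, every applied action ends the one-step episode.
        self._current_step_type = StepType.LAST
        return (StepType.LAST, float(action))


env = TinyBandit()
first = env.step(5)    # before reset(): auto-resets, action 5 is ignored
applied = env.step(3)  # (StepType.LAST, 3.0): reward for action 3
renewed = env.step(7)  # previous step was LAST: auto-resets again
```

Each real step terminates the one-step episode, so consecutive calls alternate between applying an action and starting a fresh sequence.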
time_step_spec
time_step_spec() -> tf_agents.trajectories.TimeStep
Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with array-valued rewards.

Returns:
    A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure.
__enter__
__enter__()
Allows the environment to be used in a with-statement context.
__exit__
__exit__(
unused_exception_type, unused_exc_value, unused_traceback
)
Allows the environment to be used in a with-statement context.