Batch together environments and simulate them in external processes.
Inherits From: PyEnvironment
tf_agents.environments.ParallelPyEnvironment(
env_constructors: Sequence[tf_agents.environments.parallel_py_environment.EnvConstructor],
start_serially: bool = True,
blocking: bool = False,
flatten: bool = False
)
The environments are created in external processes by calling the provided callables. Each callable can be an environment class or a function that creates the environment and optionally wraps it. The returned environment should not access global variables.
Raises | |
---|---|
ValueError | If the action or observation specs don't match. |
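As a rough illustration of what "provided callables" means here, the sketch below builds a list of zero-argument constructors from a class and from a factory function. `ToyEnv` and `make_short_env` are hypothetical stand-ins invented for this example, not TF-Agents classes, and the constructors are called inline rather than in separate processes.

```python
import functools


class ToyEnv:
    """Hypothetical stand-in for a PyEnvironment; not a TF-Agents class."""

    def __init__(self, max_steps=10):
        # All configuration arrives through arguments, not module-level globals.
        self.max_steps = max_steps


def make_short_env(max_steps):
    """Hypothetical factory: builds the environment and could also wrap it."""
    return ToyEnv(max_steps=max_steps)


# Each entry is a zero-argument callable: a class, or a factory with its
# arguments bound ahead of time via functools.partial.
env_constructors = [ToyEnv, functools.partial(make_short_env, max_steps=3)]

# ParallelPyEnvironment would call each constructor inside its own process;
# here they are called inline only to show what each one produces.
envs = [ctor() for ctor in env_constructors]
print([env.max_steps for env in envs])
```

Binding arguments with `functools.partial` keeps every constructor self-contained, which fits the requirement that the created environment should not depend on global variables.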
Methods
action_spec
action_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the actions that should be provided to step().
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
Returns | |
---|---|
An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
close
close() -> None
Closes all external processes.
current_time_step
current_time_step() -> tf_agents.trajectories.TimeStep
Returns the current timestep.
discount_spec
discount_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the discounts that are returned by step().
Override this method to define an environment that uses non-standard discount values, for example an environment with array-valued discounts.
Returns | |
---|---|
An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
get_info
get_info() -> tf_agents.typing.types.NestedArray
Returns the environment info returned on the last step.
Returns | |
---|---|
Info returned by last call to step(). None by default. |
Raises | |
---|---|
NotImplementedError | If the environment does not use info. |
get_state
get_state() -> Any
Returns the state of the environment.
The state contains everything required to restore the environment to the current configuration. This can contain, for example:
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the returned state. It should be treated as a token that can be passed back to set_state() later.
Note that the returned state handle should not be modified by the environment later on, and ensuring this (e.g. using copy.deepcopy) is the responsibility of the environment.
Returns | |
---|---|
state | The current state of the environment. |
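The copy requirement described above can be sketched with a small stand-alone class. `CounterEnv` is a hypothetical environment written for this example and is not part of TF-Agents; it only shows one way an environment might keep the returned state handle from changing after later steps.

```python
import copy


class CounterEnv:
    """Hypothetical environment with mutable internal state (not TF-Agents)."""

    def __init__(self):
        self._state = {"steps": 0, "history": []}

    def step(self, action):
        self._state["steps"] += 1
        self._state["history"].append(action)

    def get_state(self):
        # Deep-copy so later steps cannot mutate the handle given to the caller.
        return copy.deepcopy(self._state)

    def set_state(self, state):
        # Copy again so the caller's token stays reusable after restoring.
        self._state = copy.deepcopy(state)


env = CounterEnv()
env.step("a")
token = env.get_state()  # opaque token; callers should not inspect it
env.step("b")
env.step("c")
env.set_state(token)     # rewind to the snapshot taken after one step
```

Without the `copy.deepcopy` in `get_state`, the two later calls to `step` would have modified the list inside `token`, and restoring from it would not rewind anything.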
observation_spec
observation_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the observations provided by the environment.
May use a subclass of ArraySpec that specifies additional properties such as min and max bounds on the values.
Returns | |
---|---|
An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
render
render(
mode: Text = 'rgb_array'
) -> tf_agents.typing.types.NestedArray
Renders the environment.
Args | |
---|---|
mode | Rendering mode. Currently only 'rgb_array' is supported because this is a batched environment. |
Returns | |
---|---|
An ndarray of shape [batch_size, width, height, 3] denoting RGB images (for mode=rgb_array). |
Raises | |
---|---|
NotImplementedError | If the environment does not support rendering, or if any mode other than rgb_array is given. |
reset
reset() -> tf_agents.trajectories.TimeStep
Starts a new sequence and returns the first TimeStep of this sequence.
Returns | |
---|---|
A TimeStep namedtuple containing:
step_type: A StepType of FIRST.
reward: 0.0, indicating the reward.
discount: 1.0, indicating the discount.
observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
|
reward_spec
reward_spec() -> tf_agents.typing.types.NestedArraySpec
Defines the rewards that are returned by step().
Override this method to define an environment that uses non-standard reward values, for example an environment with array-valued rewards.
Returns | |
---|---|
An ArraySpec, or a nested dict, list or tuple of ArraySpecs. |
seed
seed(
seeds: Sequence[tf_agents.typing.types.Seed]
) -> Sequence[Any]
Seeds the parallel environments.
set_state
set_state(
state: Any
) -> None
Restores the environment to a given state.
See definition of state in the documentation for get_state().
Args | |
---|---|
state | A state to restore the environment to. |
should_reset
should_reset(
current_time_step: tf_agents.trajectories.TimeStep
) -> bool
Whether the environment should reset given the current time step.
By default it only resets when all time_steps are LAST.
Args | |
---|---|
current_time_step | The current TimeStep. |
Returns | |
---|---|
A bool indicating whether the Environment should reset or not. |
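The default rule stated above (reset only when every time step in the batch is LAST) can be sketched in plain Python. This is an illustration of the rule, not the library's implementation; the integer codes are an assumption meant to mirror the StepType values, and the function takes a plain list of step types rather than a batched TimeStep.

```python
# Integer codes assumed to mirror tf_agents StepType (FIRST=0, MID=1, LAST=2).
FIRST, MID, LAST = 0, 1, 2


def should_reset(step_types):
    """Return True only when every environment in the batch is at LAST."""
    return all(step_type == LAST for step_type in step_types)


print(should_reset([LAST, MID, LAST]))   # one environment is still running
print(should_reset([LAST, LAST, LAST]))  # the whole batch has finished
```

Under this rule a single unfinished environment keeps the whole batch from being reset.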
start
start() -> None
step
step(
action: tf_agents.typing.types.NestedArray
) -> tf_agents.trajectories.TimeStep
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.
If should_reset(current_time_step) is True, then this method will reset by itself. In this case action will be ignored.
Args | |
---|---|
action | A NumPy array, or a nested dict, list or tuple of arrays corresponding to action_spec(). |
Returns | |
---|---|
A TimeStep namedtuple containing:
step_type: A StepType value.
reward: A NumPy array, reward value for this timestep.
discount: A NumPy array, discount in the range [0, 1].
observation: A NumPy array, or a nested dict, list or tuple of arrays corresponding to observation_spec().
|
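The reset behaviour of step() described above can be sketched for a single, unbatched environment. `ToyEpisodeEnv` and the local `TimeStep` namedtuple are hypothetical stand-ins written for this example, and the integer step-type codes are an assumption meant to mirror the StepType values; the real class applies the same idea to batched time steps.

```python
import collections

TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])
FIRST, MID, LAST = 0, 1, 2  # assumed to mirror StepType values


class ToyEpisodeEnv:
    """Hypothetical two-step episodic environment (not a TF-Agents class)."""

    def __init__(self):
        self._current = None  # no TimeStep exists until the first reset
        self._count = 0

    def reset(self):
        self._count = 0
        self._current = TimeStep(FIRST, 0.0, 1.0, self._count)
        return self._current

    def step(self, action):
        # Before any reset, or right after LAST, start a new sequence and
        # ignore the action.
        if self._current is None or self._current.step_type == LAST:
            return self.reset()
        self._count += action
        done = self._count >= 2
        self._current = TimeStep(
            LAST if done else MID, 1.0, 0.0 if done else 1.0, self._count)
        return self._current


env = ToyEpisodeEnv()
step_types = [env.step(1).step_type for _ in range(4)]
print(step_types)
```

The first call starts a sequence because reset() was never called, and the fourth call starts another because the previous time step was LAST; in both cases the action passed in has no effect.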
time_step_spec
time_step_spec() -> tf_agents.trajectories.TimeStep
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(), for example an environment with array-valued rewards.
Returns | |
---|---|
A TimeStep namedtuple containing (possibly nested) ArraySpecs defining the step_type, reward, discount, and observation structure. |
__enter__
__enter__()
Allows the environment to be used in a with-statement context.
__exit__
__exit__(
unused_exception_type, unused_exc_value, unused_traceback
)
Allows the environment to be used in a with-statement context.
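A minimal sketch of the with-statement protocol follows. `ClosableEnv` is a hypothetical class written for this example; that leaving the block calls close() is an assumption about the intended behaviour and is not stated on this page.

```python
class ClosableEnv:
    """Hypothetical environment showing the with-statement protocol."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, unused_exception_type, unused_exc_value,
                 unused_traceback):
        # Assumption: leaving the block releases resources by calling close().
        self.close()


with ClosableEnv() as env:
    inside = env.closed  # still open inside the block

print(inside, env.closed)
```

With external processes involved, a pattern like this would let worker processes be cleaned up even when the body of the block raises an exception.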