Returned with every call to step and reset on an environment.
tf_agents.trajectories.TimeStep(
step_type, reward, discount, observation
)
A TimeStep contains the data emitted by an environment at each step of interaction. A TimeStep holds a step_type, an observation (typically a NumPy array or a dict or list of arrays), and an associated reward and discount.
The first TimeStep in a sequence has a step_type equal to StepType.FIRST. The final TimeStep has a step_type equal to StepType.LAST. All other TimeSteps in a sequence have a step_type equal to StepType.MID.
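The labelling rule above can be illustrated with a minimal, standard-library-only sketch. It does not use tf_agents itself: the StepType enum and TimeStep tuple below are stand-ins that mirror the four fields in the signature and the three predicates listed under Methods. The integer values of the enum, the helper make_episode, and the choice of discount 0.0 on the final step are assumptions made for illustration, not the library's documented behavior.

```python
from enum import IntEnum
from typing import Any, List, NamedTuple


class StepType(IntEnum):
    """Stand-in for the library's StepType; the integer values are assumed."""
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    """Stand-in with the same four fields as the signature above."""
    step_type: StepType
    reward: float
    discount: float
    observation: Any

    def is_first(self) -> bool:
        return self.step_type == StepType.FIRST

    def is_mid(self) -> bool:
        return self.step_type == StepType.MID

    def is_last(self) -> bool:
        return self.step_type == StepType.LAST


def make_episode(observations: List[Any], rewards: List[float]) -> List[TimeStep]:
    """Hypothetical helper: tag each step FIRST, MID, or LAST by position.

    Expects at least two observations, so that the first and last steps differ.
    """
    last = len(observations) - 1
    episode = []
    for i, (obs, reward) in enumerate(zip(observations, rewards)):
        if i == 0:
            step_type = StepType.FIRST
        elif i == last:
            step_type = StepType.LAST
        else:
            step_type = StepType.MID
        # Assumed convention: no future reward is counted after the final step.
        discount = 0.0 if i == last else 1.0
        episode.append(TimeStep(step_type, reward, discount, obs))
    return episode


episode = make_episode(["s0", "s1", "s2", "s3"], [0.0, 1.0, 1.0, 5.0])
step_names = [ts.step_type.name for ts in episode]
print(step_names)
```

In a real environment loop, the same predicates would typically be used to decide when to stop stepping and call reset, for example by checking is_last() on the TimeStep returned from step.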
Methods
is_first
is_first() -> tf_agents.typing.types.Bool
is_last
is_last() -> tf_agents.typing.types.Bool
is_mid
is_mid() -> tf_agents.typing.types.Bool