Class for representing a trainable relative quantile constraint.

Inherits From: `NeuralConstraint`, `BaseConstraint`
```python
tf_agents.bandits.policies.constraints.RelativeQuantileConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: tf_agents.typing.types.Network,
    quantile: float = 0.5,
    comparator_fn: tf_agents.typing.types.ComparatorFn = tf.greater,
    baseline_action_fn: Optional[Callable[[types.Tensor], types.Tensor]] = None,
    name: Text = 'RelativeQuantileConstraint'
)
```
This constraint class implements a relative quantile constraint of the form

`Q_tau(action) >= Q_tau(baseline_action)`

or

`Q_tau(action) <= Q_tau(baseline_action)`

where `Q_tau` is the quantile estimate (at level `quantile`) produced by the constraint network, `baseline_action` is chosen by `baseline_action_fn`, and the direction of the comparison is given by `comparator_fn`.
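A minimal construction sketch, not taken from this reference: it assumes a `QNetwork` is an acceptable constraint network (any network that predicts one value per action should fit), and the specs, layer sizes, and `baseline_action_fn` are illustrative choices.

```python
import tensorflow as tf
from tf_agents.bandits.policies import constraints
from tf_agents.networks import q_network
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

observation_spec = tensor_spec.TensorSpec(shape=(4,), dtype=tf.float32)
time_step_spec = ts.time_step_spec(observation_spec)
# Three discrete arms: actions 0, 1, 2.
action_spec = tensor_spec.BoundedTensorSpec(
    shape=(), dtype=tf.int32, minimum=0, maximum=2)

# The constraint network predicts one quantile value per action.
constraint_net = q_network.QNetwork(
    input_tensor_spec=observation_spec,
    action_spec=action_spec,
    fc_layer_params=(32,))

# Treat arm 0 as the baseline: an action is feasible when its median
# (quantile=0.5) estimate is greater than the baseline's estimate.
constraint = constraints.RelativeQuantileConstraint(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    constraint_network=constraint_net,
    quantile=0.5,
    comparator_fn=tf.greater,
    baseline_action_fn=lambda obs: tf.zeros(tf.shape(obs)[0], dtype=tf.int32))
```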
| Attributes | |
|---|---|
| `constraint_network` | The underlying constraint network. |
| `observation_spec` | The observation spec used by the constraint. |
Methods
compute_loss
```python
compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor
```
Computes loss for training the constraint network.
| Args | |
|---|---|
| `observations` | A batch of observations. |
| `actions` | A batch of actions. |
| `rewards` | A batch of rewards. |
| `weights` | Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values. |
| `training` | Whether the loss is being used for training. |
| Returns | |
|---|---|
| `loss` | A `Tensor` containing the loss for the training step. |
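A hedged sketch of one training step for the constraint network, continuing with the `constraint` instance built above; the batch data, optimizer, and learning rate are illustrative assumptions, not part of this reference.

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Illustrative batch of 8 experience tuples.
observations = tf.random.normal([8, 4])
actions = tf.random.uniform([8], maxval=3, dtype=tf.int32)
rewards = tf.random.normal([8])

with tf.GradientTape() as tape:
  loss = constraint.compute_loss(observations, actions, rewards, training=True)
grads = tape.gradient(loss, constraint.constraint_network.trainable_variables)
optimizer.apply_gradients(
    zip(grads, constraint.constraint_network.trainable_variables))
```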
initialize
```python
initialize()
```
Returns an op to initialize the constraint.
__call__
```python
__call__(
    observation, actions=None
)
```
Returns the probability of input actions being feasible.
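A usage sketch, continuing the example above: calling the constraint on a batch of observations scores action feasibility. The output shape of `[batch_size, num_actions]` when `actions` is omitted is an assumption, not stated in this reference.

```python
observations = tf.random.normal([8, 4])
# Feasibility scores for every action; assumed shape [8, 3].
feasibility = constraint(observations)
```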