Class for representing a trainable relative constraint.
Inherits From: NeuralConstraint, BaseConstraint
tf_agents.bandits.policies.constraints.RelativeConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: tf_agents.typing.types.Network,
    error_loss_fn: tf_agents.typing.types.LossFn = tf.compat.v1.losses.mean_squared_error,
    comparator_fn: tf_agents.typing.types.ComparatorFn = tf.greater,
    margin: float = 0.0,
    baseline_action_fn: Optional[Callable[[types.NestedTensor], types.Tensor]] = None,
    name: Text = 'RelativeConstraint'
)
This constraint class implements a relative constraint such as
expected_value(action) >= (1 - margin) * expected_value(baseline_action)
or
expected_value(action) <= (1 - margin) * expected_value(baseline_action)
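As a rough illustration of the inequality above, the following pure-Python sketch checks one action value against a baseline value. The helper name `is_feasible` is hypothetical and not part of TF-Agents, and `operator.gt` / `operator.lt` are assumed stand-ins for a `comparator_fn` such as `tf.greater` or `tf.less`.

```python
import operator


def is_feasible(action_value, baseline_value, margin=0.0, comparator=operator.gt):
    """Hypothetical helper: compares an action's expected value to a scaled baseline."""
    # The baseline value is scaled by (1 - margin) before the comparison.
    threshold = (1.0 - margin) * baseline_value
    return comparator(action_value, threshold)


# With margin=0.1 and a baseline value of 1.0, the threshold is 0.9.
above = is_feasible(0.95, 1.0, margin=0.1)
below = is_feasible(0.85, 1.0, margin=0.1)
# Swapping the comparator flips the direction of the constraint.
below_with_lt = is_feasible(0.85, 1.0, margin=0.1, comparator=operator.lt)
print(above, below, below_with_lt)
```

Note that a strict comparator such as `operator.gt` excludes the boundary case, while the formulas above are written with non-strict inequalities; which of the two applies in practice depends on the `comparator_fn` that is passed in.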
Attributes | |
---|---|
constraint_network | |
observation_spec | |
Methods
compute_loss
compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor
Computes loss for training the constraint network.
Args | |
---|---|
observations | A batch of observations. |
actions | A batch of actions. |
rewards | A batch of rewards. |
weights | Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values. |
training | Whether the loss is being used for training. |
Returns | |
---|---|
loss | A Tensor containing the loss for the training step. |
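The effect of `weights` described above can be sketched in plain Python. This is only an illustration of "scale the per-entry loss, then take the mean" using a squared error; the function name `weighted_mean_squared_error` is hypothetical, and the exact reduction used inside the library may differ.

```python
def weighted_mean_squared_error(predictions, targets, weights=None):
    """Hypothetical sketch: per-entry squared error, scaled by weights, then averaged."""
    n = len(predictions)
    if weights is None:
        # No weights: every batch entry counts equally.
        weights = [1.0] * n
    elif isinstance(weights, (int, float)):
        # A scalar weight is applied to every batch entry.
        weights = [float(weights)] * n
    per_entry = [w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights)]
    # The final scalar loss is the mean of the weighted per-entry losses.
    return sum(per_entry) / n


predicted = [1.0, 2.0]
rewards = [0.0, 0.0]
print(weighted_mean_squared_error(predicted, rewards))
print(weighted_mean_squared_error(predicted, rewards, weights=[1.0, 0.0]))
print(weighted_mean_squared_error(predicted, rewards, weights=2.0))
```

In this sketch a zero weight removes an entry's contribution to the numerator but not to the batch size in the denominator, which matches the "mean of these values" wording in the table.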
initialize
initialize()
Returns an op to initialize the constraint.
__call__
__call__(observation, actions=None)
Returns the probability of input actions being feasible.
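One way to picture the returned feasibility values is as a per-action mask of 0.0 and 1.0 entries. The sketch below is a pure-Python assumption about that shape, not the library's implementation: `feasibility_mask`, the nested-list inputs, and the per-row baseline action indices are all hypothetical.

```python
import operator


def feasibility_mask(predicted_values, baseline_actions, margin=0.0,
                     comparator=operator.gt):
    """Hypothetical sketch: 1.0 where an action passes the relative constraint, else 0.0."""
    mask = []
    for row, baseline in zip(predicted_values, baseline_actions):
        # Each batch entry is compared against its own baseline action's value.
        threshold = (1.0 - margin) * row[baseline]
        mask.append([1.0 if comparator(value, threshold) else 0.0 for value in row])
    return mask


# Two observations, three actions each; action 0 is the assumed baseline in both rows.
values = [[1.0, 0.95, 0.5],
          [2.0, 3.0, 1.0]]
mask = feasibility_mask(values, baseline_actions=[0, 0], margin=0.1)
print(mask)
```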