tf_agents.bandits.policies.constraints.RelativeConstraint

Class for representing a trainable relative constraint.

Inherits From: NeuralConstraint, BaseConstraint

tf_agents.bandits.policies.constraints.RelativeConstraint(
    time_step_spec: tf_agents.typing.types.TimeStep,
    action_spec: tf_agents.typing.types.BoundedTensorSpec,
    constraint_network: tf_agents.typing.types.Network,
    error_loss_fn: tf_agents.typing.types.LossFn = tf.compat.v1.losses.mean_squared_error,
    comparator_fn: tf_agents.typing.types.ComparatorFn = tf.greater,
    margin: float = 0.0,
    baseline_action_fn: Optional[Callable[[types.NestedTensor], types.Tensor]] = None,
    name: Text = 'RelativeConstraint'
)

This constraint class implements a relative constraint such as

expected_value(action) >= (1 - margin) * expected_value(baseline_action)

or

expected_value(action) <= (1 - margin) * expected_value(baseline_action)
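To make the inequality concrete, here is a minimal pure-Python sketch of the comparison for a single observation. It is not the library's implementation: the helper name `relative_feasibility` and the sample values are hypothetical, `operator.gt` stands in for the default `tf.greater`, and the per-action value estimates are assumed to come from the constraint network.

```python
import operator


def relative_feasibility(predicted_values, baseline_action=0, margin=0.0,
                         comparator=operator.gt):
    """Returns 1.0 for each action that passes the relative check, else 0.0.

    Hypothetical helper: `predicted_values` holds one value estimate per
    action, and `baseline_action` is the index of the baseline action
    (0 mirrors the documented fallback when no baseline function is given).
    """
    # Threshold derived from the baseline action's estimated value.
    threshold = (1 - margin) * predicted_values[baseline_action]
    return [float(comparator(value, threshold)) for value in predicted_values]


# Assumed value estimates for four actions; action 0 is the baseline.
values = [1.0, 0.8, 1.2, 0.95]
mask_greater = relative_feasibility(values, margin=0.1)
mask_less = relative_feasibility(values, margin=0.1, comparator=operator.lt)
```

With a margin of 0.1 the threshold is 0.9 times the baseline value, so only the action valued at 0.8 fails the "greater" check. Swapping the comparator flips which actions count as feasible.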

Args
`time_step_spec`	A `TimeStep` spec of the expected time_steps.
`action_spec`	A nest of `BoundedTensorSpec` representing the actions.
`constraint_network`	An instance of `tf_agents.networks.Network` used to provide estimates of action feasibility. The input structure should be consistent with the `observation_spec`.
`error_loss_fn`	A function for computing the loss used to train the constraint network. The default is `tf.compat.v1.losses.mean_squared_error`.
`comparator_fn`	A comparator function, such as `tf.greater` or `tf.less`. The default is `tf.greater`.
`margin`	A float in (0,1] that determines how strongly the constraint is enforced. The default is 0.0.
`baseline_action_fn`	A callable that, given the observation, returns the baseline action. If `None`, the baseline action is set to 0.
`name`	Python str name of this constraint. All variables in this module will fall under that name. Defaults to the class name.

Attributes
`constraint_network`
`observation_spec`

Methods

`compute_loss`

compute_loss(
    observations: tf_agents.typing.types.NestedTensor,
    actions: tf_agents.typing.types.NestedTensor,
    rewards: tf_agents.typing.types.Tensor,
    weights: Optional[types.Float] = None,
    training: bool = False
) -> tf_agents.typing.types.Tensor

Computes loss for training the constraint network.

Args
`observations`	A batch of observations.
`actions`	A batch of actions.
`rewards`	A batch of rewards.
`weights`	Optional scalar or elementwise (per-batch-entry) importance weights. The output batch loss will be scaled by these weights, and the final scalar loss is the mean of these values.
`training`	Whether the loss is being used for training.

Returns
`loss`	A `Tensor` containing the loss for the training step.
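The weighting described for `weights` can be sketched in plain Python as follows. This is an illustration under assumptions, not the method's actual code: the helper name `weighted_mean_squared_error` and the sample numbers are hypothetical, squared error stands in for `error_loss_fn`, and `predictions` is assumed to hold the constraint network's estimates for the actions that were taken.

```python
def weighted_mean_squared_error(rewards, predictions, weights=None):
    """Scales per-example squared errors by weights, then takes the mean.

    Hypothetical helper: `weights` may be None (no weighting), a scalar,
    or a list with one weight per batch entry.
    """
    batch_size = len(rewards)
    if weights is None:
        weights = [1.0] * batch_size
    elif isinstance(weights, (int, float)):
        # A scalar weight applies uniformly to every batch entry.
        weights = [float(weights)] * batch_size
    per_example = [(r - p) ** 2 for r, p in zip(rewards, predictions)]
    scaled = [w * e for w, e in zip(weights, per_example)]
    return sum(scaled) / batch_size


# Assumed batch of three observed rewards and matching predictions.
rewards = [1.0, 0.0, 2.0]
predictions = [0.5, 0.0, 1.0]
unweighted = weighted_mean_squared_error(rewards, predictions)
elementwise = weighted_mean_squared_error(rewards, predictions, [2.0, 1.0, 0.0])
scalar = weighted_mean_squared_error(rewards, predictions, 2.0)
```

A weight of zero removes an entry's contribution to the numerator, but the mean is still taken over the full batch, which matches the description of scaling first and averaging afterwards.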

`initialize`

initialize()

Returns an op to initialize the constraint.

`__call__`

__call__(
    observation, actions=None
)

Returns the probability of input actions being feasible.
