Lower bound on mutual information from [Barber and Agakov (2003)][1].
tfp.vi.mutual_information.lower_bound_barber_agakov(
    logu, entropy, name=None
)
This method gives a lower bound on the mutual information I(X; Y) by replacing the unknown conditional p(x|y) with a variational decoder q(x|y); it requires knowing the entropy of X, h(X). The lower bound was introduced in [Barber and Agakov (2003)][1].
I(X; Y) = E_p(x, y)[log( p(x|y) / p(x) )]
        = E_p(x, y)[log( q(x|y) / p(x) )] + E_p(y)[KL[ p(x|y) || q(x|y) ]]
       >= E_p(x, y)[log( q(x|y) )] + h(X) = I_[lower_bound_barber_agakov]
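To make the inequality step explicit (this note is a restatement, with I_BA introduced here as shorthand for the bound): the KL term is non-negative and E_p(x, y)[-log p(x)] = h(X), so dropping the KL term gives, in LaTeX,

$$
I(X;Y) \;\ge\; \mathbb{E}_{p(x,y)}\big[\log q(x \mid y)\big] + h(X) \;=\; I_{\text{BA}},
$$

with equality when $q(x \mid y) = p(x \mid y)$.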
Example: `x`, `y` are samples from a joint Gaussian distribution, with correlation `0.8` and both of dimension `1`.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

batch_size, rho, dim = 10000, 0.8, 1
y, eps = tf.split(
    value=tf.random.normal(shape=(2 * batch_size, dim), seed=7),
    num_or_size_splits=2, axis=0)
mean, conditional_stddev = rho * y, tf.sqrt(1. - tf.square(rho))
x = mean + conditional_stddev * eps

# Conditional distribution p(x|y).
conditional_dist = tfd.MultivariateNormalDiag(
    mean, scale_diag=conditional_stddev * tf.ones((batch_size, dim)))

# Scores/unnormalized likelihood of the joint sample pairs `x[i], y[i]`.
joint_scores = conditional_dist.log_prob(x)

# Differential entropy of `X`, which is 1-D standard Normal distributed.
entropy_x = 0.5 * np.log(2 * np.pi * np.e)

# Barber and Agakov lower bound on mutual information.
tfp.vi.mutual_information.lower_bound_barber_agakov(
    logu=joint_scores, entropy=entropy_x)
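As a quick sanity check (not part of the original example), the analytic mutual information of a bivariate Gaussian with correlation `rho` is `-0.5 * log(1 - rho**2)`, about 0.51 nats for `rho = 0.8`. Because the example uses the true conditional p(x|y) as the decoder, the bound is tight and the Monte Carlo estimate should land near this value; a minimal sketch, reusing the variables defined above:

# Analytic MI of a bivariate Gaussian with correlation `rho` (in nats):
# I(X; Y) = -0.5 * log(1 - rho^2) ~= 0.51 for rho = 0.8.
true_mi = -0.5 * np.log(1. - rho**2)

estimate = tfp.vi.mutual_information.lower_bound_barber_agakov(
    logu=joint_scores, entropy=entropy_x)

# With the true conditional as the decoder the bound is tight, so up to
# Monte Carlo noise `estimate` should be close to `true_mi`; with an
# imperfect decoder q(x|y) it would fall below it.
print(float(estimate), true_mi)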
Returns | |
---|---|
`lower_bound` | `float`-like scalar for the lower bound on mutual information. |
References
[1]: David Barber, Felix V. Agakov. The IM algorithm: a variational approach to Information Maximization. In Conference on Neural Information Processing Systems, 2003.