Module: tf.nn

Primitive Neural Net (NN) Operations.

Notes on padding

Several neural network operations, such as tf.nn.conv2d and tf.nn.max_pool2d, take a padding parameter, which controls how the input is padded before the operation runs. The input is padded by inserting values (typically zeros) before and after the tensor in each spatial dimension. The padding parameter can be either the string 'VALID', which means no padding is used, or 'SAME', which adds padding according to the formula described below. Certain ops also allow the amount of padding per dimension to be specified explicitly by passing a list to padding.

In the case of convolutions, the input is padded with zeros. In the case of pools, the padded input values are ignored. For example, in a max pool, the sliding window ignores padded values, which is equivalent to the padded values being -infinity.
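The equivalence between ignoring padded values and padding with -infinity can be checked with a small 1-D sketch in plain Python. This is only an illustration, not how TensorFlow implements pooling; both helper names are hypothetical, the stride is fixed at 1, and every window is assumed to contain at least one real input value.

```python
import math


def max_pool1d_ignore_padding(values, window, pad_left, pad_right):
    """1-D max pool with stride 1 that skips padded positions (hypothetical helper)."""
    n = len(values)
    out = []
    for start in range(-pad_left, n + pad_right - window + 1):
        # Keep only the positions that fall inside the real input.
        in_bounds = [values[i] for i in range(start, start + window) if 0 <= i < n]
        out.append(max(in_bounds))
    return out


def max_pool1d_neg_inf_padding(values, window, pad_left, pad_right):
    """Same pool, but the padding is materialized as -infinity (hypothetical helper)."""
    padded = [-math.inf] * pad_left + list(values) + [-math.inf] * pad_right
    return [max(padded[i:i + window]) for i in range(len(padded) - window + 1)]


values = [-3.0, -1.0, -2.0]
a = max_pool1d_ignore_padding(values, window=2, pad_left=0, pad_right=1)
b = max_pool1d_neg_inf_padding(values, window=2, pad_left=0, pad_right=1)
print(a)  # [-1.0, -1.0, -2.0]
print(b)  # [-1.0, -1.0, -2.0]
```

With all-negative inputs, padding with zeros instead would change the last output to 0.0, which is why the padded values must not take part in the maximum.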

'VALID' padding

Passing padding='VALID' to an op causes no padding to be used. This causes the output size to typically be smaller than the input size, even when the stride is one. In the 2D case, the output size is computed as:

out_height = ceil((in_height - filter_height + 1) / stride_height)
out_width  = ceil((in_width - filter_width + 1) / stride_width)

The 1D and 3D cases are similar. Note filter_height and filter_width refer to the filter size after dilations (if any) for convolutions, and refer to the window size for pools.
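As a rough sketch, the 'VALID' formula for one spatial dimension can be written in plain Python. The function name and the dilation argument are hypothetical conveniences for this illustration; the effective filter size after dilation is taken to be (filter_size - 1) * dilation + 1.

```python
import math


def valid_output_size(in_size, filter_size, stride, dilation=1):
    """Output size along one spatial dimension with padding='VALID' (hypothetical helper)."""
    # Filter size after dilation, as referred to in the note above.
    effective_filter = (filter_size - 1) * dilation + 1
    return math.ceil((in_size - effective_filter + 1) / stride)


print(valid_output_size(5, 3, 1))              # 3: smaller than the input even at stride 1
print(valid_output_size(5, 3, 2))              # 2
print(valid_output_size(7, 3, 1, dilation=2))  # 3: the dilated filter spans 5 positions
```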

'SAME' padding

With 'SAME' padding, padding is applied to each spatial dimension. When the strides are 1, the input is padded such that the output size is the same as the input size. In the 2D case, the output size is computed as:

out_height = ceil(in_height / stride_height)
out_width  = ceil(in_width / stride_width)

The amount of padding used is the smallest amount that results in the output size. The formula for the total amount of padding per dimension is:

if (in_height % stride_height == 0):
  pad_along_height = max(filter_height - stride_height, 0)
else:
  pad_along_height = max(filter_height - (in_height % stride_height), 0)
if (in_width % stride_width == 0):
  pad_along_width = max(filter_width - stride_width, 0)
else:
  pad_along_width = max(filter_width - (in_width % stride_width), 0)

Finally, the padding on the top, bottom, left and right are:

pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left

Note that because of the division by 2, the padding on the two sides (top vs bottom, left vs right) can differ by one. In this case, the bottom and right sides always get the one additional padded pixel. For example, when pad_along_height is 5, we pad 2 pixels at the top and 3 pixels at the bottom. Note that this is different from existing libraries such as PyTorch and Caffe, which explicitly specify the number of padded pixels and always pad the same number of pixels on both sides.
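The 'SAME' formulas above can be collected into one helper for a single spatial dimension. This is a minimal sketch in plain Python; the function name same_padding and its return layout are hypothetical and not part of the tf.nn API.

```python
import math


def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) for one dimension with padding='SAME'."""
    out_size = math.ceil(in_size / stride)
    if in_size % stride == 0:
        pad_total = max(filter_size - stride, 0)
    else:
        pad_total = max(filter_size - (in_size % stride), 0)
    pad_before = pad_total // 2           # top or left
    pad_after = pad_total - pad_before    # bottom or right gets the extra pixel
    return out_size, pad_before, pad_after


print(same_padding(5, 3, 2))  # (3, 1, 1)
print(same_padding(2, 2, 1))  # (2, 0, 1)
print(same_padding(6, 6, 1))  # (6, 2, 3): total padding of 5 splits as 2 and 3
```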

Here is an example of 'SAME' padding:

import tensorflow as tf

in_height = 5
filter_height = 3
stride_height = 2

in_width = 2
filter_width = 2
stride_width = 1

inp = tf.ones((2, in_height, in_width, 2))
filter = tf.ones((filter_height, filter_width, 2, 2))
strides = [stride_height, stride_width]
output = tf.nn.conv2d(inp, filter, strides, padding='SAME')
output.shape[1]  # output_height: ceil(5 / 2)
# 3
output.shape[2]  # output_width: ceil(2 / 1)
# 2

Explicit padding

Certain ops, like tf.nn.conv2d, also allow a list of explicit padding amounts to be passed to the padding parameter. This list is in the same format as what is passed to tf.pad, except the padding must be a nested list, not a tensor. For example, in the 2D case, the list is in the format [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]] when data_format is its default value of 'NHWC'. The two [0, 0] pairs indicate the batch and channel dimensions have no padding, which is required, as only spatial dimensions can have padding.

For example:

inp = tf.ones((1, 3, 3, 1))
filter = tf.ones((2, 2, 1, 1))
strides = [1, 1]
padding = [[0, 0], [1, 2], [0, 1], [0, 0]]
output = tf.nn.conv2d(inp, filter, strides, padding=padding)
tuple(output.shape)
# (1, 5, 3, 1)
# Equivalently, tf.pad can be used, since convolutions pad with zeros.
inp = tf.pad(inp, padding)
# 'VALID' means to use no padding in conv2d (we already padded inp)
output2 = tf.nn.conv2d(inp, filter, strides, padding='VALID')
tf.debugging.assert_equal(output, output2)
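Since explicit padding behaves like padding the input first and then running the op with 'VALID', the output shape can be predicted without running the convolution. The sketch below is plain Python under stated assumptions: NHWC input, a filter of shape (height, width, in_channels, out_channels), no dilation, and a hypothetical function name.

```python
def explicit_padding_output_shape(input_shape, filter_shape, strides, padding):
    """Predict the conv2d output shape for a nested explicit padding list (hypothetical helper)."""
    batch, in_h, in_w, _ = input_shape
    f_h, f_w, _, out_channels = filter_shape
    # padding[0] and padding[3] are the batch and channel pairs, which must be [0, 0].
    (pad_top, pad_bottom), (pad_left, pad_right) = padding[1], padding[2]
    padded_h = in_h + pad_top + pad_bottom
    padded_w = in_w + pad_left + pad_right
    # 'VALID' output size on the padded input; equal to ceil((padded - f + 1) / stride).
    out_h = (padded_h - f_h) // strides[0] + 1
    out_w = (padded_w - f_w) // strides[1] + 1
    return (batch, out_h, out_w, out_channels)


shape = explicit_padding_output_shape(
    (1, 3, 3, 1), (2, 2, 1, 1), [1, 1], [[0, 0], [1, 2], [0, 1], [0, 0]])
print(shape)  # (1, 5, 3, 1)
```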

Difference between convolution and pooling layers

Convolution layers and pooling layers use padding differently. For convolution layers, padding is filled with zeros, and the padded values are multiplied with the kernel. For pooling layers, padding is excluded from the computation. For example, when applying average pooling to a 4x4 grid, the amount of padding added does not affect the output. Here is an example that demonstrates the difference.

import numpy as np
import tensorflow as tf

x_in = np.array([[
  [[2], [2]],
  [[1], [1]],
  [[1], [1]]]])
kernel_in = np.array([  # simulate the avg_pool with conv2d
 [ [[0.25]], [[0.25]] ],
 [ [[0.25]], [[0.25]] ]])
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
conv_out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
pool_out = tf.nn.avg_pool(x, [2, 2], strides=[1, 1, 1, 1], padding='SAME')
print(conv_out.shape, pool_out.shape)
# (1, 3, 2, 1) (1, 3, 2, 1)
tf.reshape(conv_out, [3, 2]).numpy()  # conv2d takes account of padding
# array([[1.5 , 0.75],
#        [1.  , 0.5 ],
#        [0.5 , 0.25]], dtype=float32)
tf.reshape(pool_out, [3, 2]).numpy()  # avg_pool excludes padding
# array([[1.5, 1.5],
#        [1. , 1. ],
#        [1. , 1. ]], dtype=float32)
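The two results above differ only in the divisor used for windows that overlap the padding. The plain-Python sketch below reproduces both outputs for the same 3x2 grid, assuming a 2x2 window, stride 1, and 'SAME' padding of one row at the bottom and one column on the right. The function name and its include_padding flag are hypothetical.

```python
def avg_window_same_2x2(grid, include_padding):
    """Average over 2x2 windows at stride 1 with 'SAME'-style padding (hypothetical helper)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect only the real (unpadded) values covered by the window.
            window = [grid[i][j]
                      for i in (r, r + 1) for j in (c, c + 1)
                      if i < rows and j < cols]
            # Convolution counts padded zeros (divide by 4); pooling skips them.
            divisor = 4 if include_padding else len(window)
            out_row.append(sum(window) / divisor)
        out.append(out_row)
    return out


grid = [[2.0, 2.0], [1.0, 1.0], [1.0, 1.0]]
conv_like = avg_window_same_2x2(grid, include_padding=True)
pool_like = avg_window_same_2x2(grid, include_padding=False)
print(conv_like)  # [[1.5, 0.75], [1.0, 0.5], [0.5, 0.25]]
print(pool_like)  # [[1.5, 1.5], [1.0, 1.0], [1.0, 1.0]]
```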

Modules

experimental module: Public API for tf.nn.experimental namespace.

Classes

class RNNCellDeviceWrapper: Operator that ensures an RNNCell runs on a particular device. (deprecated)

class RNNCellDropoutWrapper: Operator adding dropout to inputs and outputs of the given cell. (deprecated)

class RNNCellResidualWrapper: RNNCell wrapper that ensures cell inputs are added to the outputs. (deprecated)

Functions

all_candidate_sampler(...): Generate the set of all classes.

approx_max_k(...): Returns max k values and their indices of the input operand in an approximate manner.

approx_min_k(...): Returns min k values and their indices of the input operand in an approximate manner.

atrous_conv2d(...): Atrous convolution (a.k.a. convolution with holes or dilated convolution).

atrous_conv2d_transpose(...): The transpose of atrous_conv2d.

avg_pool(...): Performs the average pooling on the input.

avg_pool1d(...): Performs the average pooling on the input.

avg_pool2d(...): Performs the average pooling on the input.

avg_pool3d(...): Performs the average pooling on the input.

batch_norm_with_global_normalization(...): Batch normalization.

batch_normalization(...): Batch normalization.

bias_add(...): Adds bias to value.

collapse_repeated(...): Merge repeated labels into single labels.

compute_accidental_hits(...): Compute the position ids in sampled_candidates matching true_classes.

compute_average_loss(...): Scales per-example losses with sample_weights and computes their average.

conv1d(...): Computes a 1-D convolution given 3-D input and filter tensors.

conv1d_transpose(...): The transpose of conv1d.

conv2d(...): Computes a 2-D convolution given input and 4-D filters tensors.

conv2d_transpose(...): The transpose of conv2d.

conv3d(...): Computes a 3-D convolution given 5-D input and filters tensors.

conv3d_transpose(...): The transpose of conv3d.

conv_transpose(...): The transpose of convolution.

convolution(...): Computes sums of N-D convolutions (actually cross-correlation).

crelu(...): Computes Concatenated ReLU.

ctc_beam_search_decoder(...): Performs beam search decoding on the logits given in input.

ctc_greedy_decoder(...): Performs greedy decoding on the logits given in input (best path).

ctc_loss(...): Computes CTC (Connectionist Temporal Classification) loss.

ctc_unique_labels(...): Get unique labels and indices for batched labels for tf.nn.ctc_loss.

depth_to_space(...): DepthToSpace for tensors of type T.

depthwise_conv2d(...): Depthwise 2-D convolution.

depthwise_conv2d_backprop_filter(...): Computes the gradients of depthwise convolution with respect to the filter.

depthwise_conv2d_backprop_input(...): Computes the gradients of depthwise convolution with respect to the input.

dilation2d(...): Computes the grayscale dilation of 4-D input and 3-D filters tensors.

dropout(...): Computes dropout: randomly sets elements to zero to prevent overfitting.

elu(...): Computes the exponential linear function.

embedding_lookup(...): Looks up embeddings for the given ids from a list of tensors.

embedding_lookup_sparse(...): Looks up embeddings for the given ids and weights from a list of tensors.

erosion2d(...): Computes the grayscale erosion of 4-D value and 3-D filters tensors.

fixed_unigram_candidate_sampler(...): Samples a set of classes using the provided (fixed) base distribution.

fractional_avg_pool(...): Performs fractional average pooling on the input.

fractional_max_pool(...): Performs fractional max pooling on the input.

gelu(...): Compute the Gaussian Error Linear Unit (GELU) activation function.

in_top_k(...): Says whether the targets are in the top K predictions.

isotonic_regression(...): Solves isotonic regression problems along the given axis.

l2_loss(...): L2 Loss.

l2_normalize(...): Normalizes along dimension axis using an L2 norm. (deprecated arguments)

leaky_relu(...): Compute the Leaky ReLU activation function.

learned_unigram_candidate_sampler(...): Samples a set of classes from a distribution learned during training.

local_response_normalization(...): Local Response Normalization.

log_poisson_loss(...): Computes log Poisson loss given log_input.

log_softmax(...): Computes log softmax activations.

lrn(...): Local Response Normalization.

max_pool(...): Performs max pooling on the input.

max_pool1d(...): Performs the max pooling on the input.

max_pool2d(...): Performs max pooling on 2D spatial data such as images.

max_pool3d(...): Performs the max pooling on the input.

max_pool_with_argmax(...): Performs max pooling on the input and outputs both max values and indices.

moments(...): Calculates the mean and variance of x.

nce_loss(...): Computes and returns the noise-contrastive estimation training loss.

normalize_moments(...): Calculate the mean and variance based on the sufficient statistics.

pool(...): Performs an N-D pooling operation.

relu(...): Computes rectified linear: max(features, 0).

relu6(...): Computes Rectified Linear 6: min(max(features, 0), 6).

safe_embedding_lookup_sparse(...): Lookup embedding results, accounting for invalid IDs and empty features.

sampled_softmax_loss(...): Computes and returns the sampled softmax training loss.

scale_regularization_loss(...): Scales the sum of the given regularization losses by number of replicas.

selu(...): Computes scaled exponential linear: scale * alpha * (exp(features) - 1) if features < 0, scale * features otherwise.

separable_conv2d(...): 2-D convolution with separable filters.

sigmoid(...): Computes sigmoid of x element-wise.

sigmoid_cross_entropy_with_logits(...): Computes sigmoid cross entropy given logits.

silu(...): Computes the SiLU or Swish activation function: x * sigmoid(beta * x).

softmax(...): Computes softmax activations.

softmax_cross_entropy_with_logits(...): Computes softmax cross entropy between logits and labels.

softplus(...): Computes elementwise softplus: softplus(x) = log(exp(x) + 1).

softsign(...): Computes softsign: features / (abs(features) + 1).

space_to_batch(...): SpaceToBatch for N-D tensors of type T.

space_to_depth(...): SpaceToDepth for tensors of type T.

sparse_softmax_cross_entropy_with_logits(...): Computes sparse softmax cross entropy between logits and labels.

sufficient_statistics(...): Calculate the sufficient statistics for the mean and variance of x.

swish(...): Computes the SiLU or Swish activation function: x * sigmoid(beta * x).

tanh(...): Computes hyperbolic tangent of x element-wise.

top_k(...): Finds values and indices of the k largest entries for the last dimension.

weighted_cross_entropy_with_logits(...): Computes a weighted cross entropy.

weighted_moments(...): Returns the frequency-weighted mean and variance of x.

with_space_to_batch(...): Performs op on the space-to-batch representation of input.

zero_fraction(...): Returns the fraction of zeros in value.
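Several of the activation entries above state their formula inline. As a rough scalar sketch in plain Python, those formulas can be written out directly. These are not the TensorFlow implementations, which operate element-wise on tensors and take care of numerical stability; the function names with the _scalar suffix are hypothetical.

```python
import math


def relu_scalar(x):
    return max(x, 0.0)


def relu6_scalar(x):
    return min(max(x, 0.0), 6.0)


def softplus_scalar(x):
    # Naive form of log(exp(x) + 1); it overflows for large x.
    return math.log(math.exp(x) + 1.0)


def softsign_scalar(x):
    return x / (abs(x) + 1.0)


def silu_scalar(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x * (1.0 / (1.0 + math.exp(-beta * x)))


print(relu_scalar(-2.0))     # 0.0
print(relu6_scalar(8.0))     # 6.0
print(softsign_scalar(3.0))  # 0.75
```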