Primitive Neural Net (NN) Operations.
Notes on padding
Several neural network operations, such as tf.nn.conv2d and tf.nn.max_pool2d, take a padding parameter, which controls how the input is padded before running the operation. The input is padded by inserting values (typically zeros) before and after the tensor in each spatial dimension. The padding parameter can either be the string 'VALID', which means no padding is used, or 'SAME', which adds padding according to a formula described below. Certain ops also allow the amount of padding per dimension to be explicitly specified by passing a list to padding.
In the case of convolutions, the input is padded with zeros. In the case of pools, the padded input values are ignored. For example, in a max pool, the sliding window ignores padded values, which is equivalent to the padded values being -infinity.
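As a quick illustration (a minimal sketch, assuming TensorFlow is imported as tf as in the snippets below), a max pool over an all-negative input never returns 0 from the zero-valued padding, because padded positions are ignored:
inp = -tf.ones((1, 3, 3, 1))  # every input value is -1
out = tf.nn.max_pool2d(inp, ksize=2, strides=1, padding='SAME')
# Every output is -1: the padded positions never win the max,
# which matches treating the padding as -infinity rather than 0.
print(tf.reduce_min(out).numpy(), tf.reduce_max(out).numpy())
-1.0 -1.0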
'VALID' padding
Passing padding='VALID' to an op causes no padding to be used. This typically makes the output size smaller than the input size, even when the stride is one. In the 2D case, the output size is computed as:
out_height = ceil((in_height - filter_height + 1) / stride_height)
out_width = ceil((in_width - filter_width + 1) / stride_width)
The 1D and 3D cases are similar. Note that filter_height and filter_width refer to the filter size after dilations (if any) for convolutions, and to the window size for pools.
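For instance (a minimal sketch of the formula above), a 5x5 input convolved with a 3x3 filter at stride 1 under 'VALID' padding produces a 3x3 output, since ceil((5 - 3 + 1) / 1) = 3:
inp = tf.ones((1, 5, 5, 1))
filter = tf.ones((3, 3, 1, 1))
output = tf.nn.conv2d(inp, filter, strides=1, padding='VALID')
tuple(output.shape)
(1, 3, 3, 1)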
'SAME' padding
With 'SAME' padding, padding is applied to each spatial dimension. When the strides are 1, the input is padded such that the output size is the same as the input size. In the 2D case, the output size is computed as:
out_height = ceil(in_height / stride_height)
out_width = ceil(in_width / stride_width)
The padding used is the smallest amount that results in that output size. The formula for the total amount of padding per dimension is:
if (in_height % stride_height == 0):
  pad_along_height = max(filter_height - stride_height, 0)
else:
  pad_along_height = max(filter_height - (in_height % stride_height), 0)
if (in_width % stride_width == 0):
  pad_along_width = max(filter_width - stride_width, 0)
else:
  pad_along_width = max(filter_width - (in_width % stride_width), 0)
Finally, the padding amounts on the top, bottom, left, and right are:
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
Note that the division by 2 means there are cases where the padding on the two sides (top vs. bottom, left vs. right) differs by one. In that case, the bottom and right sides always get the one additional padded pixel. For example, when pad_along_height is 5, we pad 2 pixels at the top and 3 pixels at the bottom. Note that this is different from libraries such as PyTorch and Caffe, which explicitly specify the number of padded pixels and always pad the same number of pixels on both sides.
Here is an example of 'SAME' padding:
in_height = 5
filter_height = 3
stride_height = 2
in_width = 2
filter_width = 2
stride_width = 1
inp = tf.ones((2, in_height, in_width, 2))
filter = tf.ones((filter_height, filter_width, 2, 2))
strides = [stride_height, stride_width]
output = tf.nn.conv2d(inp, filter, strides, padding='SAME')
output.shape[1] # output_height: ceil(5 / 2)
3
output.shape[2] # output_width: ceil(2 / 1)
2
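Continuing the example above (a sketch to check the padding formula): since in_height % stride_height is 1, pad_along_height = max(3 - 1, 0) = 2, giving pad_top = 1 and pad_bottom = 1; since in_width % stride_width is 0, pad_along_width = max(2 - 1, 0) = 1, giving pad_left = 0 and pad_right = 1. Passing these amounts explicitly (see the next section) reproduces the 'SAME' result:
explicit_padding = [[0, 0], [1, 1], [0, 1], [0, 0]]  # [batch, height, width, channel]
output_explicit = tf.nn.conv2d(inp, filter, strides, padding=explicit_padding)
tf.debugging.assert_equal(output, output_explicit)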
Explicit padding
Certain ops, like tf.nn.conv2d, also allow a list of explicit padding amounts to be passed to the padding parameter. This list is in the same format as what is passed to tf.pad, except the padding must be a nested list, not a tensor. For example, in the 2D case, the list is in the format [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]] when data_format is its default value of 'NHWC'. The two [0, 0] pairs indicate that the batch and channel dimensions have no padding, which is required, as only spatial dimensions can have padding.
For example:
inp = tf.ones((1, 3, 3, 1))
filter = tf.ones((2, 2, 1, 1))
strides = [1, 1]
padding = [[0, 0], [1, 2], [0, 1], [0, 0]]
output = tf.nn.conv2d(inp, filter, strides, padding=padding)
tuple(output.shape)
(1, 5, 3, 1)
# Equivalently, tf.pad can be used, since convolutions pad with zeros.
inp = tf.pad(inp, padding)
# 'VALID' means to use no padding in conv2d (we already padded inp)
output2 = tf.nn.conv2d(inp, filter, strides, padding='VALID')
tf.debugging.assert_equal(output, output2)
Difference between convolution and pooling layers
Padding is used differently in convolution layers and pooling layers. For convolution layers, the padding is filled with zeros, and those zeros take part in the multiplication with the kernel. For pooling layers, the padded values are excluded from the computation; for example, when average pooling a 4x4 grid, the amount of padding added does not change the pooled values, because padded positions do not contribute to the average. Here is an example that demonstrates the difference.
x_in = np.array([[
  [[2], [2]],
  [[1], [1]],
  [[1], [1]]]])
kernel_in = np.array([  # simulate the avg_pool with conv2d
  [ [[0.25]], [[0.25]] ],
  [ [[0.25]], [[0.25]] ]])
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
conv_out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
pool_out = tf.nn.avg_pool(x, [2, 2], strides=[1, 1, 1, 1], padding='SAME')
print(conv_out.shape, pool_out.shape)
(1, 3, 2, 1) (1, 3, 2, 1)
tf.reshape(conv_out, [3, 2]).numpy() # conv2d takes account of padding
array([[1.5 , 0.75],
[1. , 0.5 ],
[0.5 , 0.25]], dtype=float32)
tf.reshape(pool_out, [3, 2]).numpy() # avg_pool excludes padding
array([[1.5, 1.5],
[1. , 1. ],
[1. , 1. ]], dtype=float32)
Modules
experimental module: Public API for tf.nn.experimental namespace.
rnn_cell module: Public API for tf.keras.__internal__.legacy.rnn_cell namespace.
Functions
all_candidate_sampler(...): Generate the set of all classes.
approx_max_k(...): Returns max k values and their indices of the input operand in an approximate manner.
approx_min_k(...): Returns min k values and their indices of the input operand in an approximate manner.
atrous_conv2d(...): Atrous convolution (a.k.a. convolution with holes or dilated convolution).
atrous_conv2d_transpose(...): The transpose of atrous_conv2d.
avg_pool(...): Performs the average pooling on the input.
avg_pool1d(...): Performs the average pooling on the input.
avg_pool2d(...): Performs the average pooling on the input.
avg_pool3d(...): Performs the average pooling on the input.
avg_pool_v2(...): Performs the avg pooling on the input.
batch_norm_with_global_normalization(...): Batch normalization.
batch_normalization(...): Batch normalization.
bias_add(...): Adds bias to value.
bidirectional_dynamic_rnn(...): Creates a dynamic version of bidirectional recurrent neural network. (deprecated)
collapse_repeated(...): Merge repeated labels into single labels.
compute_accidental_hits(...): Compute the position ids in sampled_candidates matching true_classes.
compute_average_loss(...): Scales per-example losses with sample_weights and computes their average.
conv1d(...): Computes a 1-D convolution of input with rank >=3 and a 3-D filter. (deprecated argument values) (deprecated argument values)
conv1d_transpose(...): The transpose of conv1d.
conv2d(...): Computes a 2-D convolution given 4-D input and filter tensors.
conv2d_backprop_filter(...): Computes the gradients of convolution with respect to the filter.
conv2d_backprop_input(...): Computes the gradients of convolution with respect to the input.
conv2d_transpose(...): The transpose of conv2d.
conv3d(...): Computes a 3-D convolution given 5-D input and filter tensors.
conv3d_backprop_filter(...): Computes the gradients of 3-D convolution with respect to the filter.
conv3d_backprop_filter_v2(...): Computes the gradients of 3-D convolution with respect to the filter.
conv3d_transpose(...): The transpose of conv3d.
conv_transpose(...): The transpose of convolution.
convolution(...): Computes sums of N-D convolutions (actually cross-correlation).
crelu(...): Computes Concatenated ReLU.
ctc_beam_search_decoder(...): Performs beam search decoding on the logits given in input.
ctc_beam_search_decoder_v2(...): Performs beam search decoding on the logits given in input.
ctc_greedy_decoder(...): Performs greedy decoding on the logits given in input (best path).
ctc_loss(...): Computes the CTC (Connectionist Temporal Classification) Loss.
ctc_loss_v2(...): Computes CTC (Connectionist Temporal Classification) loss.
ctc_unique_labels(...): Get unique labels and indices for batched labels for tf.nn.ctc_loss.
depth_to_space(...): DepthToSpace for tensors of type T.
depthwise_conv2d(...): Depthwise 2-D convolution.
depthwise_conv2d_backprop_filter(...): Computes the gradients of depthwise convolution with respect to the filter.
depthwise_conv2d_backprop_input(...): Computes the gradients of depthwise convolution with respect to the input.
depthwise_conv2d_native(...): Computes a 2-D depthwise convolution.
depthwise_conv2d_native_backprop_filter(...): Computes the gradients of depthwise convolution with respect to the filter.
depthwise_conv2d_native_backprop_input(...): Computes the gradients of depthwise convolution with respect to the input.
dilation2d(...): Computes the grayscale dilation of 4-D input and 3-D filter tensors.
dropout(...): Computes dropout. (deprecated arguments)
dynamic_rnn(...): Creates a recurrent neural network specified by RNNCell cell. (deprecated)
elu(...): Computes the exponential linear function.
embedding_lookup(...): Looks up embeddings for the given ids from a list of tensors.
embedding_lookup_sparse(...): Looks up embeddings for the given ids and weights from a list of tensors.
erosion2d(...): Computes the grayscale erosion of 4-D value and 3-D kernel tensors.
fixed_unigram_candidate_sampler(...): Samples a set of classes using the provided (fixed) base distribution.
fractional_avg_pool(...): Performs fractional average pooling on the input. (deprecated)
fractional_max_pool(...): Performs fractional max pooling on the input. (deprecated)
fused_batch_norm(...): Batch normalization.
in_top_k(...): Says whether the targets are in the top K predictions.
l2_loss(...): L2 Loss.
l2_normalize(...): Normalizes along dimension axis using an L2 norm. (deprecated arguments)
leaky_relu(...): Compute the Leaky ReLU activation function.
learned_unigram_candidate_sampler(...): Samples a set of classes from a distribution learned during training.
local_response_normalization(...): Local Response Normalization.
log_poisson_loss(...): Computes log Poisson loss given log_input.
log_softmax(...): Computes log softmax activations. (deprecated arguments)
log_uniform_candidate_sampler(...): Samples a set of classes using a log-uniform (Zipfian) base distribution.
lrn(...): Local Response Normalization.
max_pool(...): Performs the max pooling on the input.
max_pool1d(...): Performs the max pooling on the input.
max_pool2d(...): Performs max pooling on 2D spatial data such as images.
max_pool3d(...): Performs the max pooling on the input.
max_pool_v2(...): Performs max pooling on the input.
max_pool_with_argmax(...): Performs max pooling on the input and outputs both max values and indices.
moments(...): Calculate the mean and variance of x.
nce_loss(...): Computes and returns the noise-contrastive estimation training loss.
normalize_moments(...): Calculate the mean and variance based on the sufficient statistics.
pool(...): Performs an N-D pooling operation.
quantized_avg_pool(...): Produces the average pool of the input tensor for quantized types.
quantized_conv2d(...): Computes a 2D convolution given quantized 4D input and filter tensors.
quantized_max_pool(...): Produces the max pool of the input tensor for quantized types.
quantized_relu_x(...): Computes Quantized Rectified Linear X: min(max(features, 0), max_value)
raw_rnn(...): Creates an RNN specified by RNNCell cell and loop function loop_fn.
relu(...): Computes rectified linear: max(features, 0).
relu6(...): Computes Rectified Linear 6: min(max(features, 0), 6).
relu_layer(...): Computes Relu(x * weight + biases).
safe_embedding_lookup_sparse(...): Lookup embedding results, accounting for invalid IDs and empty features.
sampled_softmax_loss(...): Computes and returns the sampled softmax training loss.
scale_regularization_loss(...): Scales the sum of the given regularization losses by number of replicas.
selu(...): Computes scaled exponential linear: scale * alpha * (exp(features) - 1)
separable_conv2d(...): 2-D convolution with separable filters.
sigmoid(...): Computes sigmoid of x element-wise.
sigmoid_cross_entropy_with_logits(...): Computes sigmoid cross entropy given logits.
silu(...): Computes the SiLU or Swish activation function: x * sigmoid(beta * x).
softmax(...): Computes softmax activations.
softmax_cross_entropy_with_logits(...): Computes softmax cross entropy between logits and labels. (deprecated)
softmax_cross_entropy_with_logits_v2(...): Computes softmax cross entropy between logits and labels. (deprecated arguments)
softplus(...): Computes elementwise softplus: softplus(x) = log(exp(x) + 1).
softsign(...): Computes softsign: features / (abs(features) + 1).
space_to_batch(...): SpaceToBatch for 4-D tensors of type T.
space_to_depth(...): SpaceToDepth for tensors of type T.
sparse_softmax_cross_entropy_with_logits(...): Computes sparse softmax cross entropy between logits and labels.
static_bidirectional_rnn(...): Creates a bidirectional recurrent neural network. (deprecated)
static_rnn(...): Creates a recurrent neural network specified by RNNCell cell. (deprecated)
static_state_saving_rnn(...): RNN that accepts a state saver for time-truncated RNN calculation. (deprecated)
sufficient_statistics(...): Calculate the sufficient statistics for the mean and variance of x.
swish(...): Computes the SiLU or Swish activation function: x * sigmoid(beta * x).
tanh(...): Computes hyperbolic tangent of x element-wise.
top_k(...): Finds values and indices of the k largest entries for the last dimension.
uniform_candidate_sampler(...): Samples a set of classes using a uniform base distribution.
weighted_cross_entropy_with_logits(...): Computes a weighted cross entropy. (deprecated arguments)
weighted_moments(...): Returns the frequency-weighted mean and variance of x.
with_space_to_batch(...): Performs op on the space-to-batch representation of input.
xw_plus_b(...): Computes matmul(x, weights) + biases.
zero_fraction(...): Returns the fraction of zeros in value.