View source on GitHub |
Performs max pooling on the input.
tf.nn.max_pool(
input, ksize, strides, padding, data_format=None, name=None
)
For a given window of ksize
, takes the maximum value within that window.
Used for reducing computation and preventing overfitting.
Consider an example of pooling with 2x2, non-overlapping windows:
matrix = tf.constant([
[0, 0, 1, 7],
[0, 2, 0, 0],
[5, 2, 0, 0],
[0, 0, 9, 8],
])
reshaped = tf.reshape(matrix, (1, 4, 4, 1))
tf.nn.max_pool(reshaped, ksize=2, strides=2, padding="SAME")
<tf.Tensor: shape=(1, 2, 2, 1), dtype=int32, numpy=
array([[[[2],
[7]],
[[5],
[9]]]], dtype=int32)>
We can adjust the window size using the ksize
parameter. For example, if we
were to expand the window to 3:
tf.nn.max_pool(reshaped, ksize=3, strides=2, padding="SAME")
<tf.Tensor: shape=(1, 2, 2, 1), dtype=int32, numpy=
array([[[[5],
[7]],
[[9],
[9]]]], dtype=int32)>
We've now picked up two additional large numbers (5 and 9) in two of the pooled spots.
Note that our windows are now overlapping, since we're still moving by 2 units on each iteration. This is causing us to see the same 9 repeated twice, since it is part of two overlapping windows.
We can adjust how far we move our window with each iteration using the
strides
parameter. Updating this to the same value as our window size
eliminates the overlap:
tf.nn.max_pool(reshaped, ksize=3, strides=3, padding="SAME")
<tf.Tensor: shape=(1, 2, 2, 1), dtype=int32, numpy=
array([[[[2],
[7]],
[[5],
[9]]]], dtype=int32)>
Because the window does not neatly fit into our input, padding is added around
the edges, giving us the same result as when we used a 2x2 window. We can skip
padding altogether and simply drop the windows that do not fully fit into our
input by instead passing "VALID"
to the padding
argument:
tf.nn.max_pool(reshaped, ksize=3, strides=3, padding="VALID")
<tf.Tensor: shape=(1, 1, 1, 1), dtype=int32, numpy=array([[[[5]]]],
dtype=int32)>
Now we've grabbed the largest value in the 3x3 window starting from the upper- left corner. Since no other windows fit in our input, they are dropped.
Args | |
---|---|
input
|
Tensor of rank N+2, of shape [batch_size] + input_spatial_shape +
[num_channels] if data_format does not start with "NC" (default), or
[batch_size, num_channels] + input_spatial_shape if data_format starts
with "NC". Pooling happens over the spatial dimensions only.
|
ksize
|
An int or list of ints that has length 1 , N or N+2 . The size
of the window for each dimension of the input tensor.
|
strides
|
An int or list of ints that has length 1 , N or N+2 . The
stride of the sliding window for each dimension of the input tensor.
|
padding
|
Either the string "SAME" or "VALID" indicating the type of
padding algorithm to use, or a list indicating the explicit paddings at
the start and end of each dimension. See
here
for more information. When explicit padding is used and data_format is
"NHWC" , this should be in the form [[0, 0], [pad_top, pad_bottom],
[pad_left, pad_right], [0, 0]] . When explicit padding used and
data_format is "NCHW" , this should be in the form [[0, 0], [0, 0],
[pad_top, pad_bottom], [pad_left, pad_right]] . When using explicit
padding, the size of the paddings cannot be greater than the sliding
window size.
|
data_format
|
A string. Specifies the channel dimension. For N=1 it can be either "NWC" (default) or "NCW", for N=2 it can be either "NHWC" (default) or "NCHW" and for N=3 either "NDHWC" (default) or "NCDHW". |
name
|
Optional name for the operation. |
Returns | |
---|---|
A Tensor of format specified by data_format .
The max pooled output tensor.
|