Creates an integerized _SparseColumn.
tf.contrib.layers.sparse_column_with_integerized_feature(
column_name, bucket_size, combiner='sum', dtype=tf.dtypes.int64
)
Use this when your features are already pre-integerized into int64 IDs, that
is, when each input value is already the ID you want in the output.
"Integerized" means the feature value itself can be used as the ID.
Typically this is used for reading contiguous ranges of integer indexes, but
it doesn't have to be: the output value is simply copied from the input
feature, whatever it is. Be aware, however, that large gaps of unused
integers carry over into whatever you build from these IDs; for instance, if
you construct a one-hot tensor from them, the unused integers become
positions in the tensor that are always zero.
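A minimal usage sketch (TF 1.x, where tf.contrib is available); the feature
name, bucket size, and the follow-on embedding_column call are illustrative
assumptions, not part of this function's contract:

import tensorflow as tf  # TF 1.x

# Hypothetical feature: movie IDs already stored as int64s in
# [0, 10000), so each raw value serves directly as its sparse ID.
movie_id = tf.contrib.layers.sparse_column_with_integerized_feature(
    column_name="movie_id",
    bucket_size=10000,  # must exceed the largest ID present
    combiner="sum",
    dtype=tf.int64)

# The column is typically wrapped for downstream use, e.g. as a
# learned dense representation:
movie_embedding = tf.contrib.layers.embedding_column(movie_id, dimension=16)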
Args:
  column_name: A string defining the sparse column name.
  bucket_size: An int that is >= 1: the number of buckets. It must be
    greater than the maximum feature value; in other words, every feature in
    this column should be an int64 in the range [0, bucket_size).
  combiner: A string specifying how to reduce the column if it is
    multivalent. Currently "mean", "sqrtn", and "sum" are supported, with
    "sum" the default; "sqrtn" often achieves good accuracy, in particular
    with bag-of-words columns.
    - "sum": do not normalize features in the column
    - "mean": do l1 normalization on features in the column
    - "sqrtn": do l2 normalization on features in the column
    For more information, see tf.embedding_lookup_sparse. (A sketch of the
    combiner arithmetic follows this argument list.)
  dtype: The type of the features. It should be an integer type. Default
    value is dtypes.int64.
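As promised above, a sketch of the combiner arithmetic for one multivalent
example. It mirrors the unweighted case of tf.embedding_lookup_sparse; the
values are hypothetical stand-ins for the entries being reduced, computed in
plain NumPy rather than by the TF kernel:

import numpy as np

# Hypothetical values looked up for one example's multiple IDs.
values = np.array([2.0, 3.0, 5.0])

combined = {
    "sum": values.sum(),                           # no normalization
    "mean": values.sum() / len(values),            # l1: divide by the count
    "sqrtn": values.sum() / np.sqrt(len(values)),  # l2: divide by sqrt(count)
}
print(combined)  # {'sum': 10.0, 'mean': 3.33..., 'sqrtn': 5.77...}

With unit weights, the l1 norm of the weights equals the count and the l2
norm equals its square root, which is why "mean" divides by len(values) and
"sqrtn" by sqrt(len(values)).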
Returns:
  An integerized _SparseColumn definition.
Raises:
  ValueError: if bucket_size is less than 1.
  ValueError: if dtype is not an integer type.