Warning: This API is deprecated and will be removed in a future version of TensorFlow after the replacement is stable.

UnicodeDecode

public final class UnicodeDecode

Decodes each string in `input` into a sequence of Unicode code points.

The character codepoints for all strings are returned using a single vector `char_values`, with strings expanded to characters in row-major order.

The `row_splits` tensor indicates where the codepoints for each input string begin and end within the `char_values` tensor. In particular, the values for the `i`th string (in row-major order) are stored in the slice `[row_splits[i]:row_splits[i+1]]`. Thus:

`char_values[row_splits[i]+j]` is the Unicode codepoint for the `j`th character in the `i`th string (in row-major order).
`row_splits[i+1] - row_splits[i]` is the number of characters in the `i`th string (in row-major order).
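
The layout described above can be sketched in plain Java without TensorFlow. This is only an illustration of how `char_values` and `row_splits` relate, not the operation's implementation. The class `RowSplitsSketch` and its members are hypothetical names introduced for this example, and the input is assumed to be a flat list of already-decoded Java strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of TensorFlow): builds the flat code point
// vector and the row splits for a list of strings.
final class RowSplitsSketch {
    final int[] charValues;   // all code points, concatenated in input order
    final long[] rowSplits;   // rowSplits[i] is where string i starts in charValues

    RowSplitsSketch(List<String> strings) {
        List<Integer> values = new ArrayList<>();
        long[] splits = new long[strings.size() + 1];
        for (int i = 0; i < strings.size(); i++) {
            splits[i] = values.size();
            // codePoints() yields one int per Unicode code point, so a
            // surrogate pair counts as a single character.
            for (int codePoint : strings.get(i).codePoints().toArray()) {
                values.add(codePoint);
            }
        }
        splits[strings.size()] = values.size();
        this.charValues = values.stream().mapToInt(Integer::intValue).toArray();
        this.rowSplits = splits;
    }

    // Code points of the i-th string: charValues[rowSplits[i] : rowSplits[i+1]].
    int[] row(int i) {
        return Arrays.copyOfRange(charValues, (int) rowSplits[i], (int) rowSplits[i + 1]);
    }

    // Number of characters in the i-th string.
    long length(int i) {
        return rowSplits[i + 1] - rowSplits[i];
    }
}
```

For the inputs `"hi"`, a single emoji U+1F600, and the empty string, this sketch yields `charValues = [104, 105, 128512]` and `rowSplits = [0, 2, 3, 3]`; the empty string contributes a zero-length slice.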

Nested Classes

class UnicodeDecode.Options Optional attributes for UnicodeDecode

Public Methods

Output<Integer>	charValues() A 1D int32 Tensor containing the decoded codepoints.
static <T extends Number> UnicodeDecode<T>	create(Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, Options... options) Factory method to create a class wrapping a new UnicodeDecode operation.
static UnicodeDecode<Long>	create(Scope scope, Operand<String> input, String inputEncoding, Options... options) Factory method to create a class wrapping a new UnicodeDecode operation using default output types.
static UnicodeDecode.Options	errors(String errors)
static UnicodeDecode.Options	replaceControlCharacters(Boolean replaceControlCharacters)
static UnicodeDecode.Options	replacementChar(Long replacementChar)
Output<T>	rowSplits() A 1D tensor containing the row splits. Its element type is `T`, the type selected by `Tsplits` (`Long` for the default factory method).

Inherited Methods

From class org.tensorflow.op.PrimitiveOp

final boolean	equals(Object obj)
final int	hashCode()
Operation	op() Returns the underlying `Operation`
final String	toString()

From class java.lang.Object

boolean	equals(Object arg0)
final Class<?>	getClass()
int	hashCode()
final void	notify()
final void	notifyAll()
String	toString()
final void	wait(long arg0, int arg1)
final void	wait(long arg0)
final void	wait()

Public Methods

public Output<Integer> charValues ()

A 1D int32 Tensor containing the decoded codepoints.

public static UnicodeDecode<T> create (Scope scope, Operand<String> input, String inputEncoding, Class<T> Tsplits, Options... options)

Factory method to create a class wrapping a new UnicodeDecode operation.

Parameters

scope	current scope
input	The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding	Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
Tsplits	The element type of the `rowSplits` output. Must be a subclass of `Number`, as required by the method's type bound.
options	carries optional attributes values

Returns

a new instance of UnicodeDecode
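
To make the role of `inputEncoding` concrete, the following sketch decodes raw bytes into code points using the JDK's own charsets. It is an analogy only: the operation uses ICU converters, whose encoding names and coverage may differ from the JDK's. The class `InputEncodingSketch` and its `decode` method are hypothetical names for this example.

```java
import java.nio.charset.Charset;

// Hypothetical helper (not part of TensorFlow): decodes bytes under a named
// encoding and returns the resulting Unicode code points.
final class InputEncodingSketch {
    static int[] decode(byte[] bytes, String encodingName) {
        // Assumption: the name is one the JDK recognizes, e.g. "UTF-8",
        // "UTF-16BE", or "US-ASCII".
        Charset charset = Charset.forName(encodingName);
        return new String(bytes, charset).codePoints().toArray();
    }
}
```

The same character can have different byte representations: U+00E9 is the two bytes `C3 A9` in UTF-8 and `00 E9` in UTF-16BE, and both decode to the single code point 233 when the matching encoding name is supplied.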

public static UnicodeDecode<Long> create (Scope scope, Operand<String> input, String inputEncoding, Options... options)

Factory method to create a class wrapping a new UnicodeDecode operation using default output types.

Parameters

scope	current scope
input	The text to be decoded. Can have any shape. Note that the output is flattened to a vector of char values.
inputEncoding	Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: `"UTF-16", "US ASCII", "UTF-8"`.
options	carries optional attributes values

Returns

a new instance of UnicodeDecode

public static UnicodeDecode.Options errors (String errors)

Parameters

errors	Error handling policy when invalid formatting is found in the input. A value of 'strict' causes the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) causes the operation to replace any invalid formatting in the input with the `replacement_char` codepoint. A value of 'ignore' causes the operation to skip any invalid formatting in the input and produce no corresponding output character.
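
The three policies have close counterparts in the JDK's `CharsetDecoder`, which the sketch below uses to show the difference in behaviour. This is an analogy in plain Java, not the operation's code path. The class `ErrorPolicySketch` is hypothetical, and mapping a decoding failure to `IllegalArgumentException` is a choice made here to mirror the InvalidArgument error.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of TensorFlow): decodes UTF-8 bytes under one
// of the three policy names used by the `errors` option.
final class ErrorPolicySketch {
    static String decodeUtf8(byte[] bytes, String errors) {
        CodingErrorAction action;
        switch (errors) {
            case "strict":  action = CodingErrorAction.REPORT;  break;
            case "replace": action = CodingErrorAction.REPLACE; break;
            case "ignore":  action = CodingErrorAction.IGNORE;  break;
            default: throw new IllegalArgumentException("unknown policy: " + errors);
        }
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Only reachable with REPORT, i.e. the "strict" policy.
            throw new IllegalArgumentException("invalid input formatting", e);
        }
    }
}
```

With the bytes `61 FF 62` (where `FF` is never valid in UTF-8), 'replace' gives `a`, U+FFFD, `b`; 'ignore' gives `ab`; and 'strict' throws.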

public static UnicodeDecode.Options replaceControlCharacters (Boolean replaceControlCharacters)

Parameters

replaceControlCharacters	Whether to replace the C0 control characters (00-1F) with the `replacement_char`. Default is false.

public static UnicodeDecode.Options replacementChar (Long replacementChar)

Parameters

replacementChar	The replacement character codepoint to be used in place of any invalid formatting in the input when `errors='replace'`. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533).
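
The effect of `replaceControlCharacters` together with `replacementChar` can be sketched on an already-decoded code point array. This mirrors only the description above (C0 range 00-1F, default replacement U+FFFD); it is not the operation's implementation, and `ControlCharSketch` and its members are hypothetical names.

```java
// Hypothetical helper (not part of TensorFlow): replaces C0 control characters
// in a code point array with a caller-supplied replacement code point.
final class ControlCharSketch {
    // The Unicode replacement character, U+FFFD (decimal 65533).
    static final int DEFAULT_REPLACEMENT = 0xFFFD;

    static int[] replaceC0(int[] codePoints, int replacementChar) {
        if (!Character.isValidCodePoint(replacementChar)) {
            throw new IllegalArgumentException("not a valid code point: " + replacementChar);
        }
        int[] out = codePoints.clone();
        for (int i = 0; i < out.length; i++) {
            // C0 control characters occupy U+0000 through U+001F.
            if (out[i] >= 0x00 && out[i] <= 0x1F) {
                out[i] = replacementChar;
            }
        }
        return out;
    }
}
```

For example, `replaceC0(new int[] {0x41, 0x09, 0x1F, 0x20}, DEFAULT_REPLACEMENT)` leaves `0x41` and `0x20` untouched and turns the tab and `0x1F` into `0xFFFD`.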

public Output<T> rowSplits ()

A 1D tensor containing the row splits. Its element type is `T`, the type selected by `Tsplits` (`Long` for the default factory method).