tfdf.keras.pd_dataframe_to_tf_dataset

Converts a Pandas DataFrame into a TF Dataset compatible with Keras.

  • Ensures columns have uniform types.
  • If "label" is provided, separate it as a second channel in the tf.data.Dataset (as expected by Keras).
  • If "weight" is provided, separate it as a third channel in the tf.data.Dataset (as expected by Keras).
  • If "task" is provided, ensure the label has the correct dtype. If the task is a classification and the label is a string, the labels are converted to integers: the label values are extracted from the dataset and ordered lexicographically. Warning: This logic won't work as expected if the training and testing datasets contain different label values. In that case, convert the labels to integers beforehand, making sure the same encoding is used for all datasets.
  • Returns a dataset created with "tf.data.Dataset.from_tensor_slices".
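The warning about mismatched label values can be sidestepped by encoding the labels before conversion. The following is a minimal sketch using only pandas; the helper name encode_labels and the column names "length" and "species" are hypothetical and not part of TF-DF.

```python
import pandas as pd


def encode_labels(train_df, test_df, label):
    """Map string labels to integers using one vocabulary shared by both splits."""
    # Build the vocabulary from both dataframes so that a class missing
    # from one split still receives the same integer in the other.
    classes = sorted(set(train_df[label]) | set(test_df[label]))
    mapping = {name: index for index, name in enumerate(classes)}

    # Work on copies so the caller's dataframes are left untouched.
    train_out = train_df.copy()
    test_out = test_df.copy()
    train_out[label] = train_out[label].map(mapping)
    test_out[label] = test_out[label].map(mapping)
    return train_out, test_out, mapping


# Hypothetical data: "emu" appears only in the test split.
train_df = pd.DataFrame({"length": [1.0, 2.0, 3.0], "species": ["cat", "dog", "cat"]})
test_df = pd.DataFrame({"length": [4.0, 5.0], "species": ["dog", "emu"]})

train_enc, test_enc, mapping = encode_labels(train_df, test_df, "species")
```

Both encoded dataframes can then be passed to pd_dataframe_to_tf_dataset with an integer label column, so the string-to-integer logic described above is never triggered.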

dataframe Pandas dataframe containing a training or evaluation dataset.
label Name of the label column.
task Target task of the dataset.
max_num_classes Maximum number of classes for a classification task. A high number of unique values (classes) might indicate that the problem is a regression or a ranking rather than a classification. Set to None to disable checking the number of classes.
in_place If false (default), the input dataframe will not be modified by pd_dataframe_to_tf_dataset. However, a copy of the dataframe will be made in memory. If true, the dataframe will be modified in-place.
fix_feature_names Some feature names are not supported by the SavedModel signature. If fix_feature_names=True (default), such features will be renamed to make them compatible. If fix_feature_names=False, the feature names will not be changed, but exporting the model (e.g. with model.save(...)) might fail.
weight Optional name of a column in dataframe to use to weight the training.
batch_size Number of examples in each batch. The size of the batches has no impact on the TF-DF training algorithms. However, a small batch size can lead to a large overhead when loading the dataset. Defaults to 1000. If batch_size is set to None, no batching is applied. Note: TF-DF expects the dataset to be batched.

A TensorFlow Dataset.
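
A typical call might look like the sketch below. It assumes tensorflow_decision_forests is installed; the wrapper name to_tf_dataset and the column names "age", "income" and "w" are hypothetical. Only arguments documented above are used.

```python
import pandas as pd


def to_tf_dataset(dataframe, label, weight=None):
    """Thin wrapper around tfdf.keras.pd_dataframe_to_tf_dataset."""
    # Imported here so that building the dataframe below does not
    # require TF-DF; only the conversion itself does.
    import tensorflow_decision_forests as tfdf

    return tfdf.keras.pd_dataframe_to_tf_dataset(
        dataframe,
        label=label,      # becomes the second channel
        weight=weight,    # becomes the third channel when given
        batch_size=1000,  # the documented default, written out explicitly
    )


# Hypothetical dataset with an integer label and a per-example weight.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [0, 1, 1, 0],
    "w": [1.0, 0.5, 2.0, 1.0],
})
```

Calling to_tf_dataset(df, label="income", weight="w") should return a batched dataset whose elements are (features, label, weight) tuples, matching the channel layout described above.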