Abstract inspector for all Yggdrasil models.
tfdf.inspector.AbstractInspector(
    directory: str, file_prefix: str
)
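AbstractInspector is abstract and is not normally instantiated directly; in practice, an inspector is obtained from a trained model with model.make_inspector(). A minimal sketch, using a small hypothetical dataset purely for illustration; the inspector built here is reused in the short sketches below:
import pandas as pd
import tensorflow_decision_forests as tfdf

# Hypothetical toy dataset, for illustration only.
dataframe = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "label": [0, 1, 0, 1]})
dataset = tfdf.keras.pd_dataframe_to_tf_dataset(dataframe, label="label")

# Train a small Random Forest and obtain its inspector.
model = tfdf.keras.RandomForestModel()
model.fit(dataset)
inspector = model.make_inspector()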
Methods
evaluation
evaluation() -> Optional[tfdf.inspector.Evaluation]
Model self evaluation.
The model self-evaluation is a cheap alternative to using a separate validation dataset or cross-validation. The exact implementation depends on the model, e.g. out-of-bag evaluation or internal train-validation.
During training, some models (e.g. Gradient Boosted Trees) use this evaluation for early stopping (if early stopping is enabled).
While this evaluation is computed during training, it can be used as a (low-quality) model evaluation.
Returns | |
---|---|
The evaluation, or None if no evaluation is available. |
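For example, a short sketch of reading the self-evaluation, assuming the inspector built in the introduction; which fields of the returned Evaluation are populated (e.g. accuracy, loss) depends on the model and the task:
evaluation = inspector.evaluation()
if evaluation is None:
    print("No self-evaluation available for this model.")
else:
    # For a classification model, an accuracy is typically available.
    print("Self-evaluation accuracy:", evaluation.accuracy)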
export_to_tensorboard
export_to_tensorboard(
    path: str
) -> None
Export the training logs (and possibly other metadata) to TensorBoard.
Usage examples in Colab:
model.make_inspector().export_to_tensorboard("/tmp/tensorboard_logs")
%load_ext tensorboard
%tensorboard --logdir "/tmp/tensorboard_logs"
Note that you can compare multiple model runs using sub-directories. For example:
model_1.make_inspector().export_to_tensorboard("/tmp/tb_logs/model_1")
model_2.make_inspector().export_to_tensorboard("/tmp/tb_logs/model_2")
%load_ext tensorboard
%tensorboard --logdir "/tmp/tb_logs"
Args | |
---|---|
path | Output directory for the logs. |
features
features() -> List[tfdf.inspector.SimpleColumnSpec]
Input features of the model.
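A short sketch listing them, assuming the inspector built in the introduction; each returned SimpleColumnSpec carries, among other things, the column name and its semantic type:
# Print the name and semantic type of each input feature.
for feature in inspector.features():
    print(feature.name, feature.type)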
label
label() -> tfdf.inspector.SimpleColumnSpec
Label predicted by the model.
label_classes
label_classes() -> Optional[List[str]]
Possible classes of the label.
If the task is not classification, or if the labels are dense integers, returns None.
Returns | |
---|---|
The list of label values, or None. |
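A short sketch combining label() and label_classes(), assuming the inspector built in the introduction:
# Name of the label column.
print("Label:", inspector.label().name)

# Class names, or None (e.g. for a regression model).
classes = inspector.label_classes()
if classes is None:
    print("No label classes available.")
else:
    print("Label classes:", classes)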
model_type
@abc.abstractmethod
model_type() -> str
Unique key describing the type of the model.
Note that different learners can output similar model types, and a given learner can output different model types.
objective
objective() -> tfdf.py_tree.objective.AbstractObjective
Objective solved by the model, i.e. the task plus extra information.
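For example, a sketch printing both the model type and the objective, assuming the inspector built in the introduction (the exact values depend on the model and task):
# E.g. "RANDOM_FOREST" or "GRADIENT_BOOSTED_TREES".
print("Model type:", inspector.model_type())

# E.g. a classification objective describing the label and its number of classes.
print("Objective:", inspector.objective())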
training_logs
training_logs() -> Optional[List[TrainLog]]
Evaluation metrics and statistics about the model during training.
The training logs show the quality of the model (e.g. accuracy evaluated on the out-of-bag or validation dataset) according to the number of trees in the model. Logs are useful to characterize the balance between model size and model quality.
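For example, a sketch plotting a metric against the number of trees, assuming the inspector built in the introduction and assuming the logs expose an accuracy metric (the available metrics, and whether logs exist at all, depend on the model):
import matplotlib.pyplot as plt

logs = inspector.training_logs()
if logs is not None:
    plt.plot([log.num_trees for log in logs],
             [log.evaluation.accuracy for log in logs])
    plt.xlabel("Number of trees")
    plt.ylabel("Accuracy (self-evaluation)")
    plt.show()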
tuning_logs
tuning_logs(
    return_format: Literal['table', 'proto'] = 'table'
) -> Optional[Union[pd.DataFrame, abstract_model_pb2.HyperparametersOptimizerLogs]]
Returns the hyperparameter tuning logs.
These logs contain the candidate hyperparameters and the score of each trial. If the model was not trained with hyperparameter tuning, returns None.
Args | |
---|---|
return_format | Output format. - table: A pandas DataFrame. - proto: An abstract_model_pb2.HyperparametersOptimizerLogs proto. |
Returns | |
---|---|
The hyperparameter tuning logs, or None (if the model was trained without hyperparameter tuning). |
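For example, a sketch of inspecting the best trial in the table format, assuming a model trained with hyperparameter tuning and assuming the returned DataFrame exposes "score" and "best" columns (an assumption about the default table layout):
tuning_logs = inspector.tuning_logs()  # return_format="table" by default
if tuning_logs is None:
    print("The model was trained without hyperparameter tuning.")
else:
    # "best" is assumed to be a boolean column marking the selected trial.
    print(tuning_logs[tuning_logs["best"]].iloc[0])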
variable_importances
variable_importances() -> Dict[str, List[Tuple[py_tree.dataspec.SimpleColumnSpec, float]]]
Variable importances (VIs), i.e. the impact of each feature on the model.
VIs generally indicate how much a variable contributes to the model's predictions or quality. Different VIs have different semantics and are generally not comparable.
The VIs returned by variable_importances() depend on the learning algorithm and its hyper-parameters. For example, the hyper-parameter compute_oob_variable_importances=True of the Random Forest learner enables the computation of permutation out-of-bag variable importances.
See https://ydf.readthedocs.io/en/latest/cli_user_manual/#variable-importances for the definition of the variable importances.
Values are sorted by decreasing importance unless stated otherwise.
Usage example:
# Train a Random Forest. Enable the computation of OOB (out-of-bag) variable
# importances.
model = tfdf.keras.RandomForestModel(compute_oob_variable_importances=True)
model.fit(...)

# Print all the variable importances
model.summary()

# List the available variable importances
inspector = model.make_inspector()
print(inspector.variable_importances().keys())
# Show a specific variable importance
# Each line is: (feature name, (index of the feature), importance score)
inspector.variable_importances()["MEAN_DECREASE_IN_ACCURACY"]
>> [("bill_length_mm" (1; #1), 0.0713061951754389),
>> ("island" (4; #4), 0.007298519736842035),
>> ("flipper_length_mm" (1; #3), 0.004505893640351366),
...
Returns | |
---|---|
Variable importances. |