The TFX command-line interface (CLI) performs a full range of pipeline actions using pipeline orchestrators, such as Apache Airflow, Apache Beam, and Kubeflow Pipelines. For example, you can use the CLI to:
- Create, update, and delete pipelines.
- Run a pipeline and monitor the run on various orchestrators.
- List pipelines and pipeline runs.
About the TFX CLI
The TFX CLI is installed as part of the TFX package. All CLI commands follow this structure:
tfx command-group command flags
The following command-group options are currently supported:
- tfx pipeline - Create and manage TFX pipelines.
- tfx run - Create and manage runs of TFX pipelines on various orchestration platforms.
- tfx template - Experimental commands for listing and copying TFX pipeline templates.
Each command group provides a set of commands. Follow the instructions in the pipeline commands, run commands, and template commands sections to learn more about using these commands.
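For example, in the following invocation (all values are placeholders), pipeline is the command group, create is the command, and --engine and --pipeline_path are flags:
tfx pipeline create --engine=beam --pipeline_path=/path/to/my_pipeline.py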
Flags let you pass arguments into CLI commands. Words in flags are separated with either a hyphen (-) or an underscore (_). For example, the pipeline name flag can be specified as either --pipeline-name or --pipeline_name. This document specifies flags with underscores for brevity. Learn more about the flags used in the TFX CLI in the Understanding TFX CLI Flags section.
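For instance, the following two commands are equivalent ways to write the same run request (pipeline-name is a placeholder):
tfx run create --pipeline-name=pipeline-name
tfx run create --pipeline_name=pipeline-name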
tfx pipeline
The structure for commands in the tfx pipeline command group is as follows:
tfx pipeline command required-flags [optional-flags]
Use the following sections to learn more about the commands in the tfx pipeline command group.
create
Creates a new pipeline in the given orchestrator.
Usage:
tfx pipeline create --pipeline_path=pipeline-path [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace --package_path=package-path \
  --build_target_image=build-target-image --build_base_image=build-base-image \
  --skaffold_cmd=skaffold-cmd]
- --pipeline_path=pipeline-path
- The path to the pipeline configuration file.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
- --package_path=package-path
- (Optional.) Path to the compiled pipeline as a file. The compiled pipeline should be a compressed file (.tar.gz, .tgz, or .zip) or a YAML file (.yaml or .yml). If package-path is not specified, TFX uses the following as the default path: current_directory/pipeline_name.tar.gz
- --build_target_image=build-target-image
- (Optional.) When the engine is kubeflow, TFX creates a container image for your pipeline. The build target image specifies the name, container image repository, and tag to use when creating the pipeline container image. If you do not specify a tag, the container image is tagged as latest. For your Kubeflow Pipelines cluster to run your pipeline, the cluster must be able to access the specified container image repository.
- --build_base_image=build-base-image
- (Optional.) When the engine is kubeflow, TFX creates a container image for your pipeline. The build base image specifies the base container image to use when building the pipeline container image.
- --skaffold_cmd=skaffold-cmd
- (Optional.) The path to Skaffold on your computer.
Examples:
Apache Airflow:
tfx pipeline create --engine=airflow --pipeline_path=pipeline-path
Apache Beam:
tfx pipeline create --engine=beam --pipeline_path=pipeline-path
Kubeflow:
tfx pipeline create --engine=kubeflow --pipeline_path=pipeline-path --package_path=package-path \
  --iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint \
  --skaffold_cmd=skaffold-cmd
To autodetect the engine from the user environment, simply avoid using the engine flag, as in the example below. For more details, check the flags section.
tfx pipeline create --pipeline_path=pipeline-path [--endpoint=endpoint --iap_client_id=iap-client-id \
  --namespace=namespace --package_path=package-path --skaffold_cmd=skaffold-cmd]
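As a concrete sketch, suppose a pipeline is defined in a file named my_pipeline.py in the current directory (the file name is hypothetical, and the file must configure the DagRunner that matches the chosen engine, e.g. BeamDagRunner for beam):
# Register the pipeline with the Apache Beam orchestrator.
tfx pipeline create --engine=beam --pipeline_path=my_pipeline.py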
update
Updates an existing pipeline in the given orchestrator.
Usage:
tfx pipeline update --pipeline_path=pipeline-path [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace --package_path=package-path \
  --skaffold_cmd=skaffold-cmd]
- --pipeline_path=pipeline-path
- The path to the pipeline configuration file.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
- --package_path=package-path
- (Optional.) Path to the compiled pipeline as a file. The compiled pipeline should be a compressed file (.tar.gz, .tgz, or .zip) or a YAML file (.yaml or .yml). If package-path is not specified, TFX uses the following as the default path: current_directory/pipeline_name.tar.gz
- --skaffold_cmd=skaffold-cmd
- (Optional.) The path to Skaffold on your computer.
Examples:
Apache Airflow:
tfx pipeline update --engine=airflow --pipeline_path=pipeline-path
Apache Beam:
tfx pipeline update --engine=beam --pipeline_path=pipeline-path
Kubeflow:
tfx pipeline update --engine=kubeflow --pipeline_path=pipeline-path --package_path=package-path \
  --iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint \
  --skaffold_cmd=skaffold-cmd
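For example, after editing a hypothetical pipeline definition file my_pipeline.py that was previously registered with create, you would re-apply it like this:
# Re-register the modified pipeline under the same pipeline name.
tfx pipeline update --engine=beam --pipeline_path=my_pipeline.py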
compile
Compiles the pipeline config file to create a workflow file in Kubeflow and performs the following checks while compiling:
- Checks if the pipeline path is valid.
- Checks if the pipeline details are extracted successfully from the pipeline config file.
- Checks if the DagRunner in the pipeline config matches the engine.
- Checks if the workflow file is created successfully in the package path provided (only for Kubeflow).
It is recommended to use this command before creating or updating a pipeline.
Usage:
tfx pipeline compile --pipeline_path=pipeline-path [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace --package_path=package-path]
- --pipeline_path=pipeline-path
- The path to the pipeline configuration file.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
- --package_path=package-path
- (Optional.) Path to the compiled pipeline as a file. The compiled pipeline should be a compressed file (.tar.gz, .tgz, or .zip) or a YAML file (.yaml or .yml). If package-path is not specified, TFX uses the following as the default path: current_directory/pipeline_name.tar.gz
Examples:
Apache Airflow:
tfx pipeline compile --engine=airflow --pipeline_path=pipeline-path
Apache Beam:
tfx pipeline compile --engine=beam --pipeline_path=pipeline-path
Kubeflow:
tfx pipeline compile --engine=kubeflow --pipeline_path=pipeline-path --package_path=package-path \
  --iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint
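As a sketch, compiling a hypothetical Kubeflow pipeline definition my_pipeline.py without --package_path should leave the compiled workflow file at the default package path in the current directory:
# Compile only; nothing is created on the cluster.
tfx pipeline compile --engine=kubeflow --pipeline_path=my_pipeline.py
# The compiled package should appear at ./pipeline_name.tar.gz, using the pipeline name set in my_pipeline.py.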
delete
Deletes a pipeline from the given orchestrator.
Usage:
tfx pipeline delete --pipeline_name=pipeline-name [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace]
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Apache Airflow:
tfx pipeline delete --engine=airflow --pipeline_name=pipeline-name
Apache Beam:
tfx pipeline delete --engine=beam --pipeline_name=pipeline-name
Kubeflow:
tfx pipeline delete --engine=kubeflow --pipeline_name=pipeline-name \
  --iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint
list
Lists all the pipelines in the given orchestrator.
Usage:
tfx pipeline list [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace]
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Apache Airflow:
tfx pipeline list --engine=airflow
Apache Beam:
tfx pipeline list --engine=beam
Kubeflow:
tfx pipeline list --engine=kubeflow --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint
tfx run
The structure for commands in the tfx run command group is as follows:
tfx run command required-flags [optional-flags]
Use the following sections to learn more about the commands in the tfx run command group.
create
Creates a new run instance for a pipeline in the orchestrator.
Usage:
tfx run create --pipeline_name=pipeline-name [--endpoint=endpoint \
  --engine=engine --iap_client_id=iap-client-id --namespace=namespace]
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Apache Airflow:
tfx run create --engine=airflow --pipeline_name=pipeline-name
Apache Beam:
tfx run create --engine=beam --pipeline_name=pipeline-name
Kubeflow:
tfx run create --engine=kubeflow --pipeline_name=pipeline-name --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint
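For example, assuming a pipeline named my_pipeline (a hypothetical name) has already been created on Apache Airflow, the following triggers a run and then lists runs so you can pick up the new run's ID:
tfx run create --engine=airflow --pipeline_name=my_pipeline
tfx run list --engine=airflow --pipeline_name=my_pipeline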
terminate
Stops a run of a given pipeline.
**Important note:** Currently supported only in Kubeflow.
Usage:
tfx run terminate --run_id=run-id [--endpoint=endpoint --engine=engine \
  --iap_client_id=iap-client-id --namespace=namespace]
- --run_id=run-id
- Unique identifier for a pipeline run.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Kubeflow:
tfx run terminate --engine=kubeflow --run_id=run-id --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint
list
Lists all runs of a pipeline.
**Important note:** Currently not supported in Apache Beam.
Usage:
tfx run list --pipeline_name=pipeline-name [--endpoint=endpoint \
  --engine=engine --iap_client_id=iap-client-id --namespace=namespace]
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Apache Airflow:
tfx run list --engine=airflow --pipeline_name=pipeline-name
Kubeflow:
tfx run list --engine=kubeflow --pipeline_name=pipeline-name --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint
status
Returns the current status of a run.
**Important note:** Currently not supported in Apache Beam.
Usage:
tfx run status --pipeline_name=pipeline-name --run_id=run-id [--endpoint=endpoint \
  --engine=engine --iap_client_id=iap-client-id --namespace=namespace]
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --run_id=run-id
- Unique identifier for a pipeline run.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Apache Airflow:
tfx run status --engine=airflow --run_id=run-id --pipeline_name=pipeline-name
Kubeflow:
tfx run status --engine=kubeflow --run_id=run-id --pipeline_name=pipeline-name \
  --iap_client_id=iap-client-id --namespace=namespace --endpoint=endpoint
delete
Deletes a run of a given pipeline.
**Important note:** Currently supported only in Kubeflow.
Usage:
tfx run delete --run_id=run-id [--engine=engine --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint]
- --run_id=run-id
- Unique identifier for a pipeline run.
- --endpoint=endpoint
- (Optional.) Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --engine=engine
- (Optional.) The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --iap_client_id=iap-client-id
- (Optional.) Client ID for IAP protected endpoint.
- --namespace=namespace
- (Optional.) Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
Examples:
Kubeflow:
tfx run delete --engine=kubeflow --run_id=run-id --iap_client_id=iap-client-id \
  --namespace=namespace --endpoint=endpoint
tfx template [Experimental]
The structure for commands in the tfx template command group is as follows:
tfx template command required-flags [optional-flags]
Use the following sections to learn more about the commands in the tfx template command group. Template is an experimental feature and subject to change at any time.
list
List available TFX pipeline templates.
Usage:
tfx template list
copy
Copy a template to the destination directory.
Usage:
tfx template copy --model=model --pipeline_name=pipeline-name \
  --destination_path=destination-path
- --model=model
- The name of the model built by the pipeline template.
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --destination_path=destination-path
- The path to copy the template to.
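For example, to scaffold a new pipeline from a template (here the taxi model, assuming it appears in the output of tfx template list) into a hypothetical destination directory:
tfx template copy --model=taxi --pipeline_name=my_pipeline \
  --destination_path=/home/user/my_pipeline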
Understanding TFX CLI Flags
Common flags
- --engine=engine
- The orchestrator to be used for the pipeline. The value of engine must match one of the following values:
  - airflow: sets engine to Apache Airflow
  - beam: sets engine to Apache Beam
  - kubeflow: sets engine to Kubeflow
  If the engine is not set, the engine is auto-detected based on the environment.
  **Important note:** The orchestrator required by the DagRunner in the pipeline config file must match the selected or auto-detected engine. Engine auto-detection is based on the user environment. If Apache Airflow and Kubeflow Pipelines are not installed, then Apache Beam is used by default.
- --pipeline_name=pipeline-name
- The name of the pipeline.
- --pipeline_path=pipeline-path
- The path to the pipeline configuration file.
- --run_id=run-id
- Unique identifier for a pipeline run.
Kubeflow-specific flags
- --endpoint=endpoint
- Endpoint of the Kubeflow Pipelines API service. The endpoint of your Kubeflow Pipelines API service is the same as the URL of the Kubeflow Pipelines dashboard. Your endpoint value should be something like: https://host-name/pipeline
  If you do not know the endpoint for your Kubeflow Pipelines cluster, contact your cluster administrator.
  If --endpoint is not specified, the in-cluster service DNS name is used as the default value. This name works only if the CLI command executes in a pod on the Kubeflow Pipelines cluster, such as a Kubeflow Jupyter notebooks instance.
- --iap_client_id=iap-client-id
- Client ID for IAP protected endpoint.
- --namespace=namespace
- Kubernetes namespace to connect to the Kubeflow Pipelines API. If the namespace is not specified, the value defaults to kubeflow.
- --package_path=package-path
- Path to the compiled pipeline as a file. The compiled pipeline should be a compressed file (.tar.gz, .tgz, or .zip) or a YAML file (.yaml or .yml). If package-path is not specified, TFX uses the following as the default path: current_directory/pipeline_name.tar.gz
Files generated by the TFX CLI
When pipelines are created and run, several files are generated for pipeline management.
- ${HOME}/tfx/local, beam, airflow, kubeflow
  - Pipeline metadata read from the configuration is stored under ${HOME}/tfx/${ORCHESTRATION_ENGINE}/${PIPELINE_NAME}. This location can be customized by setting an environment variable like AIRFLOW_HOME or KUBEFLOW_HOME. This behavior might change in future releases. This directory is used to store pipeline information, including the pipeline IDs in the Kubeflow Pipelines cluster, which are needed to create runs or update pipelines.
  - Before TFX 0.25, these files were located under ${HOME}/${ORCHESTRATION_ENGINE}. In TFX 0.25, files in the old location are moved to the new location automatically for smooth migration.
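As a sketch, exporting the corresponding home variable before invoking the CLI should redirect where this metadata is kept (the path below is a placeholder, and the exact layout under it follows the scheme above):
# Keep Kubeflow engine metadata under a custom directory instead of ${HOME}/tfx.
export KUBEFLOW_HOME=/opt/tfx_home
tfx pipeline create --engine=kubeflow --pipeline_path=pipeline-path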
- (Kubeflow only) Dockerfile, build.yaml, pipeline_name.tar.gz
  - Kubeflow Pipelines requires two kinds of input for a pipeline. These files are generated by TFX in the current directory.
  - One is a container image which is used to run the components in the pipeline. This container image is built when a pipeline for Kubeflow Pipelines is created using the TFX CLI. TFX uses skaffold to build container images. Dockerfile and build.yaml are generated by TFX and passed to skaffold. (These file names are fixed and cannot be changed for now.)
  - The TFX CLI also compiles the given pipeline definition into a format that Kubeflow Pipelines understands. The result of compilation is stored as pipeline_name.tar.gz. This filename can be customized using the --package_path flag.
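For instance, to write the compiled package somewhere other than the current directory when creating a Kubeflow pipeline (the output path is a placeholder):
tfx pipeline create --engine=kubeflow --pipeline_path=pipeline-path \
  --package_path=/tmp/artifacts/my_pipeline.tar.gz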