# MNIST Pipelines GCP

This document describes how to run the [MNIST example](https://github.com/kubeflow/examples/tree/master/mnist) on Kubeflow Pipelines on a Google Cloud Platform cluster.

## Setup

#### Create a GCS bucket

This pipeline requires a [Google Cloud Storage bucket](https://cloud.google.com/storage/) to hold your trained model. You can create one with the following command:

```
BUCKET_NAME=kubeflow-pipeline-demo-$(date +%s)
gsutil mb gs://$BUCKET_NAME/
```

#### Deploy Kubeflow

Follow the [Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started-gke) to deploy a Kubeflow cluster to GKE.

#### Open the Kubeflow Pipelines UI

##### IAP enabled
If you set up your cluster with IAP enabled as described in the [GKE Getting Started guide](https://www.kubeflow.org/docs/started/getting-started-gke),
you can now access the Kubeflow Pipelines UI at `https://<deployment_name>.endpoints.<project>.cloud.goog/pipeline`.

##### IAP disabled
If you opted to skip IAP, you can open a connection to the UI by running *kubectl port-forward* and browsing to http://localhost:8085/pipeline:

```
kubectl port-forward -n kubeflow $(kubectl get pods -n kubeflow --selector=service=ambassador \
    -o jsonpath='{.items[0].metadata.name}') 8085:80
```

#### Install Python Dependencies

Set up a [virtual environment](https://docs.python.org/3/tutorial/venv.html) for your Kubeflow Pipelines work:

```
python3 -m venv $(pwd)/venv
source ./venv/bin/activate
```

Install the Kubeflow Pipelines SDK, along with the other Python dependencies listed in the [requirements.txt](./requirements.txt) file:

```
pip install -r requirements.txt --upgrade
```
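
As a quick sanity check (not part of the original setup steps), you can confirm the SDK is importable from the activated environment:

```
# Confirm the kfp package installed correctly and show its version.
import kfp
print(kfp.__version__)
```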

## Running the Pipeline

#### Compile Pipeline
Pipeline scripts are written in Python using a [domain-specific language (DSL)](https://en.wikipedia.org/wiki/Domain-specific_language), and they must be compiled
before they can be used. Most pipelines are designed so that simply running the script will perform the compilation step:
```
python3 mnist-pipeline.py
```
Running this command should produce a compiled *mnist-pipeline.py.tar.gz* file.

Alternatively, you can compile manually using the *dsl-compile* script:

```
python venv/bin/dsl-compile --py mnist-pipeline.py --output mnist-pipeline.py.tar.gz
```

#### Upload through the UI

Now that you have the compiled pipeline file, you can upload it through the Kubeflow Pipelines UI.
Simply select the "Upload pipeline" button.

Upload your file and give it a name.
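
If you would rather script the upload and launch steps, the Kubeflow Pipelines SDK client can submit the compiled package directly. The sketch below is an alternative under stated assumptions, not part of the original walkthrough: it assumes the client can reach the Pipelines API (for example through the port-forward described above), and the experiment and run names are arbitrary.

```
import kfp

# Assumes a reachable Pipelines API endpoint; pass host='...' if needed.
client = kfp.Client()
experiment = client.create_experiment('mnist-demo')
run = client.run_pipeline(experiment.id, 'mnist-run', 'mnist-pipeline.py.tar.gz')
```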

#### Run the Pipeline

After clicking on the newly created pipeline, you should be presented with an overview of the pipeline graph.
When you're ready, select the "Create Run" button to launch the pipeline.

Fill out the information required for the run, including the GCP `$BUCKET_NAME` you created earlier. Press "Start" when you are ready.

After clicking on the newly created Run, you should see the pipeline run through the 'train', 'serve', and 'web-ui' components. Click on any component to see its logs.
When the pipeline is complete, look at the logs for the 'web-ui' component to find the IP address created for the MNIST web interface.

## Pipeline Breakdown

Now that we've run a pipeline, let's break down how it works.

#### Decorator
```
@dsl.pipeline(
    name='MNIST',
    description='A pipeline to train and serve the MNIST example.'
)
```
Pipelines are expected to include a `@dsl.pipeline` decorator to provide metadata about the pipeline.

#### Function Header
```
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
                   train_steps='200',
                   learning_rate='0.01',
                   batch_size='100'):
```
The pipeline is defined in the `mnist_pipeline` function. It includes a number of arguments, which are exposed in the Kubeflow Pipelines UI when creating a new Run.
Although passed as strings, these arguments are of type [`kfp.dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py).
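
To see what that means in practice, here is a minimal sketch (assumed, not from this repo) of a pipeline whose single argument is received as a `PipelineParam` placeholder rather than a plain string:

```
import kfp.dsl as dsl

@dsl.pipeline(name='Param demo', description='Shows PipelineParam behavior.')
def param_demo(train_steps='200'):
    # Here `train_steps` is a kfp.dsl.PipelineParam, not the string '200'.
    # It compiles to a placeholder that is filled with the value supplied
    # in the UI (or the default) when the Run starts.
    dsl.ContainerOp(
        name='echo-steps',
        image='alpine:3.9',
        command=['echo', train_steps],
    )
```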

#### Train
```
train = dsl.ContainerOp(
    name='train',
    image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
    arguments=[
        "/opt/model.py",
        "--tf-export-dir", model_export_dir,
        "--tf-train-steps", train_steps,
        "--tf-batch-size", batch_size,
        "--tf-learning-rate", learning_rate
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))
```
This block defines the 'train' component. A component is made up of a [`kfp.dsl.ContainerOp`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_container_op.py)
object with a name and a container image specified. The image used here is defined by the [Dockerfile.model in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/Dockerfile.model).

Because the training component needs access to our GCS bucket, it is run with access to the 'user-gcp-sa' secret, which gives it
read/write access to GCS resources.
After defining the train component, we also set a number of environment variables for the training script.
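
Those variables are not shown in the snippet above; for illustration, attaching one to a `ContainerOp` looks roughly like this (the variable name and value here are hypothetical):

```
from kubernetes import client as k8s_client

# Hypothetical variable; the real script sets its own names and values.
train.add_env_variable(k8s_client.V1EnvVar(name='EXAMPLE_FLAG', value='1'))
```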

#### Serve
```
serve = dsl.ContainerOp(
    name='serve',
    image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:'
          '7775692adf28d6f79098e76e839986c9ee55dd61',
    arguments=[
        '--model-export-path', model_export_dir,
        '--server-name', "mnist-service"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))
```
The 'serve' component is slightly different from 'train'. While 'train' runs a single container and then exits, 'serve' runs a container that launches long-lived
resources in the cluster. The `ContainerOp` takes two arguments: the path we exported our trained model to, and a server name. Using these, this pipeline component
creates a Kubeflow [`tf-serving`](https://github.com/kubeflow/kubeflow/tree/master/kubeflow/tf-serving) service within the cluster. This service lives on after the
pipeline is complete, and can be seen using `kubectl get all -n kubeflow`. The Dockerfile used to build this container [can be found here](https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/deployer/Dockerfile).
Like the 'train' component, 'serve' requires access to the 'user-gcp-sa' secret, in this case so that the 'kubectl' command run within the container can act on the cluster.

The `serve.after(train)` line specifies that this component runs only after 'train' is complete.
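
In the pipeline script, that constraint is a single statement:

```
# Deploy the model server only after training has exported the model.
serve.after(train)
```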

#### Web UI
```
web_ui = dsl.ContainerOp(
    name='web-ui',
    image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
    arguments=[
        '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:'
                   'v20190304-v0.2-176-g15d997b-pipelines',
        '--name', 'web-ui',
        '--container-port', '5000',
        '--service-port', '80',
        '--service-type', "LoadBalancer"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))

web_ui.after(serve)
```
Like 'serve', the 'web-ui' component launches a service that lives on after the pipeline is complete. Instead of launching a Kubeflow resource, 'web-ui' launches
a standard Kubernetes Deployment/Service pair. The Dockerfile that builds the deployment image [can be found here](./deploy-service/Dockerfile). This image is used
to deploy the web UI, which was itself built from the [Dockerfile found in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/web-ui/Dockerfile).

After this component runs, a new LoadBalancer is provisioned that gives external access to the 'web-ui' deployment launched in the cluster.
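
As an alternative to digging through the component logs for that address, you can query the service directly. Here is a hedged sketch using the Python Kubernetes client; it assumes the service was created as 'web-ui' in the 'kubeflow' namespace, matching the arguments above:

```
from kubernetes import client, config

# Load credentials from your local kubeconfig (e.g. set up by gcloud).
config.load_kube_config()
svc = client.CoreV1Api().read_namespaced_service('web-ui', 'kubeflow')
print(svc.status.load_balancer.ingress[0].ip)
```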

#### Main Function
```
if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')
```

At the bottom of the script is a main function. It is used to compile the pipeline when the script is run.
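
The output is an ordinary gzipped tarball wrapping the compiled workflow spec. A quick way to peek inside (an illustrative check, not part of the original instructions):

```
import tarfile

# The compiled package typically contains a single workflow YAML file.
with tarfile.open('mnist-pipeline.py.tar.gz') as tar:
    print(tar.getnames())
```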