Commit 895e88b

Mnist pipelines (kubeflow#524)

daniel-sanche authored and k8s-ci-robot committed

* added mnist pipelines sample
* fixed lint issues

1 parent 7924e0f

File tree

12 files changed: +461 −0 lines changed

pipelines/mnist-pipelines/.gitignore
+2
@@ -0,0 +1,2 @@
venv
*.tar.gz

pipelines/mnist-pipelines/README.md
+187
@@ -0,0 +1,187 @@
# MNIST Pipelines GCP

This document describes how to run the [MNIST example](https://github.com/kubeflow/examples/tree/master/mnist) on Kubeflow Pipelines on a Google Cloud Platform cluster.

## Setup

#### Create a GCS bucket

This pipeline requires a [Google Cloud Storage bucket](https://cloud.google.com/storage/) to hold your trained model. You can create one with the following commands:

```
BUCKET_NAME=kubeflow-pipeline-demo-$(date +%s)
gsutil mb gs://$BUCKET_NAME/
```

#### Deploy Kubeflow

Follow the [Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started-gke) to deploy a Kubeflow cluster to GKE.

#### Open the Kubeflow Pipelines UI

![Kubeflow UI](./img/kubeflow.png "Kubeflow UI")

##### IAP enabled
If you set up your cluster with IAP enabled as described in the [GKE Getting Started guide](https://www.kubeflow.org/docs/started/getting-started-gke),
you can access the Kubeflow Pipelines UI at `https://<deployment_name>.endpoints.<project>.cloud.goog/pipeline`.

##### IAP disabled
If you opted to skip IAP, you can open a connection to the UI with *kubectl port-forward*, then browse to http://localhost:8085/pipeline

```
kubectl port-forward -n kubeflow $(kubectl get pods -n kubeflow --selector=service=ambassador \
    -o jsonpath='{.items[0].metadata.name}') 8085:80
```

#### Install Python Dependencies

Set up a [virtual environment](https://docs.python.org/3/tutorial/venv.html) for your Kubeflow Pipelines work:

```
python3 -m venv $(pwd)/venv
source ./venv/bin/activate
```

Install the Kubeflow Pipelines SDK, along with the other Python dependencies listed in the [requirements.txt](./requirements.txt) file:

```
pip install -r requirements.txt --upgrade
```
## Running the Pipeline

#### Compile Pipeline
Pipelines are written in Python using a [domain-specific language (DSL)](https://en.wikipedia.org/wiki/Domain-specific_language),
but they must be compiled before they can be uploaded. Most pipeline scripts are designed so that simply running the script performs the compilation step:

```
python3 mnist-pipeline.py
```

Running this command should produce a compiled *mnist-pipeline.py.tar.gz* file.

Alternatively, you can compile manually using the *dsl-compile* script:

```
python venv/bin/dsl-compile --py mnist-pipeline.py --output mnist-pipeline.py.tar.gz
```
#### Upload through the UI

Now that you have the compiled pipeline file, you can upload it through the Kubeflow Pipelines UI.
Simply select the "Upload pipeline" button:

![Upload Button](./img/upload_btn.png "Upload Button")

Upload your file and give it a name:

![Upload Form](./img/upload_form.png "Upload Form")
#### Run the Pipeline

After clicking on the newly created pipeline, you should be presented with an overview of the pipeline graph.
When you're ready, select the "Create Run" button to launch the pipeline.

![Pipeline](./img/pipeline.png "Pipeline")

Fill out the information required for the run, including the GCS `$BUCKET_NAME` you created earlier. Press "Start" when you are ready.

![Run Form](./img/run_form.png "Run Form")

After clicking on the newly created run, you should see the pipeline run through the 'train', 'serve', and 'web-ui' components. Click on any component to see its logs.
When the pipeline is complete, look at the logs for the 'web-ui' component to find the IP address assigned to the MNIST web interface.

![Logs](./img/logs.png "Logs")
## Pipeline Breakdown

Now that we've run a pipeline, let's break down how it works.

#### Decorator
```
@dsl.pipeline(
    name='MNIST',
    description='A pipeline to train and serve the MNIST example.'
)
```
Pipelines are expected to include a `@dsl.pipeline` decorator to provide metadata about the pipeline.

#### Function Header
```
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
                   train_steps='200',
                   learning_rate='0.01',
                   batch_size='100'):
```
The pipeline is defined in the `mnist_pipeline` function. It includes a number of arguments, which are exposed in the Kubeflow Pipelines UI when creating a new run.
Although passed as strings, these arguments are of type [`kfp.dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py).
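The `PipelineParam` behavior is easy to picture with a small stand-in class. The sketch below is a hypothetical illustration only (it does not use, and is not guaranteed to match, the kfp SDK internals): arguments act as named placeholders that can be spliced into container arguments and are only resolved when the run starts.

```python
# Hypothetical stand-in, NOT the kfp implementation.
class PipelineParam:
    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    def __str__(self):
        # The real SDK substitutes the placeholder at run time.
        return '{{pipelineparam:%s}}' % self.name


lr = PipelineParam('learning_rate', '0.01')
# The placeholder can be spliced into container arguments before any
# concrete value exists:
args = ['--tf-learning-rate', str(lr)]
print(args)  # ['--tf-learning-rate', '{{pipelineparam:learning_rate}}']
```

This is why the defaults in the function header are strings: they are serialized into the compiled workflow rather than used directly.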

#### Train
```
train = dsl.ContainerOp(
    name='train',
    image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
    arguments=[
        "/opt/model.py",
        "--tf-export-dir", model_export_dir,
        "--tf-train-steps", train_steps,
        "--tf-batch-size", batch_size,
        "--tf-learning-rate", learning_rate
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))
```
This block defines the 'train' component. A component is made up of a [`kfp.dsl.ContainerOp`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_container_op.py)
object with a name and a container image specified. The container image used is defined in the [Dockerfile.model in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/Dockerfile.model).

Because the training component needs access to our GCS bucket, it is run with access to our 'user-gcp-sa' secret, which gives
read/write access to GCS resources.
After defining the train component, we also set a number of environment variables for the training script.
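The `.apply(...)` call at the end of the component is just function application: it threads the op through a transform that mutates and returns it, which is how helpers like `gcp.use_gcp_secret` attach credentials. A hypothetical stand-in (not the kfp implementation) shows the shape:

```python
# Hypothetical stand-in, NOT the kfp implementation.
class Op:
    def __init__(self, name):
        self.name = name
        self.volumes = []

    def apply(self, transform):
        # .apply() passes the op to a transform function and
        # returns whatever that function returns.
        return transform(self)


def use_secret(secret_name):
    def transform(op):
        # A real helper would mount the secret as a volume and point
        # credential environment variables at the mounted key file.
        op.volumes.append(secret_name)
        return op
    return transform


train = Op('train').apply(use_secret('user-gcp-sa'))
print(train.volumes)  # ['user-gcp-sa']
```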

#### Serve
```
serve = dsl.ContainerOp(
    name='serve',
    image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:\
            7775692adf28d6f79098e76e839986c9ee55dd61',
    arguments=[
        '--model-export-path', model_export_dir,
        '--server-name', "mnist-service"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))
```
The 'serve' component is slightly different from 'train'. While 'train' runs a single container and then exits, 'serve' runs a container that launches long-lived
resources in the cluster. The `ContainerOp` takes two arguments: the path we exported our trained model to, and a server name. Using these, this pipeline component
creates a Kubeflow [`tf-serving`](https://github.com/kubeflow/kubeflow/tree/master/kubeflow/tf-serving) service within the cluster. This service lives on after the
pipeline is complete, and can be seen using `kubectl get all -n kubeflow`. The Dockerfile used to build this container [can be found here](https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/deployer/Dockerfile).
Like the 'train' component, 'serve' requires access to the 'user-gcp-sa' secret, in this case so that the `kubectl` command can be used within the container.

The `serve.after(train)` line specifies that this component is to run sequentially after 'train' is complete.
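Ordering between components deserves a closer look: `.after()` records a pure execution-order edge between ops, with no data flowing through it. A hypothetical stand-in (not the kfp implementation) makes the dependency graph explicit:

```python
# Hypothetical stand-in, NOT the kfp implementation.
class Op:
    def __init__(self, name):
        self.name = name
        self.deps = []

    def after(self, *ops):
        # Record an execution-order edge; no outputs are passed along it.
        self.deps.extend(op.name for op in ops)
        return self


train = Op('train')
serve = Op('serve').after(train)
web_ui = Op('web-ui').after(serve)
print(serve.deps, web_ui.deps)  # ['train'] ['serve']
```

In this pipeline the chain is linear (train, then serve, then web-ui); components with no edge between them would be free to run in parallel.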

#### Web UI
```
web_ui = dsl.ContainerOp(
    name='web-ui',
    image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
    arguments=[
        '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:\
            v20190304-v0.2-176-g15d997b-pipelines',
        '--name', 'web-ui',
        '--container-port', '5000',
        '--service-port', '80',
        '--service-type', "LoadBalancer"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))

web_ui.after(serve)
```
Like 'serve', the 'web-ui' component launches a service that persists after the pipeline is complete. Instead of launching a Kubeflow resource, 'web-ui' launches
a standard Kubernetes Deployment/Service pair. The Dockerfile that builds the deployment image [can be found here.](./deploy-service/Dockerfile) This image is used
to deploy the web UI, which was built from the [Dockerfile found in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/web-ui/Dockerfile).

After this component is run, a new LoadBalancer is provisioned that gives external access to the 'web-ui' deployment launched in the cluster.

#### Main Function
```
if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')
```

At the bottom of the script is a main function, which is used to compile the pipeline when the script is run.
pipelines/mnist-pipelines/deploy-service/Dockerfile
+62
@@ -0,0 +1,62 @@
# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM debian

RUN apt-get update -q && apt-get upgrade -y && \
    apt-get install -y -qq --no-install-recommends \
        apt-transport-https \
        ca-certificates \
        git \
        gnupg \
        lsb-release \
        unzip \
        wget && \
    wget -O /opt/ks_0.12.0_linux_amd64.tar.gz \
        https://github.com/ksonnet/ksonnet/releases/download/v0.12.0/ks_0.12.0_linux_amd64.tar.gz && \
    tar -C /opt -xzf /opt/ks_0.12.0_linux_amd64.tar.gz && \
    cp /opt/ks_0.12.0_linux_amd64/ks /bin/. && \
    rm -f /opt/ks_0.12.0_linux_amd64.tar.gz && \
    wget -O /bin/kubectl \
        https://storage.googleapis.com/kubernetes-release/release/v1.11.2/bin/linux/amd64/kubectl && \
    chmod u+x /bin/kubectl && \
    wget -O /opt/kubernetes_v1.11.2 \
        https://github.com/kubernetes/kubernetes/archive/v1.11.2.tar.gz && \
    mkdir -p /src && \
    tar -C /src -xzf /opt/kubernetes_v1.11.2 && \
    rm -rf /opt/kubernetes_v1.11.2 && \
    wget -O /opt/google-apt-key.gpg \
        https://packages.cloud.google.com/apt/doc/apt-key.gpg && \
    apt-key add /opt/google-apt-key.gpg && \
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" >> \
        /etc/apt/sources.list.d/google-cloud-sdk.list && \
    apt-get update -q && \
    apt-get install -y -qq --no-install-recommends google-cloud-sdk && \
    gcloud config set component_manager/disable_update_check true

ENV KUBEFLOW_VERSION v0.2.5

# Checkout the kubeflow packages at image build time so that we do not
# require calling in to the GitHub API at run time.
RUN cd /src && \
    mkdir -p github.com/kubeflow && \
    cd github.com/kubeflow && \
    git clone https://github.com/kubeflow/kubeflow && \
    cd kubeflow && \
    git checkout ${KUBEFLOW_VERSION}

ADD ./src/deploy.sh /bin/.

ENTRYPOINT ["/bin/deploy.sh"]
pipelines/mnist-pipelines/deploy-service/src/deploy.sh
+127
@@ -0,0 +1,127 @@
#!/bin/bash -e

# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -x

KUBERNETES_NAMESPACE="${KUBERNETES_NAMESPACE:-kubeflow}"
NAME="my-deployment"

while (($#)); do
  case $1 in
    "--image")
      shift
      IMAGE_PATH="$1"
      shift
      ;;
    "--service-type")
      shift
      SERVICE_TYPE="$1"
      shift
      ;;
    "--container-port")
      shift
      CONTAINER_PORT="--containerPort=$1"
      shift
      ;;
    "--service-port")
      shift
      SERVICE_PORT="--servicePort=$1"
      shift
      ;;
    "--cluster-name")
      shift
      CLUSTER_NAME="$1"
      shift
      ;;
    "--namespace")
      shift
      KUBERNETES_NAMESPACE="$1"
      shift
      ;;
    "--name")
      shift
      NAME="$1"
      shift
      ;;
    *)
      echo "Unknown argument: '$1'"
      exit 1
      ;;
  esac
done

if [ -z "${IMAGE_PATH}" ]; then
  echo "You must specify an image to deploy"
  exit 1
fi

if [ -z "$SERVICE_TYPE" ]; then
  SERVICE_TYPE=ClusterIP
fi

echo "Deploying the image '${IMAGE_PATH}'"

if [ -z "${CLUSTER_NAME}" ]; then
  CLUSTER_NAME=$(wget -q -O- --header="Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name)
fi

# Ensure the name is not more than 63 characters.
NAME="${NAME:0:63}"
# Trim any trailing hyphens from the server name.
while [[ "${NAME:(-1)}" == "-" ]]; do NAME="${NAME::-1}"; done

echo "Deploying ${NAME} to the cluster ${CLUSTER_NAME}"

# Connect kubectl to the local cluster
kubectl config set-cluster "${CLUSTER_NAME}" --server=https://kubernetes.default --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kubectl config set-credentials pipeline --token "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
kubectl config set-context kubeflow --cluster "${CLUSTER_NAME}" --user pipeline
kubectl config use-context kubeflow

# Configure and deploy the app
cd /src/github.com/kubeflow/kubeflow
git checkout ${KUBEFLOW_VERSION}

cd /opt
echo "Initializing KSonnet app..."
ks init tf-serving-app
cd tf-serving-app/

if [ -n "${KUBERNETES_NAMESPACE}" ]; then
  echo "Setting Kubernetes namespace: ${KUBERNETES_NAMESPACE} ..."
  ks env set default --namespace "${KUBERNETES_NAMESPACE}"
fi

ks generate deployed-service $NAME --name=$NAME --image=$IMAGE_PATH --type=$SERVICE_TYPE $CONTAINER_PORT $SERVICE_PORT

echo "Deploying the service..."
ks apply default -c $NAME

# Wait for the ip address
timeout="1000"
start_time=$(date +%s)
PUBLIC_IP=""
while [ -z "$PUBLIC_IP" ]; do
  PUBLIC_IP=$(kubectl get svc -n $KUBERNETES_NAMESPACE $NAME -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2> /dev/null)
  current_time=$(date +%s)
  elapsed_time=$(expr $current_time + 1 - $start_time)
  if [[ $elapsed_time -gt $timeout ]]; then
    echo "timeout"
    exit 1
  fi
  sleep 5
done
echo "service active: $PUBLIC_IP"
(Six binary image files, the screenshots under img/, are included in the commit but not rendered in the diff.)
