diff --git a/README.md b/README.md
index c4b19974..1c4bfb59 100644
--- a/README.md
+++ b/README.md
@@ -57,9 +57,8 @@ A Framework represents an application with a set of Tasks:
1. A Kubernetes cluster, v1.10 or above, on-cloud or on-premise.
## Quick Start
-1. [Build](build/frameworkcontroller)
-2. [Run Example](example/run/frameworkcontroller.md)
-3. [Framework Example](example/framework)
+1. [Run Controller](example/run)
+2. [Submit Framework](example/framework)
## Doc
1. [User Manual](doc/user-manual.md)
diff --git a/doc/user-manual.md b/doc/user-manual.md
index 84862e96..92996d0a 100644
--- a/doc/user-manual.md
+++ b/doc/user-manual.md
@@ -10,26 +10,37 @@
- [Best Practice](#BestPractice)
## Framework Interop
-**Supported interoperations with a Framework**
-
+### Supported Client
+As Framework is actually a [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions), all [CRD Clients](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#accessing-a-custom-resource) can be used to interoperate with it, such as:
+1. [kubectl](https://kubernetes.io/docs/reference/kubectl)
+ ```shell
+ kubectl create -f {Framework File Path}
+ # Note this is not Foreground Deletion, see [DELETE Framework] section
+ kubectl delete framework {FrameworkName}
+ kubectl get framework {FrameworkName}
+ kubectl describe framework {FrameworkName}
+ kubectl get frameworks
+ kubectl describe frameworks
+ ...
+ ```
+2. [Kubernetes Client Library](https://kubernetes.io/docs/reference/using-api/client-libraries)
+3. Any HTTP Client
+
+### Supported Interoperation
| API Kind | Operations |
|:---- |:---- |
| Framework | [CREATE](#CREATE_Framework) [DELETE](#DELETE_Framework) [GET](#GET_Framework) [LIST](#LIST_Frameworks) [WATCH](#WATCH_Framework) [WATCH_LIST](#WATCH_LIST_Frameworks) |
| [ConfigMap](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#configmap-v1-core) | All operations except for [CREATE](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#create-193) [PUT](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#replace-195) [PATCH](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#patch-194) |
| [Pod](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#pod-v1-core) | All operations except for [CREATE](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#create-55) [PUT](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#replace-57) [PATCH](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#patch-56) |
-**Supported clients to execute the interoperations with a Framework**
-
-As Framework is actually a Kubernetes [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions), all CRD clients can be used to execute the interoperations with a Framework, see them in [Accessing a custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#accessing-a-custom-resource).
-
-### CREATE Framework
+#### CREATE Framework
**Request**
POST /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks
Body: [Framework](../pkg/apis/frameworkcontroller/v1/types.go)
-Type: application/json
+Type: application/json or application/yaml
**Description**
@@ -44,26 +55,32 @@ Create the specified Framework.
| Accepted(202) | [Framework](../pkg/apis/frameworkcontroller/v1/types.go) | Return current Framework. |
| Conflict(409) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework already exists. |
-### DELETE Framework
+#### DELETE Framework
**Request**
DELETE /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}
Body:
+
+application/json
```json
{
"propagationPolicy": "Foreground"
}
```
+application/yaml
+```yaml
+propagationPolicy: Foreground
+```
-Type: application/json
+Type: application/json or application/yaml
**Description**
Delete the specified Framework.
Notes:
-* Should always use and only use the provided body, see [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go).
+* If you need to ensure that at most one instance of a specific Framework (identified by the FrameworkName) is running at any point in time, you should always and only use [Foreground Deletion](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion) in the provided body, see [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go). However, `kubectl delete` does not support specifying Foreground Deletion, at least in [Kubernetes v1.10](https://github.com/kubernetes/kubernetes/issues/66110#issuecomment-413761559), so you may have to use another [Supported Client](#SupportedClient).
**Response**
@@ -73,7 +90,7 @@ Notes:
| OK(200) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is deleted. |
| NotFound(200) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |
-### GET Framework
+#### GET Framework
**Request**
GET /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}
@@ -89,7 +106,7 @@ Get the specified Framework.
| OK(200) | [Framework](../pkg/apis/frameworkcontroller/v1/types.go) | Return current Framework. |
| NotFound(200) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |
-### LIST Frameworks
+#### LIST Frameworks
**Request**
GET /apis/frameworkcontroller.microsoft.com/v1/namespaces/{FrameworkNamespace}/frameworks
@@ -107,7 +124,7 @@ Get all Frameworks (in the specified FrameworkNamespace).
|:---- |:---- |:---- |
| OK(200) | [FrameworkList](../pkg/apis/frameworkcontroller/v1/types.go) | Return all Frameworks (in the specified FrameworkNamespace). |
-### WATCH Framework
+#### WATCH Framework
**Request**
GET /apis/frameworkcontroller.microsoft.com/v1/watch/namespaces/{FrameworkNamespace}/frameworks/{FrameworkName}
@@ -125,7 +142,7 @@ Watch the change events of the specified Framework.
| OK(200) | [WatchEvent](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#watchevent-v1-meta) | Streaming the change events of the specified Framework. |
| NotFound(200) | [Status](https://v1-10.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#status-v1-meta) | The specified Framework is not found. |
-### WATCH_LIST Frameworks
+#### WATCH_LIST Frameworks
**Request**
GET /apis/frameworkcontroller.microsoft.com/v1/watch/namespaces/{FrameworkNamespace}/frameworks
@@ -305,8 +322,7 @@ Notes:
## Controller Extension
### FrameworkBarrier
1. [Usage](../pkg/barrier/barrier.go)
-2. [Build](../build/frameworkbarrier)
-3. Example: [FrameworkBarrier Example](../example/framework/extension/frameworkbarrier.yaml), [Tensorflow Example](../example/framework/scenario/tensorflow), [etc](../example/framework/scenario).
+2. Example: [FrameworkBarrier Example](../example/framework/extension/frameworkbarrier.yaml), [TensorFlow Example](../example/framework/scenario/tensorflow), [etc](../example/framework/scenario).
## Best Practice
[Best Practice](../pkg/apis/frameworkcontroller/v1/types.go)
diff --git a/example/config/default/frameworkcontroller.yaml b/example/config/default/frameworkcontroller.yaml
index b973cc8e..9ed47f06 100644
--- a/example/config/default/frameworkcontroller.yaml
+++ b/example/config/default/frameworkcontroller.yaml
@@ -3,16 +3,6 @@
# This is the default config for frameworkcontroller, so all settings are commented out.
-# Setup k8s config:
-# kubeApiServerAddress is default to ${KUBE_APISERVER_ADDRESS} and kubeConfigFilePath
-# is default to ${KUBECONFIG} then falls back to ${HOME}/.kube/config.
-# If both kubeApiServerAddress and kubeConfigFilePath after defaulting are still empty,
-# falls back to k8s inClusterConfig.
-#
-# Address should be in format http[s]://host:port
#kubeApiServerAddress: http://10.10.10.10:8080
-#
-# See https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#config
#kubeConfigFilePath: ""
-
#workerNumber: 20
diff --git a/example/framework/README.md b/example/framework/README.md
new file mode 100644
index 00000000..5d62c660
--- /dev/null
+++ b/example/framework/README.md
@@ -0,0 +1,11 @@
+# Submit Framework
+We provide various Framework examples that can be submitted by various clients:
+1. [Framework Supported Client](../../doc/user-manual.md#SupportedClient)
+2. Framework Example
+ 1. [Basic Example](basic)
+ 2. [FrameworkController Extension Example](extension)
+ 3. [Real Scenario Example](scenario)
+
+## Next
+1. [Framework Interop](../../doc/user-manual.md#FrameworkInterop)
+2. [Framework Usage](../../pkg/apis/frameworkcontroller/v1/types.go)
diff --git a/example/framework/basic/batchfailedpermanent.yaml b/example/framework/basic/batchfailedpermanent.yaml
index 5d24e25a..c89900a1 100644
--- a/example/framework/basic/batchfailedpermanent.yaml
+++ b/example/framework/basic/batchfailedpermanent.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchfailedtransient.yaml b/example/framework/basic/batchfailedtransient.yaml
index 62c43125..216db204 100644
--- a/example/framework/basic/batchfailedtransient.yaml
+++ b/example/framework/basic/batchfailedtransient.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchfailedtransientconflict.yaml b/example/framework/basic/batchfailedtransientconflict.yaml
index 22e81ce2..0e9a1763 100644
--- a/example/framework/basic/batchfailedtransientconflict.yaml
+++ b/example/framework/basic/batchfailedtransientconflict.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchfailedunknown.yaml b/example/framework/basic/batchfailedunknown.yaml
index f40e9a28..4afe6715 100644
--- a/example/framework/basic/batchfailedunknown.yaml
+++ b/example/framework/basic/batchfailedunknown.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchstatefulfailed.yaml b/example/framework/basic/batchstatefulfailed.yaml
index c7c1bab7..58a75720 100644
--- a/example/framework/basic/batchstatefulfailed.yaml
+++ b/example/framework/basic/batchstatefulfailed.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchsucceeded.yaml b/example/framework/basic/batchsucceeded.yaml
index c30db1a5..523d31c4 100644
--- a/example/framework/basic/batchsucceeded.yaml
+++ b/example/framework/basic/batchsucceeded.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/batchwithservicesucceeded.yaml b/example/framework/basic/batchwithservicesucceeded.yaml
index e4a07321..3fc0aedb 100644
--- a/example/framework/basic/batchwithservicesucceeded.yaml
+++ b/example/framework/basic/batchwithservicesucceeded.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/service.yaml b/example/framework/basic/service.yaml
index 8b6fd85d..ac66660d 100644
--- a/example/framework/basic/service.yaml
+++ b/example/framework/basic/service.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/basic/servicestateful.yaml b/example/framework/basic/servicestateful.yaml
index 5d6b1b24..b883d297 100644
--- a/example/framework/basic/servicestateful.yaml
+++ b/example/framework/basic/servicestateful.yaml
@@ -1,4 +1,3 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
diff --git a/example/framework/extension/frameworkbarrier.yaml b/example/framework/extension/frameworkbarrier.yaml
index aad0f6cf..a4980e88 100644
--- a/example/framework/extension/frameworkbarrier.yaml
+++ b/example/framework/extension/frameworkbarrier.yaml
@@ -1,6 +1,9 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
# For the full frameworkbarrier usage, see ./pkg/barrier/barrier.go
+
+############################### Prerequisite ###################################
+# See "[PREREQUISITE]" in this file.
+################################################################################
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
@@ -54,6 +57,19 @@ spec:
volumeMounts:
- name: frameworkbarrier-volume
mountPath: /mnt/frameworkbarrier
+ # [PREREQUISITE]
+ # User needs to create a service account in the same namespace as this
+ # Framework, with permission granted for frameworkbarrier, if the k8s
+ # cluster enforces authorization.
+ # For example, if the cluster enforces RBAC:
+ # kubectl create serviceaccount frameworkbarrier --namespace default
+ # kubectl create clusterrole frameworkbarrier \
+ # --verb=get,list,watch \
+ # --resource=frameworks
+ # kubectl create clusterrolebinding frameworkbarrier \
+ # --clusterrole=frameworkbarrier \
+ # --user=system:serviceaccount:default:frameworkbarrier
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
# Using official image to demonstrate this example.
@@ -97,6 +113,9 @@ spec:
volumeMounts:
- name: frameworkbarrier-volume
mountPath: /mnt/frameworkbarrier
+ # [PREREQUISITE]
+ # Same as server TaskRole.
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
image: frameworkcontroller/frameworkbarrier
diff --git a/example/framework/scenario/tensorflow/README.md b/example/framework/scenario/tensorflow/README.md
new file mode 100644
index 00000000..8f0a6858
--- /dev/null
+++ b/example/framework/scenario/tensorflow/README.md
@@ -0,0 +1,17 @@
+# TensorFlow On FrameworkController
+
+## Feature
+1. Support both GPU and CPU Distributed Training
+2. Automatically clean up PS when the whole FrameworkAttempt is completed
+3. No need to adjust existing TensorFlow image
+4. No need to set up [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service) and [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service)
+5. [Common Feature](../../../../README.md#Feature)
+
+## Prerequisite
+1. See `[PREREQUISITE]` in each specific Framework yaml file.
+2. Need to set up [Kubernetes Cluster-Level Logging](https://kubernetes.io/docs/concepts/cluster-administration/logging), if you need to persist and expose the logs of deleted Pods.
+
+## Quick Start
+1. [Common Quick Start](../../../../README.md#Quick-Start)
+2. [CPU Example](cpu)
+3. [GPU Example](gpu)
diff --git a/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml b/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml
index 7485f13c..9ec33811 100644
--- a/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml
+++ b/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml
@@ -1,6 +1,9 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
# For the full frameworkbarrier usage, see ./pkg/barrier/barrier.go
+
+############################### Prerequisite ###################################
+# See "[PREREQUISITE]" in this file.
+################################################################################
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
@@ -23,8 +26,15 @@ spec:
pod:
spec:
restartPolicy: Never
- # Using hostNetwork to avoid network overhead.
- hostNetwork: true
+ # [PREREQUISITE]
+ # User needs to set up the k8s cluster networking model and be aware of
+ # the potential network overhead, if they want to disable hostNetwork to
+ # avoid coordinating the containerPort usage.
+ # For this example, if hostNetwork is disabled, at least 1 node is
+ # needed; otherwise, at least 3 nodes are needed, since all 3 workers
+ # are specified with the same containerPort.
+ # See https://kubernetes.io/docs/concepts/cluster-administration/networking
+ hostNetwork: false
containers:
- name: tensorflow
# Using official image to demonstrate this example.
@@ -56,6 +66,11 @@ spec:
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
+ # [PREREQUISITE]
+ # User needs to create a service account for frameworkbarrier, if the
+ # k8s cluster enforces authorization.
+ # See more in ./example/framework/extension/frameworkbarrier.yaml
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
# Using official image to demonstrate this example.
@@ -74,10 +89,12 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
+ # [PREREQUISITE]
# User needs to specify his own data-volume for input data and
- # output model and the data-volume must be a distributed shared
- # file system, so that data can be "handed off" between Pods,
- # such as nfs, cephfs or glusterfs, etc.
+ # output model.
+ # The data-volume must be a distributed shared file system, so that
+ # data can be "handed off" between Pods, such as nfs, cephfs or
+ # glusterfs, etc.
# See https://kubernetes.io/docs/concepts/storage/volumes.
#
# And then he needs to download and extract the example input data
@@ -103,7 +120,9 @@ spec:
pod:
spec:
restartPolicy: Never
- hostNetwork: true
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
+ hostNetwork: false
containers:
- name: tensorflow
image: frameworkcontroller/tensorflow-examples:cpu
@@ -125,6 +144,9 @@ spec:
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
image: frameworkcontroller/frameworkbarrier
@@ -140,6 +162,8 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
#nfs:
# server: {NFS Server Host}
# path: {NFS Shared Directory}
diff --git a/example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithgpu.yaml b/example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithgpu.yaml
index 1a1e39a4..1d6bdde2 100644
--- a/example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithgpu.yaml
+++ b/example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithgpu.yaml
@@ -1,6 +1,9 @@
-# Post to {kubeApiServerAddress}/apis/frameworkcontroller.microsoft.com/v1/namespaces/default/frameworks
# For the full spec setting and usage, see ./pkg/apis/frameworkcontroller/v1/types.go
# For the full frameworkbarrier usage, see ./pkg/barrier/barrier.go
+
+############################### Prerequisite ###################################
+# See "[PREREQUISITE]" in this file.
+################################################################################
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
@@ -23,8 +26,15 @@ spec:
pod:
spec:
restartPolicy: Never
- # Using hostNetwork to avoid network overhead.
- hostNetwork: true
+ # [PREREQUISITE]
+ # User needs to set up the k8s cluster networking model and be aware of
+ # the potential network overhead, if they want to disable hostNetwork to
+ # avoid coordinating the containerPort usage.
+ # For this example, if hostNetwork is disabled, at least 1 node is
+ # needed; otherwise, at least 3 nodes are needed, since all 3 workers
+ # are specified with the same containerPort.
+ # See https://kubernetes.io/docs/concepts/cluster-administration/networking
+ hostNetwork: false
containers:
- name: tensorflow
# Using official image to demonstrate this example.
@@ -53,6 +63,7 @@ spec:
- containerPort: 4001
resources:
limits:
+ # [PREREQUISITE]
# User needs to setup GPU for the k8s cluster.
# See https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus
nvidia.com/gpu: 1
@@ -61,6 +72,11 @@ spec:
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
+ # [PREREQUISITE]
+ # User needs to create a service account for frameworkbarrier, if the
+ # k8s cluster enforces authorization.
+ # See more in ./example/framework/extension/frameworkbarrier.yaml
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
# Using official image to demonstrate this example.
@@ -79,10 +95,12 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
+ # [PREREQUISITE]
# User needs to specify his own data-volume for input data and
- # output model and the data-volume must be a distributed shared
- # file system, so that data can be "handed off" between Pods,
- # such as nfs, cephfs or glusterfs, etc.
+ # output model.
+ # The data-volume must be a distributed shared file system, so that
+ # data can be "handed off" between Pods, such as nfs, cephfs or
+ # glusterfs, etc.
# See https://kubernetes.io/docs/concepts/storage/volumes.
#
# And then he needs to download and extract the example input data
@@ -108,7 +126,9 @@ spec:
pod:
spec:
restartPolicy: Never
- hostNetwork: true
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
+ hostNetwork: false
containers:
- name: tensorflow
image: frameworkcontroller/tensorflow-examples:gpu
@@ -127,12 +147,17 @@ spec:
- containerPort: 5001
resources:
limits:
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
nvidia.com/gpu: 1
volumeMounts:
- name: frameworkbarrier-volume
mountPath: /mnt/frameworkbarrier
- name: data-volume
mountPath: /mnt/data
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
+ serviceAccountName: frameworkbarrier
initContainers:
- name: frameworkbarrier
image: frameworkcontroller/frameworkbarrier
@@ -148,6 +173,8 @@ spec:
- name: frameworkbarrier-volume
emptyDir: {}
- name: data-volume
+ # [PREREQUISITE]
+ # Same as ps TaskRole.
#nfs:
# server: {NFS Server Host}
# path: {NFS Shared Directory}
diff --git a/example/run/README.md b/example/run/README.md
new file mode 100644
index 00000000..9b31dfe2
--- /dev/null
+++ b/example/run/README.md
@@ -0,0 +1,136 @@
+# Run FrameworkController
+We provide various approaches to run FrameworkController:
+ - [Run By Kubernetes StatefulSet](#RunByKubernetesStatefulSet)
+ - [Run By Docker Container](#RunByDockerContainer)
+ - [Run By OS Process](#RunByOSProcess)
+
+Notes:
+ - For a single k8s cluster, one instance of FrameworkController orchestrates all Frameworks in all namespaces.
+ - For a single k8s cluster, ensure at most one instance of FrameworkController is running at any point in time.
+ - For the full FrameworkController configuration, see
+ [Config Usage](../../pkg/apis/frameworkcontroller/v1/config.go) and [Config Example](../../example/config/default/frameworkcontroller.yaml).
+
+## Run By Kubernetes StatefulSet
+- This approach is better for production, since StatefulSet by itself provides [self-healing](https://kubernetes.io/docs/concepts/workloads/pods/pod/#durability-of-pods-or-lack-thereof) and can ensure [at most one instance](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md) of FrameworkController is running at any point in time.
+- Using official image to demonstrate this example.
+
+**Prerequisite**
+
+If the k8s cluster enforces [Authorization](https://kubernetes.io/docs/reference/access-authn-authz/authorization/#authorization-modules), you need to first create a [Service Account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account) with granted permission for FrameworkController. For example, if the cluster enforces [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#kubectl-create-clusterrolebinding):
+```shell
+kubectl create serviceaccount frameworkcontroller --namespace default
+kubectl create clusterrolebinding frameworkcontroller \
+ --clusterrole=cluster-admin \
+ --user=system:serviceaccount:default:frameworkcontroller
+```
+
+**Run**
+
+Run FrameworkController with the above Service Account and the [k8s inClusterConfig](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod):
+```shell
+kubectl create -f frameworkcontroller.yaml
+```
+
+frameworkcontroller.yaml:
+```yaml
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+ name: frameworkcontroller
+ namespace: default
+spec:
+ serviceName: frameworkcontroller
+ selector:
+ matchLabels:
+ app: frameworkcontroller
+ replicas: 1
+ template:
+ metadata:
+ labels:
+ app: frameworkcontroller
+ spec:
+ # Using the service account with granted permission
+ # if the k8s cluster enforces authorization.
+ serviceAccountName: frameworkcontroller
+ containers:
+ - name: frameworkcontroller
+ image: frameworkcontroller/frameworkcontroller
+ # Using k8s inClusterConfig, so usually, no need to specify
+ # KUBE_APISERVER_ADDRESS or KUBECONFIG
+ #env:
+ #- name: KUBE_APISERVER_ADDRESS
+ # value: {http[s]://host:port}
+ #- name: KUBECONFIG
+ # value: {Pod Local KubeConfig File Path}
+```
+
+## Run By Docker Container
+- This approach may be better for development sometimes.
+- Using official image to demonstrate this example.
+
+**Run**
+
+If you have an insecure ApiServer address that does not enforce authentication (it can be obtained from [Insecure ApiServer](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#api-server-ports-and-ips) or [kubectl proxy](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#using-kubectl-proxy)), you only need to provide the address:
+```shell
+docker run -e KUBE_APISERVER_ADDRESS={http[s]://host:port} \
+ frameworkcontroller/frameworkcontroller
+```
+
+Otherwise, you need to provide your [KubeConfig File](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#explore-the-home-kube-directory), which inlines or refers to the [ApiServer Credential Files](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#transport-security) with [granted permission](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#define-clusters-users-and-contexts):
+```shell
+docker run -e KUBECONFIG=/mnt/.kube/config \
+ -v {Host Local KubeConfig File Path}:/mnt/.kube/config \
+ -v {Host Local ApiServer Credential File Path}:{Container Local ApiServer Credential File Path} \
+ frameworkcontroller/frameworkcontroller
+```
+For example, if the k8s cluster is created by [Minikube](https://kubernetes.io/docs/setup/minikube):
+```shell
+docker run -e KUBECONFIG=/mnt/.kube/config \
+ -v ${HOME}/.kube/config:/mnt/.kube/config \
+ -v ${HOME}/.minikube:${HOME}/.minikube \
+ frameworkcontroller/frameworkcontroller
+```
+
+## Run By OS Process
+- This approach may be better for development sometimes.
+- Using a locally built binary distribution to demonstrate this example.
+
+**Prerequisite**
+
+Ensure you have installed [Golang 1.10 or above](https://golang.org/doc/install#install) and that the [${GOPATH}](https://golang.org/doc/code.html#GOPATH) is valid.
+
+Then build the FrameworkController binary distribution:
+```shell
+export PROJECT_DIR=${GOPATH}/src/github.com/microsoft/frameworkcontroller
+rm -rf ${PROJECT_DIR}
+mkdir -p ${PROJECT_DIR}
+git clone https://github.com/Microsoft/frameworkcontroller.git ${PROJECT_DIR}
+cd ${PROJECT_DIR}
+./build/frameworkcontroller/go-build.sh
+```
+
+**Run**
+
+If you have an insecure ApiServer address that does not enforce authentication (it can be obtained from [Insecure ApiServer](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#api-server-ports-and-ips) or [kubectl proxy](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#using-kubectl-proxy)), you only need to provide the address:
+```shell
+KUBE_APISERVER_ADDRESS={http[s]://host:port} \
+ ./dist/frameworkcontroller/start.sh
+```
+
+Otherwise, you need to provide your [KubeConfig File](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#explore-the-home-kube-directory), which inlines or refers to the [ApiServer Credential Files](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#transport-security) with [granted permission](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#define-clusters-users-and-contexts):
+```shell
+KUBECONFIG={Process Local KubeConfig File Path} \
+ ./dist/frameworkcontroller/start.sh
+```
+For example:
+```shell
+KUBECONFIG=${HOME}/.kube/config \
+ ./dist/frameworkcontroller/start.sh
+```
+In the above example, `${HOME}/.kube/config` is the default value of `KUBECONFIG`, so you can omit it:
+```shell
+./dist/frameworkcontroller/start.sh
+```
+
+## Next
+1. [Submit Framework](../framework)
diff --git a/example/run/frameworkcontroller.md b/example/run/frameworkcontroller.md
deleted file mode 100644
index fd52e414..00000000
--- a/example/run/frameworkcontroller.md
+++ /dev/null
@@ -1,63 +0,0 @@
-# Run FrameworkController
-
-1. Ensure at most one instance of FrameworkController is run for a single k8s cluster.
-2. For the full FrameworkController configuration, see
- [Config Usage](../../pkg/apis/frameworkcontroller/v1/config.go) and [Config Example](../../example/config).
-
-## Run by a OS Process
-
-```shell
-KUBE_APISERVER_ADDRESS={http[s]://host:port} ./dist/frameworkcontroller/start.sh
-```
-Or
-```shell
-KUBECONFIG={Process Local KubeConfig File Path} ./dist/frameworkcontroller/start.sh
-```
-
-## Run by a Docker Container
-
-```shell
-docker run -e KUBE_APISERVER_ADDRESS={http[s]://host:port} frameworkcontroller
-```
-Or
-```shell
-docker run -e KUBECONFIG={Container Local KubeConfig File Path} frameworkcontroller
-```
-
-## Run by a Kubernetes StatefulSet
-
-```shell
-kubectl create -f frameworkcontroller.yaml
-```
-
-frameworkcontroller.yaml:
-```yaml
-apiVersion: apps/v1
-kind: StatefulSet
-metadata:
- name: frameworkcontroller
-spec:
- serviceName: frameworkcontroller
- selector:
- matchLabels:
- app: frameworkcontroller
- replicas: 1
- template:
- metadata:
- labels:
- app: frameworkcontroller
- spec:
- containers:
- - name: frameworkcontroller
- # Using official image to demonstrate this example.
- image: frameworkcontroller/frameworkcontroller
- env:
- # May not need to specify KUBE_APISERVER_ADDRESS or KUBECONFIG
- # if the target cluster to control is the cluster running the
- # StatefulSet.
- # See k8s inClusterConfig.
- - name: KUBE_APISERVER_ADDRESS
- value: {http[s]://host:port}
- - name: KUBECONFIG
- value: {Pod Local KubeConfig File Path}
-```
diff --git a/pkg/apis/frameworkcontroller/v1/config.go b/pkg/apis/frameworkcontroller/v1/config.go
index e0c83008..6415ebcf 100644
--- a/pkg/apis/frameworkcontroller/v1/config.go
+++ b/pkg/apis/frameworkcontroller/v1/config.go
@@ -32,11 +32,27 @@ import (
)
type Config struct {
- // If both kubeApiServerAddress and kubeConfigFilePath after defaulting are still
- // empty, falls back to k8s inClusterConfig.
+ // KubeApiServerAddress defaults to ${KUBE_APISERVER_ADDRESS}.
+ // KubeConfigFilePath defaults to ${KUBECONFIG}, then falls back to ${HOME}/.kube/config.
+ //
+ // If both KubeApiServerAddress and KubeConfigFilePath after defaulting are still empty, falls back to the
+ // [k8s inClusterConfig](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod).
+ //
+ // If both KubeApiServerAddress and KubeConfigFilePath after defaulting are not empty,
+ // KubeApiServerAddress overrides the server address specified in the file referred by KubeConfigFilePath.
+ //
+ // If only KubeApiServerAddress after defaulting is not empty, it should be an insecure ApiServer address (can be obtained from
+ // [Insecure ApiServer](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#api-server-ports-and-ips) or
+ // [kubectl proxy](https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#using-kubectl-proxy))
+ // which does not enforce authentication.
+ //
+ // If only KubeConfigFilePath after defaulting is not empty, it should be a valid
+ // [KubeConfig File](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#explore-the-home-kube-directory)
+ // which inlines or refers to the valid
+ // [ApiServer Credential Files](https://kubernetes.io/docs/reference/access-authn-authz/controlling-access/#transport-security).
+ //
// Address should be in format http[s]://host:port
KubeApiServerAddress *string `yaml:"kubeApiServerAddress"`
- // See https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#config
KubeConfigFilePath *string `yaml:"kubeConfigFilePath"`
// Number of concurrent workers to process each different Frameworks
diff --git a/pkg/barrier/barrier.go b/pkg/barrier/barrier.go
index b81e214d..b9591c55 100644
--- a/pkg/barrier/barrier.go
+++ b/pkg/barrier/barrier.go
@@ -101,11 +101,11 @@ const (
// Config
///////////////////////////////////////////////////////////////////////////////////////
type Config struct {
- // The Framework for which the barrier waits.
- // Address should be in format http[s]://host:port
+ // See the same fields in pkg/apis/frameworkcontroller/v1/config.go
KubeApiServerAddress string `yaml:"kubeApiServerAddress"`
- // See https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#config
KubeConfigFilePath string `yaml:"kubeConfigFilePath"`
+
+ // The Framework for which the barrier waits.
FrameworkNamespace string `yaml:"frameworkNamespace"`
FrameworkName string `yaml:"frameworkName"`