diff --git a/README.md b/README.md
index 48568d19..c8094de5 100644
--- a/README.md
+++ b/README.md
@@ -41,13 +41,12 @@ A Framework represents an application with a set of Tasks:
4. With consistent identity {FrameworkName}-{TaskRoleName}-{TaskIndex} as PodName
5. With fine grained [RetryPolicy](doc/user-manual.md#RetryPolicy) for each Task and the whole Framework
6. With fine grained [FrameworkAttemptCompletionPolicy](doc/user-manual.md#FrameworkAttemptCompletionPolicy) for each TaskRole
-7. Guarantees at most one instance of a specific Task is running at any point in time
-8. Guarantees at most one instance of a specific Framework is running at any point in time
+7. With PodGracefulDeletionTimeoutSec for each Task to [tune Consistency vs Availability](doc/user-manual.md#FrameworkConsistencyAvailability)
### Controller Feature
1. Highly generalized as it is built for all kinds of applications
2. Light-weight as it is only responsible for Pod orchestration
-3. Well-defined Framework consistency, state machine and failure model
+3. Well-defined Framework [Consistency vs Availability](doc/user-manual.md#FrameworkConsistencyAvailability), [State Machine](doc/user-manual.md#FrameworkTaskStateMachine) and [Failure Model](doc/user-manual.md#CompletionStatus)
4. Tolerate Pod/ConfigMap unexpected deletion, Node/Network/FrameworkController/Kubernetes failure
5. Support to specify how to [classify and summarize Pod failures](doc/user-manual.md#PodFailureClassification)
6. Support to expose [Framework and Pod history snapshots](doc/user-manual.md#FrameworkPodHistory) to external systems
diff --git a/doc/user-manual.md b/doc/user-manual.md
index 005aaefd..fa7f348e 100644
--- a/doc/user-manual.md
+++ b/doc/user-manual.md
@@ -9,6 +9,8 @@
- [RetryPolicy](#RetryPolicy)
- [FrameworkAttemptCompletionPolicy](#FrameworkAttemptCompletionPolicy)
- [Framework and Pod History](#FrameworkPodHistory)
+ - [Framework and Task State Machine](#FrameworkTaskStateMachine)
+ - [Framework Consistency vs Availability](#FrameworkConsistencyAvailability)
- [Controller Extension](#ControllerExtension)
- [FrameworkBarrier](#FrameworkBarrier)
- [HivedScheduler](#HivedScheduler)
@@ -116,7 +118,8 @@ Type: application/json or application/yaml
Delete the specified Framework.
Notes:
-* If you need to ensure at most one instance of a specific Framework (identified by the FrameworkName) is running at any point in time, you should always use and only use the [Foreground Deletion](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion) in the provided body, see [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go). However, `kubectl delete` does not support to specify the Foreground Deletion at least for [Kubernetes v1.14.2](https://github.com/kubernetes/kubernetes/issues/66110#issuecomment-413761559), so you may have to use other [Supported Client](#SupportedClient).
+* If you need to achieve all the [Framework ConsistencyGuarantees](#ConsistencyGuarantees), or to achieve higher [Framework Availability](#FrameworkAvailability) by leveraging the [PodGracefulDeletionTimeoutSec](../pkg/apis/frameworkcontroller/v1/types.go), you should always, and only, use the [Foreground Deletion](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion) in the provided body, as sketched below. See also [Framework Notes](../pkg/apis/frameworkcontroller/v1/types.go).
+* However, `kubectl delete` does not support specifying the Foreground Deletion, at least as of [Kubernetes v1.14.2](https://github.com/kubernetes/kubernetes/issues/66110#issuecomment-413761559), so you may have to use another [Supported Client](#SupportedClient).
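+
+For example, below is a minimal sketch of deleting a Framework with the Foreground Deletion through the generated Go client; the clientset import path is an assumption based on this repository's layout:
+
+```go
+package frameworkexample
+
+import (
+	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+	// Assumed import path of the generated Framework clientset.
+	frameworkClient "github.com/microsoft/frameworkcontroller/pkg/client/clientset/versioned"
+)
+
+// DeleteFrameworkForeground deletes the Framework with the Foreground
+// PropagationPolicy, so the Framework object is only removed after all its
+// managed ConfigMap and Pods are removed.
+func DeleteFrameworkForeground(
+	fClient frameworkClient.Interface, namespace string, name string) error {
+	propagation := meta.DeletePropagationForeground
+	return fClient.FrameworkcontrollerV1().Frameworks(namespace).Delete(
+		name, &meta.DeleteOptions{PropagationPolicy: &propagation})
+}
+```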
**Response**
@@ -370,6 +373,87 @@ Notes:
## Framework and Pod History
By leveraging [LogObjectSnapshot](../pkg/apis/frameworkcontroller/v1/config.go), external systems, such as [Fluentd](https://www.fluentd.org) and [ElasticSearch](https://www.elastic.co/products/elasticsearch), can collect and process Framework and Pod history snapshots even if it was retried or deleted, such as persistence, metrics conversion, visualization, alerting, acting, analysis, etc.
+## Framework and Task State Machine
+### Framework State Machine
+[FrameworkState](../pkg/apis/frameworkcontroller/v1/types.go)
+
+### Task State Machine
+[TaskState](../pkg/apis/frameworkcontroller/v1/types.go)
+
+## Framework Consistency vs Availability
+### Framework Consistency
+#### ConsistencyGuarantees
+For a specific Task identified by {FrameworkName}-{TaskRoleName}-{TaskIndex}:
+
+- **ConsistencyGuarantee1**:
+
+ At most one instance of the Task is running at any point in time.
+
+- **ConsistencyGuarantee2**:
+
+ No instance of the Task is running if it is TaskAttemptCompleted, TaskCompleted or the whole Framework is deleted.
+
+For a specific Framework identified by {FrameworkName}:
+
+- **ConsistencyGuarantee3**:
+
+ At most one instance of the Framework is running at any point in time.
+
+- **ConsistencyGuarantee4**:
+
+ No instance of the Framework is running if it is FrameworkAttemptCompleted, FrameworkCompleted or the whole Framework is deleted.
+
+#### How to achieve ConsistencyGuarantees
+
+The default behavior is to achieve all the [ConsistencyGuarantees](#ConsistencyGuarantees), as long as you do not explicitly violate the guidelines below:
+
+1. Achieve **ConsistencyGuarantee1**:
+
+ Do not [force delete the managed Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/#force-deletion-of-pods):
+
+   1. Do not set [PodGracefulDeletionTimeoutSec](../pkg/apis/frameworkcontroller/v1/types.go) to a non-nil value.
+
+ For example, the default PodGracefulDeletionTimeoutSec is acceptable.
+
+ 2. Do not delete the managed Pod with [0 GracePeriodSeconds](https://kubernetes.io/docs/concepts/workloads/pods/pod/#force-deletion-of-pods).
+
+      For example, the default Pod deletion is acceptable (see the deletion sketch after this list).
+
+ 3. Do not delete the Node which runs the managed Pod.
+
+      For example, [draining the Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node) before deleting it is acceptable.
+
+ *The Task instance can be universally located by its [TaskAttemptInstanceUID](../pkg/apis/frameworkcontroller/v1/types.go) or [PodUID](../pkg/apis/frameworkcontroller/v1/types.go).*
+
+   *To avoid a Pod being stuck in deletion forever, for example if its Node is down forever, follow the same approach as [deleting a StatefulSet Pod only after its termination has been confirmed](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#delete-pods), either manually or through your [Cloud Controller Manager](https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager).*
+
+2. Achieve **ConsistencyGuarantee2**, **ConsistencyGuarantee3** and **ConsistencyGuarantee4**:
+ 1. Achieve **ConsistencyGuarantee1**.
+
+ 2. Must delete the managed ConfigMap with [Foreground PropagationPolicy](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion).
+
+ For example, the default ConfigMap deletion is acceptable.
+
+ 3. Must delete the Framework with [Foreground PropagationPolicy](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion).
+
+      For example, the default Framework deletion may not be acceptable, since the default PropagationPolicy for the Framework object may be Background.
+
+ 4. Do not change the [OwnerReferences](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents) of the managed ConfigMap and Pods.
+
+ *The Framework instance can be universally located by its [FrameworkAttemptInstanceUID](../pkg/apis/frameworkcontroller/v1/types.go) or [ConfigMapUID](../pkg/apis/frameworkcontroller/v1/types.go).*
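+
+Below is a minimal sketch, using the Kubernetes Go client, of the difference between the default graceful Pod deletion required by **ConsistencyGuarantee1** and the force deletion that may violate it; the helper names are illustrative only:
+
+```go
+package frameworkexample
+
+import (
+	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/kubernetes"
+)
+
+// DeletePodGracefully requests the default graceful deletion: the Pod is only
+// removed from the ApiServer after its termination has been confirmed, so
+// ConsistencyGuarantee1 is kept.
+func DeletePodGracefully(kClient kubernetes.Interface, namespace, name string) error {
+	return kClient.CoreV1().Pods(namespace).Delete(name, &meta.DeleteOptions{})
+}
+
+// DeletePodForcefully requests the force deletion with 0 GracePeriodSeconds: the
+// Pod is removed from the ApiServer immediately, even if its containers may still
+// be running on an unreachable Node, so ConsistencyGuarantee1 may be violated.
+func DeletePodForcefully(kClient kubernetes.Interface, namespace, name string) error {
+	zero := int64(0)
+	return kClient.CoreV1().Pods(namespace).Delete(
+		name, &meta.DeleteOptions{GracePeriodSeconds: &zero})
+}
+```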
+
+### Framework Availability
+According to the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem), in the presence of a network partition, you cannot achieve both consistency and availability at the same time in any distributed system. So you have to make a trade-off between the [Framework Consistency](#FrameworkConsistency) and the [Framework Availability](#FrameworkAvailability).
+
+You can tune this trade-off, such as achieving higher [Framework Availability](#FrameworkAvailability) by sacrificing the [Framework Consistency](#FrameworkConsistency), as sketched after this list:
+1. Set a small [Pod TolerationSeconds for TaintBasedEvictions](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions)
+2. Set a small [PodGracefulDeletionTimeoutSec](../pkg/apis/frameworkcontroller/v1/types.go)
+3. Violate other guidelines mentioned in [How to achieve ConsistencyGuarantees](#ConsistencyGuaranteesHowTo), such as manually force deleting a problematic Pod.
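+
+For example, below is a minimal sketch of a [TaskSpec](../pkg/apis/frameworkcontroller/v1/types.go) that favors availability over consistency; the timeout value, the container spec and the import paths are illustrative assumptions:
+
+```go
+package frameworkexample
+
+import (
+	core "k8s.io/api/core/v1"
+
+	// Assumed import paths within this repository.
+	ci "github.com/microsoft/frameworkcontroller/pkg/apis/frameworkcontroller/v1"
+	"github.com/microsoft/frameworkcontroller/pkg/common"
+)
+
+// A stateless Task that favors availability: if the graceful deletion of its Pod
+// cannot finish within 600s, FrameworkController force deletes the Pod, then
+// immediately completes or retries the Task, while the old Pod may still be running.
+var availabilityFirstTask = ci.TaskSpec{
+	RetryPolicy: ci.RetryPolicySpec{}, // keep the default RetryPolicy
+	// Should be longer than the Pod TerminationGracePeriodSeconds plus the
+	// minimal TolerationSeconds for TaintBasedEvictions.
+	PodGracefulDeletionTimeoutSec: common.PtrInt64(600),
+	Pod: core.PodTemplateSpec{
+		Spec: core.PodSpec{
+			Containers: []core.Container{{Name: "worker", Image: "ubuntu:16.04"}},
+		},
+	},
+}
+```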
+
+See more in:
+1. [PodGracefulDeletionTimeoutSec](../pkg/apis/frameworkcontroller/v1/types.go)
+2. [Pod Safety and Consistency Guarantees](https://github.com/kubernetes/community/blob/ee8998b156031f6b363daade51ca2d12521f4ac0/contributors/design-proposals/storage/pod-safety.md)
+
## Controller Extension
### FrameworkBarrier
1. [Usage](../pkg/barrier/barrier.go)
diff --git a/pkg/apis/frameworkcontroller/v1/types.go b/pkg/apis/frameworkcontroller/v1/types.go
index 8c8a3e40..bb4cdd4e 100644
--- a/pkg/apis/frameworkcontroller/v1/types.go
+++ b/pkg/apis/frameworkcontroller/v1/types.go
@@ -47,8 +47,7 @@ type FrameworkList struct {
// 4. With consistent identity {FrameworkName}-{TaskRoleName}-{TaskIndex} as PodName
// 5. With fine grained RetryPolicy for each Task and the whole Framework
// 6. With fine grained FrameworkAttemptCompletionPolicy for each TaskRole
-// 7. Guarantees at most one instance of a specific Task is running at any point in time
-// 8. Guarantees at most one instance of a specific Framework is running at any point in time
+// 7. With PodGracefulDeletionTimeoutSec for each Task to tune Consistency vs Availability
//
// Notes:
// 1. Status field should only be modified by FrameworkController, and
@@ -57,26 +56,6 @@ type FrameworkList struct {
// Leverage CRD status subresource to isolate Status field modification with other fields.
// This can help to avoid unintended modification, such as users may unintendedly modify
// the status when updating the spec.
-// 2. To ensure at most one instance of a specific Task is running at any point in time:
-// 1. Do not delete the managed Pod with 0 gracePeriodSeconds.
-// For example, the default Pod deletion is acceptable.
-// 2. Do not delete the Node which runs the managed Pod.
-// For example, drain before delete the Node is acceptable.
-// The instance can be universally located by its TaskAttemptInstanceUID or PodUID.
-// See RetryPolicySpec and TaskAttemptStatus.
-// 3. To ensure at most one instance of a specific Framework is running at any point in time:
-// 1. Ensure ensure at most one instance of a specific Task is running at any point in time.
-// 2. Do not delete the managed ConfigMap with Background propagationPolicy.
-// For example, the default ConfigMap deletion is acceptable.
-// 3. Must delete the Framework with Foreground propagationPolicy.
-// For example, the default Framework deletion may not be acceptable, since the default
-// propagationPolicy for Framework object may be Background.
-// The instance can be universally located by its FrameworkAttemptInstanceUID or ConfigMapUID.
-// See RetryPolicySpec and FrameworkAttemptStatus.
-// 4. To ensure there is no orphan object previously managed by FrameworkController:
-// 1. Do not delete the Framework or the managed ConfigMap with Orphan propagationPolicy.
-// For example, the default Framework and ConfigMap deletion is acceptable.
-// 2. Do not change the OwnerReferences of the managed ConfigMap and Pods.
//////////////////////////////////////////////////////////////////////////////////////////////////
type Framework struct {
meta.TypeMeta `json:",inline"`
@@ -107,8 +86,31 @@ type TaskRoleSpec struct {
}
type TaskSpec struct {
- RetryPolicy RetryPolicySpec `json:"retryPolicy"`
- Pod core.PodTemplateSpec `json:"pod"`
+ RetryPolicy RetryPolicySpec `json:"retryPolicy"`
+
+	// If the Task's currently associated Pod object is being deleted, i.e. graceful
+	// deletion, but the graceful deletion cannot finish within this timeout, then
+	// the Pod will be deleted forcefully by FrameworkController.
+	// Defaults to nil.
+	//
+	// If this timeout is not nil, the Pod may be deleted forcefully by FrameworkController.
+	// The force deletion does not wait for confirmation that the Pod has fully
+	// terminated, and the Task is immediately transitioned to TaskAttemptCompleted.
+	// As a consequence, the Task will be immediately completed or retried with a
+	// new Pod, while the old Pod may still be running.
+	// So, in this setting, the Task behaves like a ReplicaSet; choose it if the Task
+	// favors availability over consistency, such as a stateless Task.
+	// However, to still execute graceful deletion on a best-effort basis and tolerate
+	// transient deletion failures, this timeout should be longer than the Pod
+	// TerminationGracePeriodSeconds plus the minimal TolerationSeconds for TaintBasedEvictions.
+	//
+	// If this timeout is nil, the Pod will always be deleted gracefully, i.e. never
+	// be deleted forcefully by FrameworkController. This helps to guarantee that at
+	// most one instance of a specific Task is running at any point in time.
+	// So, in this setting, the Task behaves like a StatefulSet; choose it if the Task
+	// favors consistency over availability, such as a stateful Task.
+ PodGracefulDeletionTimeoutSec *int64 `json:"podGracefulDeletionTimeoutSec"`
+ Pod core.PodTemplateSpec `json:"pod"`
}
type ExecutionType string
@@ -163,10 +165,9 @@ const (
// So, an attempt identified by its attempt id may be associated with multiple
// attempt instances over time, i.e. multiple instances may be run for the
// attempt over time, however, at most one instance is exposed into ApiServer
-// over time and at most one instance is running at any point in time.
-// So, the actual retried attempt instances maybe exceed the RetryPolicySpec
-// in rare cases, however, the RetryPolicyStatus will never exceed the
-// RetryPolicySpec.
+// over time.
+// So, the actual retried attempt instances may exceed the RetryPolicySpec in
+// rare cases, however, the RetryPolicyStatus will never exceed the RetryPolicySpec.
// 2. Resort to other spec to control other kind of RetryPolicy:
// 1. Container RetryPolicy is the RestartPolicy in Pod Spec.
// See https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
diff --git a/pkg/apis/frameworkcontroller/v1/zz_generated.deepcopy.go b/pkg/apis/frameworkcontroller/v1/zz_generated.deepcopy.go
index 7cdc6bbd..6a743885 100644
--- a/pkg/apis/frameworkcontroller/v1/zz_generated.deepcopy.go
+++ b/pkg/apis/frameworkcontroller/v1/zz_generated.deepcopy.go
@@ -868,6 +868,11 @@ func (in *TaskRoleStatus) DeepCopy() *TaskRoleStatus {
func (in *TaskSpec) DeepCopyInto(out *TaskSpec) {
*out = *in
out.RetryPolicy = in.RetryPolicy
+ if in.PodGracefulDeletionTimeoutSec != nil {
+ in, out := &in.PodGracefulDeletionTimeoutSec, &out.PodGracefulDeletionTimeoutSec
+ *out = new(int64)
+ **out = **in
+ }
in.Pod.DeepCopyInto(&out.Pod)
return
}
diff --git a/pkg/barrier/barrier.go b/pkg/barrier/barrier.go
index 438696e8..01aa5e02 100644
--- a/pkg/barrier/barrier.go
+++ b/pkg/barrier/barrier.go
@@ -271,6 +271,7 @@ func (b *FrameworkBarrier) Run() {
if isPermanentErr {
exit(ci.CompletionCodeContainerPermanentFailed)
} else {
+ // May also timeout, but still treat as Unknown Error
exit(ci.CompletionCode(1))
}
}
diff --git a/pkg/common/utils.go b/pkg/common/utils.go
index cbe266ae..750b0ac4 100644
--- a/pkg/common/utils.go
+++ b/pkg/common/utils.go
@@ -87,6 +87,14 @@ func PtrUIDStr(s string) *types.UID {
return PtrUID(types.UID(s))
}
+func PtrDeletionPropagation(o meta.DeletionPropagation) *meta.DeletionPropagation {
+ return &o
+}
+
+func PtrTime(o meta.Time) *meta.Time {
+ return &o
+}
+
func PtrNow() *meta.Time {
now := meta.Now()
return &now
diff --git a/pkg/controller/controller.go b/pkg/controller/controller.go
index c808ae31..11e6351e 100644
--- a/pkg/controller/controller.go
+++ b/pkg/controller/controller.go
@@ -514,75 +514,66 @@ func (c *FrameworkController) syncFramework(key string) (returnedErr error) {
"Failed: Framework cannot be got from local cache: %v", err)
}
} else {
- if localF.DeletionTimestamp != nil {
- // Skip syncFramework to avoid fighting with GarbageCollectionController,
- // because GarbageCollectionController may be deleting the dependent object.
- klog.Infof(logPfx+
- "Skipped: Framework is deleting: Will be deleted at %v",
- localF.DeletionTimestamp)
- return nil
+ f := localF.DeepCopy()
+ // From now on, f is a writable copy of the original local cached one, and
+ // it may be different from the original one.
+
+ expected := c.getExpectedFrameworkStatusInfo(f.Key())
+ if expected == nil {
+ if f.Status != nil {
+ // Recover f related things, since it is the first time we see it and
+ // its Status is not nil.
+ // No need to recover previous enqueued items, because the Informer has
+ // already delivered the Add events for all recovered Frameworks which
+ // caused all Frameworks will be enqueued to sync.
+ // No need to recover previous scheduled to enqueue items, because the
+ // schedule will be recovered during sync.
+ }
+
+ // f.Status must be the same as the remote one, since it is the first
+ // time we see it.
+ c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, true)
} else {
- f := localF.DeepCopy()
- // From now on, f is a writable copy of the original local cached one, and
- // it may be different from the original one.
-
- expected := c.getExpectedFrameworkStatusInfo(f.Key())
- if expected == nil {
- if f.Status != nil {
- // Recover f related things, since it is the first time we see it and
- // its Status is not nil.
- // No need to recover previous enqueued items, because the Informer has
- // already delivered the Add events for all recovered Frameworks which
- // caused all Frameworks will be enqueued to sync.
- // No need to recover previous scheduled to enqueue items, because the
- // schedule will be recovered during sync.
- }
+ // f.Status may be outdated, so override it with the expected one, to
+ // ensure the Framework.Status is Monotonically Exposed.
+ f.Status = expected.status
- // f.Status must be the same as the remote one, since it is the first
- // time we see it.
- c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, true)
- } else {
- // f.Status may be outdated, so override it with the expected one, to
- // ensure the Framework.Status is Monotonically Exposed.
- f.Status = expected.status
-
- // Ensure the expected Framework.Status is the same as the remote one
- // before sync.
- if !expected.remoteSynced {
- updateErr := c.updateRemoteFrameworkStatus(f)
- if updateErr != nil {
- return updateErr
- }
- c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, true)
+ // Ensure the expected Framework.Status is the same as the remote one
+ // before sync.
+ if !expected.remoteSynced {
+ updateErr := c.updateRemoteFrameworkStatus(f)
+ if updateErr != nil {
+ return updateErr
}
+ c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, true)
}
+ }
- // At this point, f.Status is the same as the expected and remote
- // Framework.Status, so it is ready to sync against f.Spec and other
- // related objects.
- errs := []error{}
- remoteF := f.DeepCopy()
+ // At this point, f.Status is the same as the expected and remote
+ // Framework.Status, so it is ready to sync against f.Spec and other
+ // related objects.
+ errs := []error{}
+ remoteF := f.DeepCopy()
- syncErr := c.syncFrameworkStatus(f)
- errs = append(errs, syncErr)
+ syncErr := c.syncFrameworkStatus(f)
+ errs = append(errs, syncErr)
- if !reflect.DeepEqual(remoteF.Status, f.Status) {
- // Always update the expected and remote Framework.Status even if sync
- // error, since f.Status should never be corrupted due to any Platform
- // Transient Error, so no need to rollback to the one before sync, and
- // no need to DeepCopy between f.Status and the expected one.
- updateErr := c.updateRemoteFrameworkStatus(f)
- errs = append(errs, updateErr)
+ if !reflect.DeepEqual(remoteF.Status, f.Status) {
+ // Always update the expected and remote Framework.Status even if sync
+ // error, since f.Status should never be corrupted due to any Platform
+ // Transient Error, so no need to rollback to the one before sync, and
+ // no need to DeepCopy between f.Status and the expected one.
+ updateErr := c.updateRemoteFrameworkStatus(f)
+ errs = append(errs, updateErr)
- c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, updateErr == nil)
- } else {
- klog.Infof(logPfx +
- "Skip to update the expected and remote Framework.Status since " +
- "they are unchanged")
- }
-
- return errorAgg.NewAggregate(errs)
+ c.updateExpectedFrameworkStatusInfo(f.Key(), f.Status, updateErr == nil)
+ } else {
+ klog.Infof(logPfx +
+ "Skip to update the expected and remote Framework.Status since " +
+ "they are unchanged")
}
+
+ return errorAgg.NewAggregate(errs)
}
}
@@ -645,6 +636,19 @@ func (c *FrameworkController) enqueueTaskRetryDelayTimeoutCheck(
failIfTimeout, "TaskRetryDelayTimeoutCheck")
}
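+
+// enqueuePodGracefulDeletionTimeoutCheck checks the Pod's graceful deletion
+// against the Task's PodGracefulDeletionTimeoutSec via enqueueFrameworkTimeoutCheck,
+// and returns true if the Pod is being deleted but the timeout has not yet expired.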
+func (c *FrameworkController) enqueuePodGracefulDeletionTimeoutCheck(
+ f *ci.Framework, taskRoleName string,
+ failIfTimeout bool, pod *core.Pod) bool {
+ taskSpec := f.TaskRoleSpec(taskRoleName).Task
+ if pod.DeletionTimestamp == nil {
+ return false
+ }
+
+ return c.enqueueFrameworkTimeoutCheck(
+ f, *internal.GetPodDeletionStartTime(pod), taskSpec.PodGracefulDeletionTimeoutSec,
+ failIfTimeout, "PodGracefulDeletionTimeoutCheck")
+}
+
func (c *FrameworkController) enqueueFrameworkTimeoutCheck(
f *ci.Framework, startTime meta.Time, timeoutSec *int64,
failIfTimeout bool, logSfx string) bool {
@@ -693,7 +697,8 @@ func (c *FrameworkController) syncFrameworkState(f *ci.Framework) (err error) {
if f.Status.State == ci.FrameworkCompleted {
if c.enqueueFrameworkCompletedRetainTimeoutCheck(f, true) {
- klog.Infof(logPfx + "Skipped: Framework is already completed")
+ klog.Infof(logPfx + "Skipped: Framework is already completed, " +
+ "and waiting to be deleted after FrameworkCompletedRetainSec")
return nil
}
@@ -737,7 +742,7 @@ func (c *FrameworkController) syncFrameworkState(f *ci.Framework) (err error) {
diag = fmt.Sprintf(
"ConfigMap does not appear in the local cache within timeout %v, "+
- "so consider it was deleted and force delete it",
+ "so consider it was deleted and explicitly delete it",
common.SecToDuration(c.cConfig.ObjectLocalCacheCreationTimeoutSec))
code = ci.CompletionCodeConfigMapCreationTimeout
klog.Warningf(logPfx + diag)
@@ -887,6 +892,12 @@ func (c *FrameworkController) syncFrameworkState(f *ci.Framework) (err error) {
// FrameworkAttemptDeleting}
if f.Status.State == ci.FrameworkAttemptCreationPending {
+ if f.DeletionTimestamp != nil {
+ klog.Infof(logPfx + "Skip to createFrameworkAttempt: " +
+ "Framework is deleting")
+ return nil
+ }
+
if f.Spec.ExecutionType == ci.ExecutionStop {
diag := fmt.Sprintf("User has requested to stop the Framework")
klog.Infof(logPfx + diag)
@@ -965,22 +976,22 @@ func (c *FrameworkController) syncFrameworkState(f *ci.Framework) (err error) {
}
func (c *FrameworkController) deleteFramework(
- f *ci.Framework, force bool) error {
+ f *ci.Framework, confirm bool) error {
errPfx := fmt.Sprintf(
- "[%v]: Failed to delete Framework %v: force: %v: ",
- f.Key(), f.UID, force)
+ "[%v]: Failed to delete Framework %v: confirm: %v: ",
+ f.Key(), f.UID, confirm)
- // Do not set zero GracePeriodSeconds to do force deletion in any case, since
- // it will also immediately delete Pod in PodUnknown state, while the Pod may
- // be still running.
deleteErr := c.fClient.FrameworkcontrollerV1().Frameworks(f.Namespace).Delete(
- f.Name, &meta.DeleteOptions{Preconditions: &meta.Preconditions{UID: &f.UID}})
+ f.Name, &meta.DeleteOptions{
+ Preconditions: &meta.Preconditions{UID: &f.UID},
+ PropagationPolicy: common.PtrDeletionPropagation(meta.DeletePropagationForeground),
+ })
if deleteErr != nil {
if !apiErrors.IsNotFound(deleteErr) {
return fmt.Errorf(errPfx+"%v", deleteErr)
}
} else {
- if force {
+ if confirm {
// Confirm it is deleted instead of still deleting.
remoteF, getErr := c.fClient.FrameworkcontrollerV1().Frameworks(f.Namespace).Get(
f.Name, meta.GetOptions{})
@@ -1000,8 +1011,8 @@ func (c *FrameworkController) deleteFramework(
}
klog.Infof(
- "[%v]: Succeeded to delete Framework %v: force: %v",
- f.Key(), f.UID, force)
+ "[%v]: Succeeded to delete Framework %v: confirm: %v",
+ f.Key(), f.UID, confirm)
return nil
}
@@ -1012,10 +1023,11 @@ func (c *FrameworkController) deleteFramework(
// Clean up instead of recovery is because the ConfigMapUID is always the ground
// truth.
func (c *FrameworkController) getOrCleanupConfigMap(
- f *ci.Framework, force bool) (cm *core.ConfigMap, err error) {
+ f *ci.Framework, confirm bool) (cm *core.ConfigMap, err error) {
+ logPfx := fmt.Sprintf("[%v]: getOrCleanupConfigMap: ", f.Key())
cmName := f.ConfigMapName()
- if force {
+ if confirm {
cm, err = c.kClient.CoreV1().ConfigMaps(f.Namespace).Get(cmName,
meta.GetOptions{})
} else {
@@ -1026,9 +1038,9 @@ func (c *FrameworkController) getOrCleanupConfigMap(
if apiErrors.IsNotFound(err) {
return nil, nil
} else {
- return nil, fmt.Errorf(
- "[%v]: Failed to get ConfigMap %v: force: %v: %v",
- f.Key(), cmName, force, err)
+ return nil, fmt.Errorf(logPfx+
+ "Failed to get ConfigMap %v: confirm: %v: %v",
+ cmName, confirm, err)
}
}
@@ -1039,11 +1051,18 @@ func (c *FrameworkController) getOrCleanupConfigMap(
// is failed to persist due to FrameworkController restart or create fails
// but succeeds on remote, so clean up the ConfigMap to avoid unmanaged cm
// leak.
- return nil, c.deleteConfigMap(f, cm.UID, force)
+ klog.Warningf(logPfx+
+ "Found unmanaged but controlled ConfigMap, so explicitly delete it: %v, %v",
+ cm.Name, cm.UID)
+ return nil, c.deleteConfigMap(f, cm.UID, confirm)
} else {
// Do not own and manage the life cycle of not controlled object, so still
// consider the get and controlled object clean up is success, and postpone
// the potential naming conflict when creating the controlled object.
+ klog.Warningf(logPfx+
+				"Found unmanaged and uncontrolled ConfigMap, and it may have a naming conflict "+
+ "with the controlled ConfigMap to be created: %v, %v",
+ cm.Name, cm.UID)
return nil, nil
}
} else {
@@ -1055,15 +1074,12 @@ func (c *FrameworkController) getOrCleanupConfigMap(
// Using UID to ensure we delete the right object.
// The cmUID should be controlled by f.
func (c *FrameworkController) deleteConfigMap(
- f *ci.Framework, cmUID types.UID, force bool) error {
+ f *ci.Framework, cmUID types.UID, confirm bool) error {
cmName := f.ConfigMapName()
errPfx := fmt.Sprintf(
- "[%v]: Failed to delete ConfigMap %v, %v: force: %v: ",
- f.Key(), cmName, cmUID, force)
+ "[%v]: Failed to delete ConfigMap %v, %v: confirm: %v: ",
+ f.Key(), cmName, cmUID, confirm)
- // Do not set zero GracePeriodSeconds to do force deletion in any case, since
- // it will also immediately delete Pod in PodUnknown state, while the Pod may
- // be still running.
deleteErr := c.kClient.CoreV1().ConfigMaps(f.Namespace).Delete(cmName,
&meta.DeleteOptions{Preconditions: &meta.Preconditions{UID: &cmUID}})
if deleteErr != nil {
@@ -1071,7 +1087,7 @@ func (c *FrameworkController) deleteConfigMap(
return fmt.Errorf(errPfx+"%v", deleteErr)
}
} else {
- if force {
+ if confirm {
// Confirm it is deleted instead of still deleting.
cm, getErr := c.kClient.CoreV1().ConfigMaps(f.Namespace).Get(cmName,
meta.GetOptions{})
@@ -1091,8 +1107,8 @@ func (c *FrameworkController) deleteConfigMap(
}
klog.Infof(
- "[%v]: Succeeded to delete ConfigMap %v, %v: force: %v",
- f.Key(), cmName, cmUID, force)
+ "[%v]: Succeeded to delete ConfigMap %v, %v: confirm: %v",
+ f.Key(), cmName, cmUID, confirm)
return nil
}
@@ -1195,13 +1211,13 @@ func (c *FrameworkController) syncTaskState(
diag := fmt.Sprintf(
"Pod does not appear in the local cache within timeout %v, "+
- "so consider it was deleted and force delete it",
+ "so consider it was deleted and explicitly delete it",
common.SecToDuration(c.cConfig.ObjectLocalCacheCreationTimeoutSec))
klog.Warningf(logPfx + diag)
// Ensure pod is deleted in remote to avoid managed pod leak after
// TaskAttemptCompleted.
- err := c.deletePod(f, taskRoleName, taskIndex, *taskStatus.PodUID(), true)
+ err := c.deletePod(f, taskRoleName, taskIndex, *taskStatus.PodUID(), true, false)
if err != nil {
return err
}
@@ -1230,7 +1246,7 @@ func (c *FrameworkController) syncTaskState(
if taskStatus.State == ci.TaskAttemptDeletionPending {
// The CompletionStatus has been persisted, so it is safe to delete the
// pod now.
- err := c.deletePod(f, taskRoleName, taskIndex, *taskStatus.PodUID(), false)
+ err := c.deletePod(f, taskRoleName, taskIndex, *taskStatus.PodUID(), false, false)
if err != nil {
return err
}
@@ -1315,8 +1331,7 @@ func (c *FrameworkController) syncTaskState(
}
f.TransitionTaskState(taskRoleName, taskIndex, ci.TaskAttemptDeleting)
- klog.Infof(logPfx + "Waiting Pod to be deleted")
- return nil
+ return c.handlePodGracefulDeletion(f, taskRoleName, taskIndex, pod)
}
}
}
@@ -1517,6 +1532,33 @@ func (c *FrameworkController) syncTaskState(
taskStatus.State))
}
+// The pod should be controlled by f's cm.
+func (c *FrameworkController) handlePodGracefulDeletion(
+ f *ci.Framework, taskRoleName string, taskIndex int32, pod *core.Pod) error {
+ logPfx := fmt.Sprintf("[%v][%v][%v]: handlePodGracefulDeletion: ",
+ f.Key(), taskRoleName, taskIndex)
+ taskSpec := f.TaskRoleSpec(taskRoleName).Task
+
+ if pod.DeletionTimestamp == nil {
+ return nil
+ }
+ if taskSpec.PodGracefulDeletionTimeoutSec == nil {
+ klog.Infof(logPfx + "Waiting Pod to be deleted")
+ return nil
+ }
+ if c.enqueuePodGracefulDeletionTimeoutCheck(f, taskRoleName, true, pod) {
+ klog.Infof(logPfx + "Waiting Pod to be deleted or timeout")
+ return nil
+ }
+
+ klog.Warningf(logPfx+
+ "Pod cannot be deleted within timeout %v, so force delete it",
+ common.SecToDuration(taskSpec.PodGracefulDeletionTimeoutSec))
+	// Always confirm the force deletion, to expose the failure in which even force
+	// deletion cannot delete the Pod, such as when the Pod's Finalizers are not empty.
+ return c.deletePod(f, taskRoleName, taskIndex, pod.UID, true, true)
+}
+
// Get Task's current Pod object, if not found, then clean up existing
// controlled Pod if any.
// Returned pod is either managed or nil, if it is the managed pod, it is not
@@ -1524,11 +1566,13 @@ func (c *FrameworkController) syncTaskState(
// Clean up instead of recovery is because the PodUID is always the ground truth.
func (c *FrameworkController) getOrCleanupPod(
f *ci.Framework, cm *core.ConfigMap,
- taskRoleName string, taskIndex int32, force bool) (pod *core.Pod, err error) {
+ taskRoleName string, taskIndex int32, confirm bool) (pod *core.Pod, err error) {
+ logPfx := fmt.Sprintf("[%v][%v][%v]: getOrCleanupPod: ",
+ f.Key(), taskRoleName, taskIndex)
taskStatus := f.TaskStatus(taskRoleName, taskIndex)
podName := taskStatus.PodName()
- if force {
+ if confirm {
pod, err = c.kClient.CoreV1().Pods(f.Namespace).Get(podName,
meta.GetOptions{})
} else {
@@ -1539,9 +1583,9 @@ func (c *FrameworkController) getOrCleanupPod(
if apiErrors.IsNotFound(err) {
return nil, nil
} else {
- return nil, fmt.Errorf(
- "[%v][%v][%v]: Failed to get Pod %v: force: %v: %v",
- f.Key(), taskRoleName, taskIndex, podName, force, err)
+ return nil, fmt.Errorf(logPfx+
+ "Failed to get Pod %v: confirm: %v: %v",
+ podName, confirm, err)
}
}
@@ -1551,11 +1595,24 @@ func (c *FrameworkController) getOrCleanupPod(
// The managed Pod becomes unmanaged if and only if Framework.Status
// is failed to persist due to FrameworkController restart or create fails
// but succeeds on remote, so clean up the Pod to avoid unmanaged pod leak.
- return nil, c.deletePod(f, taskRoleName, taskIndex, pod.UID, force)
+ klog.Warningf(logPfx+
+ "Found unmanaged but controlled Pod, so explicitly delete it: %v, %v",
+ pod.Name, pod.UID)
+ if pod.DeletionTimestamp != nil {
+ err = c.handlePodGracefulDeletion(f, taskRoleName, taskIndex, pod)
+ if err != nil {
+ return nil, err
+ }
+ }
+ return nil, c.deletePod(f, taskRoleName, taskIndex, pod.UID, confirm, false)
} else {
// Do not own and manage the life cycle of not controlled object, so still
// consider the get and controlled object clean up is success, and postpone
// the potential naming conflict when creating the controlled object.
+ klog.Warningf(logPfx+
+				"Found unmanaged and uncontrolled Pod, and it may have a naming conflict "+
+ "with the controlled Pod to be created: %v, %v",
+ pod.Name, pod.UID)
return nil, nil
}
} else {
@@ -1565,27 +1622,26 @@ func (c *FrameworkController) getOrCleanupPod(
}
// Using UID to ensure we delete the right object.
-// The podUID should be controlled by cm.
+// The podUID should be controlled by f's cm.
func (c *FrameworkController) deletePod(
f *ci.Framework, taskRoleName string, taskIndex int32,
- podUID types.UID, force bool) error {
- taskStatus := f.TaskStatus(taskRoleName, taskIndex)
- podName := taskStatus.PodName()
+ podUID types.UID, confirm bool, force bool) error {
+ podName := f.TaskStatus(taskRoleName, taskIndex).PodName()
errPfx := fmt.Sprintf(
- "[%v][%v][%v]: Failed to delete Pod %v, %v: force: %v: ",
- f.Key(), taskRoleName, taskIndex, podName, podUID, force)
-
- // Do not set zero GracePeriodSeconds to do force deletion in any case, since
- // it will also immediately delete Pod in PodUnknown state, while the Pod may
- // be still running.
- deleteErr := c.kClient.CoreV1().Pods(f.Namespace).Delete(podName,
- &meta.DeleteOptions{Preconditions: &meta.Preconditions{UID: &podUID}})
+ "[%v][%v][%v]: Failed to delete Pod %v, %v: confirm: %v, force: %v: ",
+ f.Key(), taskRoleName, taskIndex, podName, podUID, confirm, force)
+
+ deleteOptions := &meta.DeleteOptions{Preconditions: &meta.Preconditions{UID: &podUID}}
+ if force {
+ deleteOptions.GracePeriodSeconds = common.PtrInt64(0)
+ }
+ deleteErr := c.kClient.CoreV1().Pods(f.Namespace).Delete(podName, deleteOptions)
if deleteErr != nil {
if !apiErrors.IsNotFound(deleteErr) {
return fmt.Errorf(errPfx+"%v", deleteErr)
}
} else {
- if force {
+ if confirm {
// Confirm it is deleted instead of still deleting.
pod, getErr := c.kClient.CoreV1().Pods(f.Namespace).Get(podName,
meta.GetOptions{})
@@ -1605,8 +1661,8 @@ func (c *FrameworkController) deletePod(
}
klog.Infof(
- "[%v][%v][%v]: Succeeded to delete Pod %v, %v: force: %v",
- f.Key(), taskRoleName, taskIndex, podName, podUID, force)
+ "[%v][%v][%v]: Succeeded to delete Pod %v, %v: confirm: %v, force: %v",
+ f.Key(), taskRoleName, taskIndex, podName, podUID, confirm, force)
return nil
}
diff --git a/pkg/internal/utils.go b/pkg/internal/utils.go
index 0b2942dd..c5ce04fe 100644
--- a/pkg/internal/utils.go
+++ b/pkg/internal/utils.go
@@ -38,6 +38,7 @@ import (
"k8s.io/client-go/tools/cache"
"k8s.io/klog"
"reflect"
+ "time"
)
func CreateClients(kConfig *rest.Config) (
@@ -220,3 +221,17 @@ func ToPod(obj interface{}) *core.Pod {
return pod
}
+
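+// GetPodDeletionStartTime returns the time at which the Pod's graceful deletion
+// was requested, i.e. its DeletionTimestamp minus its DeletionGracePeriodSeconds,
+// or nil if the Pod is not being deleted.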
+func GetPodDeletionStartTime(pod *core.Pod) *meta.Time {
+ if pod.DeletionTimestamp == nil {
+ return nil
+ }
+
+ var gracePeriod time.Duration
+ if pod.DeletionGracePeriodSeconds == nil {
+ gracePeriod = time.Duration(0)
+ } else {
+ gracePeriod = common.SecToDuration(pod.DeletionGracePeriodSeconds)
+ }
+ return common.PtrTime(meta.NewTime(pod.DeletionTimestamp.Add(-gracePeriod)))
+}