Conversation

jzhoucliqr
Contributor

What this PR does / why we need it:

Pod deletion and volume detachment happen asynchronously, so a pod can be deleted before its volumes are detached from the node.
When deleting a machine, this can cause issues for some storage provisioners. For example, with vsphere-volume it is problematic: if the node is deleted before the volume detach succeeds, the underlying VMDK is deleted together with the Machine.

This PR adds a fix that waits for volumes to detach from the node after node draining.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4707

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 14, 2021
@k8s-ci-robot
Contributor

Hi @jzhoucliqr. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 14, 2021
@MaxRink
Contributor

MaxRink commented Jul 14, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 14, 2021
@jzhoucliqr jzhoucliqr force-pushed the wait-volume-detach branch from 5bbf0cf to f2859bb Compare July 14, 2021 18:08
@jzhoucliqr jzhoucliqr force-pushed the wait-volume-detach branch from f2859bb to 2d98f66 Compare July 14, 2021 18:19
@jzhoucliqr jzhoucliqr force-pushed the wait-volume-detach branch from 2d98f66 to 2d0b68a Compare July 14, 2021 18:41
@jzhoucliqr
Contributor Author

/test pull-cluster-api-test-main

@jzhoucliqr jzhoucliqr force-pushed the wait-volume-detach branch from 2d0b68a to d3e214f Compare July 15, 2021 02:11
@jzhoucliqr
Contributor Author

/test pull-cluster-api-test-main

@jzhoucliqr jzhoucliqr force-pushed the wait-volume-detach branch 2 times, most recently from 4010207 to 6e2491e Compare July 15, 2021 19:32
@vincepri
Member

vincepri commented Jul 28, 2021

How do we give some sort of timeout? What if a Volume never gets detached?

Adding this in without a timeout can be considered a behavioral change on deletion. Although I do agree that it's probably more correct to wait for all volumes to be fully detached, I'd like to make sure we have enough agreement on this change.

cc @CecileRobertMichon @randomvariable @yastij

@MaxRink
Contributor

MaxRink commented Jul 29, 2021

@vincepri shouldn't nodeDrainTimeout still apply?
So I would view it the same as unfulfillable PDBs: if you don't specify it, you might end up with things pending until manual intervention.

@vincepri
Member

That's true, I just saw that this code is in the isNodeDrainAllowed block, all good 👍

@jzhoucliqr
Contributor Author

/test pull-cluster-api-test-main

Member

@vincepri vincepri left a comment


/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2021
@k8s-ci-robot k8s-ci-robot merged commit 35ff95d into kubernetes-sigs:master Aug 2, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Aug 2, 2021

Development

Successfully merging this pull request may close these issues.

CAPi doesnt wait for CSI volume unbinding
