Description
My CSI plugin always returns readyToUse=true, because it simply blocks in CreateSnapshot() until the snapshot is created (typically 1 second or less). Usually the VolumeSnapshot object in k8s reflects readyToUse=true immediately, but intermittently it shows up as readyToUse=false, and only gets corrected after about a minute.
Here is a log that illustrates this happening:
external-snapshotter.log
Notice that at 05:10:00, the CreateSnapshot() RPC returns success with readyToUse=true. However, there's an error on line 96 of the log:
snapshot_controller.go:325] error updating volume snapshot content status for snapshot snapcontent-d7f6b159-fd33-4f57-9084-21c9a12a691b: snapshot controller failed to update snapcontent-d7f6b159-fd33-4f57-9084-21c9a12a691b on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-d7f6b159-fd33-4f57-9084-21c9a12a691b": the object has been modified; please apply your changes to the latest version and try again.
48 seconds later, the controller retries and successfully updates the object.
I have two issues with this behavior. (1) Why was readyToUse ever set to false if the CreateSnapshot() RPC returned readyToUse=true on the first try? (2) The long wait before retrying seems unnecessary, because the failure is just an API race with something else modifying the same VolumeSnapshotContent object. We could retry the update right after the error, or requeue the operation with a very short delay instead of waiting. 48 seconds is a long time in an automated sequence of steps that is waiting for the snapshot to become usable.