[ws-manager] Add missing check to fix OOM error #8372
Conversation
Codecov Report
@@             Coverage Diff             @@
##             main    #8372       +/-   ##
===========================================
+ Coverage   12.31%   33.30%   +20.98%
===========================================
  Files          20       31       +11
  Lines        1161     4573     +3412
===========================================
+ Hits          143     1523     +1380
- Misses       1014     2934     +1920
- Partials        4      116      +112
Marked this as draft because I saw other errors not related to pod deletion, e.g. finalizer removal.
Hm, that was kind of intentional: if we fail to delete, then we are not going to be able to create a new one. Also, I would assume it would fail with a delete error in this case, not an out-of-memory error.
f34c282 to a6ba3dd
If the pod does not exist then we should not be erroring out. Unsure why that would be intentional. You will always fail to delete what is already deleted, right?
In the logs I can see that deletion failed and that the "pod ran to completion" error is returned. However, in the DB I still see the OOM error. I am not sure where that is being written from.
My concern here is why it was already deleted, or which service deleted it.
That is a valid question, and I don't have an answer to that.
Description
We were bailing out on errors that we can either retry or safely ignore:
1. Pod deletion error - When a pod is already deleted, any attempt to delete it results in an error. If that error is a "not found" error, it is safe to ignore it and go ahead with our attempt to create a pod.
2. Finalizer removal error - If the pod gets deleted before we explicitly remove the finalizer, OR the pod object changes before we remove the finalizer, then the update fails. In these cases we should (a) check that the pod still exists and (b) retry updating it.
Related Issue(s)
🤞🏾 Hopefully Fixes #8238
How to test
We are not able to reproduce this issue in a non-prod environment, so the only way to validate that this fixes the problem is to deploy to prod and monitor.
Release Notes
Documentation