There are several cases in which an OOM error is not reported by `cortex get`:
- The logs show exit code 137/236/237/350/363/370, but the container status shows exit code 0 with reason `OOMKilled`, and the job is marked as Successful.
- The logs show exit code 137/236/237/350, but the container status shows exit code 0 with reason `OOMKilled`, and the job is marked as Failed.
- The pod is evicted by k8s and the job is marked as Successful, but the pod reason is a message similar to “memory was too low, had to be evicted”.
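In all three cases the exit code is unreliable, while the container's termination reason or the pod's eviction status still points to memory pressure. The following is a hypothetical sketch, not Cortex's implementation, of a check that ignores the exit code entirely. It reads only standard Kubernetes pod status fields (`containerStatuses`, `state`, `lastState`, `terminated.reason`, pod-level `reason` and `message`) from an object such as the one printed by `kubectl get pod <name> -o json`. The names `pod_was_oom` and `_terminated_reason` are illustrative, and matching the word "memory" in the eviction message is an assumption about its wording.

```python
from typing import Any, Mapping, Optional


def _terminated_reason(state: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return state["terminated"]["reason"], or None if not terminated."""
    terminated = (state or {}).get("terminated") or {}
    return terminated.get("reason")


def pod_was_oom(pod: Mapping[str, Any]) -> bool:
    """Heuristically decide whether a pod was stopped by memory pressure.

    `pod` is a pod object as printed by `kubectl get pod <name> -o json`.
    Exit codes are deliberately ignored, since they are unreliable here.
    """
    status = pod.get("status") or {}

    # Cases 1 and 2: a container was OOM-killed, whatever exit code was recorded.
    for cs in status.get("containerStatuses") or []:
        for key in ("state", "lastState"):
            if _terminated_reason(cs.get(key)) == "OOMKilled":
                return True

    # Case 3: the pod was evicted; the "memory" keyword check is an assumption
    # about the wording of the eviction message.
    if status.get("reason") == "Evicted":
        return "memory" in (status.get("message") or "").lower()

    return False


if __name__ == "__main__":
    oom_exit_0 = {"status": {"containerStatuses": [
        {"state": {"terminated": {"exitCode": 0, "reason": "OOMKilled"}}}]}}
    evicted = {"status": {"reason": "Evicted",
                          "message": "The node was low on resource: memory."}}
    completed = {"status": {"containerStatuses": [
        {"state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}]}}
    print(pod_was_oom(oom_exit_0), pod_was_oom(evicted), pod_was_oom(completed))
    # prints: True True False
```

Checking both `state` and `lastState` covers a container that was OOM-killed and then restarted, where the current state no longer records the kill.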
Reproducibility
1. Set a very low memory request in the `cortex.yaml` config, then allocate a large `numpy` array in the job.
2. Submit the job and observe that the job status is not set to OOM.
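A minimal sketch of the allocation in step 1 follows. `TARGET_MIB` and `allocate_mib` are hypothetical names, the size is an arbitrary assumption that only needs to exceed the memory request set in `cortex.yaml`, and how the function is wired into a Cortex job depends on the Cortex version and is not shown.

```python
import numpy as np

# Hypothetical size: choose a value well above the memory request in cortex.yaml.
TARGET_MIB = 1024


def allocate_mib(mib: int) -> np.ndarray:
    """Allocate `mib` mebibytes and write every byte so the memory is resident."""
    # np.ones writes each element; np.empty may return pages that are not
    # committed until first written, so it might not trigger the OOM.
    return np.ones(mib * 1024 * 1024, dtype=np.uint8)
```

Inside the job, keep a reference to the result (for example `data = allocate_mib(TARGET_MIB)`) so the array is not freed before the container reaches its memory limit.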