If a node that was previously flagged as unhealthy is removed from the cluster, we keep subtracting that node's resources from the lending limit(s) on the slack quota queue forever.
We need to account for deleted nodes and properly prune the cached node information in the node health monitor.
For now, we can work around the issue by restarting the controller after removing an unhealthy node.
1. Split node monitoring into two reconcilers, one to monitor Nodes
and one to monitor and update the designated slack ClusterQueue.
2. Remove entries from in-memory caches when a Node is deleted.
3. Watch the slack ClusterQueue to be able to react to changes in
nominalQuotas and adjust lendingLimits accordingly.
Fixes project-codeflare#252.
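The cache pruning in step 2 and the limit recomputation in step 3 can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the controller's actual code: `nodeCache`, `Resources`, `MarkUnhealthy`, `Forget`, and `LendingLimits` are hypothetical names, quantities are plain `int64` counts rather than Kubernetes resource quantities, and flooring the result at zero is an assumption.

```go
package main

import "sync"

// Resources maps a resource name (for example "nvidia.com/gpu") to a count.
// Plain int64 counts are an assumption that keeps the sketch dependency-free.
type Resources map[string]int64

// nodeCache is a hypothetical in-memory record of capacity that is
// currently unusable because it sits on an unhealthy node.
type nodeCache struct {
	mu        sync.Mutex
	unhealthy map[string]Resources // node name -> capacity to withhold
}

func newNodeCache() *nodeCache {
	return &nodeCache{unhealthy: make(map[string]Resources)}
}

// MarkUnhealthy records (or overwrites) the capacity lost on a node.
func (c *nodeCache) MarkUnhealthy(node string, lost Resources) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.unhealthy[node] = lost
}

// Forget drops a node's entry. A Node reconciler would call this when the
// node recovers and also when the Node object is deleted; skipping the
// delete case is what leaves a stale entry behind.
func (c *nodeCache) Forget(node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.unhealthy, node)
}

// LendingLimits returns nominal quota minus all cached unhealthy capacity,
// floored at zero (the floor is an assumption of this sketch). It is
// recomputed from the current nominal quota on every call, so a change to
// the nominal quota is picked up immediately.
func (c *nodeCache) LendingLimits(nominal Resources) Resources {
	c.mu.Lock()
	defer c.mu.Unlock()

	limits := make(Resources, len(nominal))
	for name, quota := range nominal {
		limits[name] = quota
	}
	for _, lost := range c.unhealthy {
		for name, qty := range lost {
			if _, tracked := limits[name]; tracked {
				limits[name] -= qty
			}
		}
	}
	for name, v := range limits {
		if v < 0 {
			limits[name] = 0
		}
	}
	return limits
}
```

With a nominal quota of 16 GPUs and one unhealthy node holding 8, `LendingLimits` yields 8; once `Forget` is called for that node, it yields 16 again. Without the `Forget` call on deletion, the 8 GPUs would stay subtracted until the process restarts, which matches the workaround described in the issue.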
dgrove-oss added a commit to dgrove-oss/appwrapper that referenced this issue on Oct 15, 2024