-
Notifications
You must be signed in to change notification settings - Fork 19
Machines not scaling due to ID character limit #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Machines not scaling due to ID character limit #182
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks great, love the name truncation for node and machine pools. I think it might be related to your other PR but I think we could add some error logging in the finalizeScalingDownMachines function so that we can find where the errors are coming from more easily.
I tested this on a rosa hosted/hypershift cluster and all worked as expected, the node pool was scaled up and name updated appropriately.
I also tested it on an OSD cluster but came across some issues with the machine pools not scaling. I didn't find the culprit and think we should test on another osd cluster to see if errors are seen there also.
2023-11-22T15:07:33Z ERROR Reconciler error {"controller": "appwrapper", "controllerGroup": "workload.codeflare.dev", "controllerKind": "AppWrapper", "AppWrapper": {"name":"raycluster-complete","namespace":"default"}, "namespace": "default", "name": "raycluster-complete", "reconcileID": "a9baadbf-6a38-4d62-966b-05c2eaf75f3c", "error": "status is 404, identifier is '404', code is 'CLUSTERS-MGMT-404' and operation identifier is 'b22bd107-f842-4009-bc58-815a8628475e': Cluster 'machine_pools' not found"}
5b00964
to
be477bc
Compare
Thanks for your review and feedback @Fiona-Waters :) I have pushed updates to the logging within the hypershift pr |
You're welcome. Has it been retested on OSD? |
be477bc
to
3780c79
Compare
Hi Fiona, However, I have rebased again with the others for testing purposes :) |
3780c79
to
320e5f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested it out all good I will drop a lgtm when nodepools is merged good stuff Eoin
…n of machine name character limit
320e5f0
to
4bf437d
Compare
Just successfully ran this while testing the nodepool e2e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: astefanutti The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Issue link
closes #143
What changes have been made
This Pr is for 320e5f0 and will be rebased after Hypershift Integration has been merged.
This pr contains an update to the nodepools, and machinepool ID generation convention for instascale. Replacing the appended instance-type with a string of 4 random characters.
The prefix is the appwrapper name (Truncated if over the character limit when appended to the suffix, Nodepools 15, MachinePools 30).
This change also updates the scaledown functions to scaledown machines with a label pointing to the related appwrapper (composed of the appwrapper name, and namespace), as apposed to the machine ID.
These changes also rely on the codeflare-operator e2e tests being updated.
Verification steps
Dispatch an appwrapper (with a long name :) ) on hypershift and OSD with instascale enabled and a specified instance type.
Ensure the machine scales up as expected, and the machine name is truncated to a suitable character length.
Ensure the machine scales down once complete.
Checks