Skip to content

[UX] Allow resources.gpus in SkyPilot YAML to alias resources.accelerators #5207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

BorenTsai
Copy link
Contributor

@BorenTsai BorenTsai commented Apr 14, 2025

closes: #5194

Introduce yaml config aliasing for top-level fields: "gpus" can stand in for "accelerators." Nested configurations are not supported.

Unlike the cli behavior, "gpus" does not override "accelerators." Either the canonical or the alias can be used otherwise and InvalidSkyPilotConfigError is raised.

A nice follow-up here would be to use pydantic as it offers schema validation and aliasing

Added unittests to test_yaml_parser.py

  • verify that gpus is parsed and added to the compiled resources.accelerators
  • verify that gpus overrides accelerators when specified
Screenshot 2025-04-14 at 4 46 09 PM Screenshot 2025-04-14 at 4 48 40 PM

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@BorenTsai BorenTsai force-pushed the nicky/yaml-config-gpus-alias branch from a201c0f to 1f93332 Compare April 14, 2025 20:58
@BorenTsai BorenTsai force-pushed the nicky/yaml-config-gpus-alias branch from 1f93332 to c920e69 Compare April 14, 2025 21:10
@SeungjinYang
Copy link
Collaborator

SeungjinYang commented Apr 14, 2025

Nice! Do you have an expected behavior in mind if both gpus and accelerators are specified in yaml? My gut feeling says such yaml should be rejected (just because that's simpler to understand than trying to define a precedence / merging behavior) - in any ways, we should come up and enforce an expected behavior.

Oh nvm, you already have a behavior specified.

@BorenTsai
Copy link
Contributor Author

Nice! Do you have an expected behavior in mind if both gpus and accelerators are specified in yaml? My gut feeling says such yaml should be rejected (just because that's simpler to understand than trying to define a precedence / merging behavior) - in any ways, we should come up and enforce an expected behavior.

Oh nvm, you already have a behavior specified.

haha, yeah, I figured applying the cli overriding behavior here would also be ok but I agree that it would be confusing as there's no logging or documentation surfaces this behavior.

Totally happy to change this to an either/or situation which surfaces an error otherwise!

@SeungjinYang
Copy link
Collaborator

Yeah - I think enforcing either-or behavior on aliases (not just for this specific case, but in general) make sense because it's not really a separate config field, just another name to give to the field. Let's error out if both the original and aliased fields exist.

@BorenTsai BorenTsai changed the title [UX] Allow resources.gpus in SkyPilot YAML to be aligned with CLI [UX] Allow resources.gpus in SkyPilot YAML to alias resources.accelerators Apr 14, 2025
@SeungjinYang
Copy link
Collaborator

Nice!

I was thinking - now that you're using Dict[str, Any] which can be trivially casted into sky/utils/config_utils.py:Config, we may actually be able to implement nested aliases with very little additional effort. Take a look at:

  • get_nested and set_nested and pop_nested methods defined in sky/utils/config_utils.py:Config class
  • sky/skylet/constants.py:OVERRIDEABLE_CONFIG_KEYS_IN_TASK

which should provide the nested equivalents of the map operations you are using to implement top-level config alias.

@BorenTsai
Copy link
Contributor Author

@SeungjinYang thanks for the code pointer! Took a look at the methods you mentioned and it does seem pretty easy utilize the config_utils.Config methods.

One concern I have here is that in the get_nested method we perform a deepcopy:

config = copy.deepcopy(self)

How large do these resource configuration maps become? I'm worried about the overhead of performing 2 deep copies for each present alias (1 deepcopy when popping the alias value, another deepcopy to see if the canonical value is present)

Copy link
Collaborator

@SeungjinYang SeungjinYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the overhead of performing 2 deep copies for each present alias

Yeah, that's actually a good point. Supporting nested aliases is not needed right now and there's no need to prematurely optimize. How about we just put a comment somewhere on how nested aliases could be implemented if there is a need in the future, and noting the potential performance implications?

Copy link
Collaborator

@SeungjinYang SeungjinYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise! We can get the PR merged once the outstanding comments are addressed.

@BorenTsai
Copy link
Contributor Author

I'm worried about the overhead of performing 2 deep copies for each present alias

Yeah, that's actually a good point. Supporting nested aliases is not needed right now and there's no need to prematurely optimize. How about we just put a comment somewhere on how nested aliases could be implemented if there is a need in the future, and noting the potential performance implications?

Sounds good! For future reference, the implementation might look something like the following assuming we have a ResourceAlias dataclass

    @staticmethod
    def resolve_resource_config_aliases(
            config: Optional[Dict[str, Any]]) -> Optional[config_utils.Config]:
        """Resolves and applies aliases to config."""
        if not config:
            return config

        sky_config = config_utils.Config.from_dict(config)
        for resource_alias in RESOURCE_CONFIG_ALIASES:
            # Pop such that the alias value is removed from the config.
            # This avoids violating the resource config schema.
            alias_value = sky_config.pop_nested(resource_alias.alias_path, {})
            if alias_value:
                canonical_value = sky_config.get_nested(
                    resource_alias.canonical_path, {})
                if alias_value and canonical_value:
                    raise exceptions.InvalidSkyPilotConfigError(
                        f'Cannot specify both {resource_alias.alias_path} '
                        f'and {resource_alias.canonical_path} in config.')

                sky_config.set_nested(resource_alias.canonical_path,
                                      alias_value)

        return sky_config

And the test

def test_nested_aliases(tmp_path):
    """Validate nested aliases can be resolved."""
    mock_aliases = mock.patch('sky.resources.RESOURCE_CONFIG_ALIASES', [
        ResourceAlias(('job_recovery', 'tactic'), ('job_recovery', 'strategy'))
    ])
    mock_aliases.start()

    config_path = _create_config_file(
        textwrap.dedent("""\
            resources:
                job_recovery:
                    tactic: FAILOVER
            """), tmp_path)

    task = Task.from_yaml(config_path)
    resources = list(task.resources)[0]
    job_recovery = resources.job_recovery
    assert isinstance(job_recovery, dict)
    assert job_recovery.get('tactic') is None
    assert job_recovery.get('strategy') == "FAILOVER"

    mock_aliases.stop()

@BorenTsai BorenTsai force-pushed the nicky/yaml-config-gpus-alias branch from a374de3 to c069bef Compare April 16, 2025 04:00
@SeungjinYang
Copy link
Collaborator

issue #5236 to track implementation of nested aliasing for posterity

@SeungjinYang SeungjinYang merged commit d75ee5d into skypilot-org:master Apr 16, 2025
21 checks passed
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BorenTsai for fixing this! There are two minor issues below that we should fix, cc'ing @SeungjinYang

@SeungjinYang
Copy link
Collaborator

SeungjinYang commented Apr 16, 2025

@BorenTsai let me know if you want to take on the follow-up items, otherwise I'll take care of them on Friday (edit: I'll take care of them)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[UX] Allow resources.gpus in SkyPilot YAML to be aligned with CLI
3 participants