Closed
Labels: bug, ray
Description
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
We should improve the way we create the default placement group in `vllm/vllm/executor/ray_utils.py` (lines 119 to 131 in cbbc904):
```python
num_devices_in_cluster = ray.cluster_resources().get(device_str, 0)
if parallel_config.world_size > num_devices_in_cluster:
    raise ValueError(
        f"The number of required {device_str}s exceeds the total "
        f"number of available {device_str}s in the placement group.")
# Create a new placement group
placement_group_specs = ([{
    device_str: 1
}] * parallel_config.world_size)
current_placement_group = ray.util.placement_group(
    placement_group_specs)
# Wait until PG is ready - this will block until all
# requested resources are available, and will timeout
```
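The excerpt only compares `world_size` against the cluster-wide device count, and `placement_group_specs` puts no constraint on which nodes the bundles land on, so nothing guarantees a bundle on the node running this code. A minimal sketch of one way to pin the first bundle, assuming Ray's built-in per-node resource named `node:<ip>` (capacity 1.0 on each node): request a tiny fraction of it in bundle 0. The helper `build_placement_group_specs` and its parameters are hypothetical; the returned list would be passed to `ray.util.placement_group(...)` as in the excerpt.

```python
from typing import Dict, List


def build_placement_group_specs(
    device_str: str, world_size: int, current_ip: str
) -> List[Dict[str, float]]:
    """Hypothetical helper: one bundle per worker, bundle 0 pinned locally.

    Assumes Ray's per-node resource "node:<ip>"; requesting 0.001 of it
    forces bundle 0 onto that node without consuming meaningful capacity.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    # Build independent dicts. `[{...}] * n` would repeat ONE shared dict,
    # so adding the node resource below would pin every bundle to this node.
    specs: List[Dict[str, float]] = [
        {device_str: 1.0} for _ in range(world_size)
    ]
    # Only the first bundle carries the node-affinity resource.
    specs[0][f"node:{current_ip}"] = 0.001
    return specs
```

For example, `build_placement_group_specs("GPU", 2, "10.0.0.1")` returns `[{"GPU": 1.0, "node:10.0.0.1": 0.001}, {"GPU": 1.0}]`; the remaining bundles stay free to be placed on any node.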
- If we are already in a placement group but it does not contain the current node, error out. (This is a rare case; users usually don't set placement groups.)
- Otherwise, we are creating the placement group ourselves, so make sure it includes the current node.
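The two cases above can be sketched as a small decision function. It is a hypothetical helper that works on plain values so it runs without a Ray cluster; in practice the inputs might come from `ray.util.get_current_placement_group()` (which returns `None` outside a placement group), `ray.get_runtime_context().get_node_id()`, and the per-bundle node assignment reported by `ray.util.placement_group_table()`. The device check on the current node in the "create" path is an assumption about what "make sure it includes the current node" requires, based on the process that creates the group also running a worker.

```python
from typing import List, Optional


def plan_placement_group(
    existing_bundle_node_ids: Optional[List[str]],
    current_node_id: str,
    devices_on_current_node: float,
) -> str:
    """Hypothetical helper deciding how to handle the placement group.

    existing_bundle_node_ids: node ID of each bundle in the placement group
        we are already running in, or None if we are not in one.
    current_node_id: ID of the node this process runs on.
    devices_on_current_node: devices of the requested type on this node.
    Returns "reuse" or "create"; raises ValueError if neither is valid.
    """
    if existing_bundle_node_ids is not None:
        # Case 1: a user-provided group must have a bundle on this node.
        if current_node_id not in existing_bundle_node_ids:
            raise ValueError(
                f"The current placement group has no bundle on node "
                f"{current_node_id}.")
        return "reuse"
    # Case 2: we create the group, so this node must be able to host
    # at least one bundle (assumption: one device per bundle).
    if devices_on_current_node < 1:
        raise ValueError(
            f"Node {current_node_id} has no device to place a bundle on.")
    return "create"
```

Raising early in case 1 surfaces a misconfigured user placement group immediately, instead of waiting on workers that can never be co-located with the current process.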
Status: Done