Multi-node serving with vLLM - Problems with Ray #2406

@vbucaj

I am trying to run a distributed (multi-node) inference server with vLLM using Ray, but I keep getting the following ValueError:

Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.

I'm not sure how to resolve this. I suspect the issue is in https://github.com/vllm-project/vllm/blob/main/vllm/engine/ray_utils.py, especially when a ray_address is passed. Is there a specific ray_address argument that gets passed at the ray.init() stage?
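
To check whether Ray actually registers any GPUs on my driver node, I ran something like the following rough sketch (address="auto" assumes the driver joins an existing cluster started with ray start; adjust if your cluster is reached differently):

    import ray

    # Attach to the running Ray cluster and inspect which resources Ray
    # has registered on the node running this driver process.
    ray.init(address="auto")

    driver_ip = ray.util.get_node_ip_address()
    for node in ray.nodes():
        if node["NodeManagerAddress"] == driver_ip and node["Alive"]:
            # If "GPU" is missing or 0 here, no GPU bundle of the placement
            # group can land on this node, and vLLM raises the ValueError.
            print("Driver node resources:", node.get("Resources", {}))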

More specifically, it seems this error is raised because of the driver_dummy_worker check at line 182 of https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py.

I'm confused about what's going on in this piece of code:

    def _init_workers_ray(self, placement_group: "PlacementGroup",
                          **ray_remote_kwargs):
        if self.parallel_config.tensor_parallel_size == 1:
            num_gpus = self.cache_config.gpu_memory_utilization
        else:
            num_gpus = 1

        self.driver_dummy_worker: RayWorkerVllm = None
        self.workers: List[RayWorkerVllm] = []

        driver_ip = get_ip()
        for bundle_id, bundle in enumerate(placement_group.bundle_specs):
            if not bundle.get("GPU", 0):
                continue
            scheduling_strategy = PlacementGroupSchedulingStrategy(
                placement_group=placement_group,
                placement_group_capture_child_tasks=True,
                placement_group_bundle_index=bundle_id,
            )
            worker = ray.remote(
                num_cpus=0,
                num_gpus=num_gpus,
                scheduling_strategy=scheduling_strategy,
                **ray_remote_kwargs,
            )(RayWorkerVllm).remote(self.model_config.trust_remote_code)

            worker_ip = ray.get(worker.get_node_ip.remote())
            if worker_ip == driver_ip and self.driver_dummy_worker is None:
                # If the worker is on the same node as the driver, we use it
                # as the resource holder for the driver process.
                self.driver_dummy_worker = worker
            else:
                self.workers.append(worker)

        if self.driver_dummy_worker is None:
            raise ValueError(
                "Ray does not allocate any GPUs on the driver node. Consider "
                "adjusting the Ray placement group or running the driver on a "
                "GPU node.")

When the error is raised, the check is whether driver_dummy_worker is None, but don't we set it to None just above, with self.driver_dummy_worker: RayWorkerVllm = None? As far as I can tell, it only gets reassigned inside the loop when a worker's IP matches the driver's IP, so in my case it must still be None when the check runs (see the sketch below).
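
If I strip the Ray details away, the pattern seems to reduce to the following (the IPs are made up for illustration):

    # Minimal reduction of the pattern as I read it: the attribute starts
    # as None and is reassigned only if some GPU worker happens to be
    # scheduled on the driver's node.
    driver_ip = "10.0.0.1"
    worker_ips = ["10.0.0.2", "10.0.0.3"]  # no worker shares the driver's node

    driver_dummy_worker = None
    for ip in worker_ips:
        if ip == driver_ip and driver_dummy_worker is None:
            driver_dummy_worker = ip  # only reassigned on an IP match

    if driver_dummy_worker is None:
        # Reached whenever no placement-group bundle landed on the driver's
        # node, which seems to be exactly my situation.
        raise ValueError("Ray does not allocate any GPUs on the driver node.")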
