Multi-node serving with vLLM - Problems with Ray #2406

@vbucaj

I am trying to run a distributed (multi-node) inference server with vLLM using Ray, but I keep getting the following ValueError:

Ray does not allocate any GPUs on the driver node. Consider adjusting the Ray placement group or running the driver on a GPU node.

I'm not sure how to resolve this. I suspect the issue is in https://github.com/vllm-project/vllm/blob/main/vllm/engine/ray_utils.py, especially when a ray_address is passed. Is there a specific ray_address argument that gets passed at the ray.init() stage?
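
To check whether Ray actually registers any GPUs on my driver node, I ran something like the following rough sketch (address="auto" assumes the driver joins an existing cluster started with ray start; adjust if your cluster is reached differently):

    import ray

    # Attach to the running Ray cluster and inspect which resources Ray
    # has registered on the node running this driver process.
    ray.init(address="auto")

    driver_ip = ray.util.get_node_ip_address()
    for node in ray.nodes():
        if node["NodeManagerAddress"] == driver_ip and node["Alive"]:
            # If "GPU" is missing or 0 here, no GPU bundle of the placement
            # group can land on this node, and vLLM raises the ValueError.
            print("Driver node resources:", node.get("Resources", {}))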

More specifically, it seems this error is raised because of the driver_dummy_worker check at line 182 of https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py.

I'm confused about what's going on in this piece of code:

    def _init_workers_ray(self, placement_group: "PlacementGroup",
                          **ray_remote_kwargs):
        if self.parallel_config.tensor_parallel_size == 1:
            num_gpus = self.cache_config.gpu_memory_utilization
        else:
            num_gpus = 1

        self.driver_dummy_worker: RayWorkerVllm = None
        self.workers: List[RayWorkerVllm] = []

        driver_ip = get_ip()
        for bundle_id, bundle in enumerate(placement_group.bundle_specs):
            if not bundle.get("GPU", 0):
                continue
            scheduling_strategy = PlacementGroupSchedulingStrategy(
                placement_group=placement_group,
                placement_group_capture_child_tasks=True,
                placement_group_bundle_index=bundle_id,
            )
            worker = ray.remote(
                num_cpus=0,
                num_gpus=num_gpus,
                scheduling_strategy=scheduling_strategy,
                **ray_remote_kwargs,
            )(RayWorkerVllm).remote(self.model_config.trust_remote_code)

            worker_ip = ray.get(worker.get_node_ip.remote())
            if worker_ip == driver_ip and self.driver_dummy_worker is None:
                # If the worker is on the same node as the driver, we use it
                # as the resource holder for the driver process.
                self.driver_dummy_worker = worker
            else:
                self.workers.append(worker)

        if self.driver_dummy_worker is None:
            raise ValueError(
                "Ray does not allocate any GPUs on the driver node. Consider "
                "adjusting the Ray placement group or running the driver on a "
                "GPU node.")

When the error is raised, the check is whether driver_dummy_worker is None, but don't we set it to None just above, with self.driver_dummy_worker: RayWorkerVllm = None? As far as I can tell, it only gets reassigned inside the loop when a worker's IP matches the driver's IP, so in my case it must still be None when the check runs (see the sketch below).
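
If I strip the Ray details away, the pattern seems to reduce to the following (the IPs are made up for illustration):

    # Minimal reduction of the pattern as I read it: the attribute starts
    # as None and is reassigned only if some GPU worker happens to be
    # scheduled on the driver's node.
    driver_ip = "10.0.0.1"
    worker_ips = ["10.0.0.2", "10.0.0.3"]  # no worker shares the driver's node

    driver_dummy_worker = None
    for ip in worker_ips:
        if ip == driver_ip and driver_dummy_worker is None:
            driver_dummy_worker = ip  # only reassigned on an IP match

    if driver_dummy_worker is None:
        # Reached whenever no placement-group bundle landed on the driver's
        # node, which seems to be exactly my situation.
        raise ValueError("Ray does not allocate any GPUs on the driver node.")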
