Your current environment
🐛 Describe the bug
When running `vllm.LLM.generate` with `n > 1`, the tqdm progress bar shows an incorrect count: the progress accounting does not correctly reflect that multiple generations are being produced per prompt.
For example, when using the following code:
```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, n=10)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
```
The progress bar output looks like this:
```
Processed prompts: 10%|███       | 4/40 [00:00<00:01, 31.61it/s, est. speed input: 205.53 toks/s, output: 5058.95 toks/s]
```
Here, `n=10`, so 4 prompts yield 4 * 10 = 40 outputs, but the bar's progress count and its total are inconsistent: the count only advances once per prompt (4 in this case) rather than once per generated output, leaving the bar at 4/40. This makes the progress bar misleading.
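For reference, the actual number of generated sequences can be confirmed from the returned `RequestOutput` objects, each of which holds its prompt's completions in `.outputs`:

```python
# Each RequestOutput corresponds to one prompt and carries n completions,
# so the real workload is len(prompts) * n sequences.
total_sequences = sum(len(request_output.outputs) for request_output in outputs)
print(total_sequences)  # 4 prompts * n=10 -> 40
```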
Expected Behavior:
The progress bar should reflect the actual number of outputs being processed, considering both the number of prompts and `n`.
Environment:
- vLLM version: v0.7.3
Steps to Reproduce:
- Run the above code with `n > 1`.
- Observe that the tqdm progress bar does not match the expected number of generated outputs.
Proposed Fix:
Adjust the tqdm accounting so that both the progress count and the total are based on `n * len(prompts)`, ensuring that the progress bar correctly reflects the actual workload.
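A minimal sketch of the idea, assuming a per-sequence progress convention (illustrative only, not vLLM's internal implementation; the real change would live in vLLM's engine loop):

```python
# Illustrative sketch only -- not vLLM's actual implementation.
# Size the bar by sequences (n per prompt) and advance it by n whenever
# a prompt finishes, so the count and the total stay consistent.
from tqdm import tqdm

from vllm import LLM, SamplingParams

def generate_with_progress(llm: LLM, prompts: list[str],
                           sampling_params: SamplingParams):
    total_outputs = len(prompts) * sampling_params.n  # 4 * 10 = 40 here
    pbar = tqdm(total=total_outputs, desc="Processed prompts")
    results = []
    for prompt in prompts:
        # One prompt at a time here for clarity; this sacrifices batching
        # and only demonstrates the counting rule.
        results.extend(llm.generate([prompt], sampling_params, use_tqdm=False))
        pbar.update(sampling_params.n)  # n sequences completed for this prompt
    pbar.close()
    return results
```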