Description
Your current environment

Model Input Dumps
No response
🐛 Describe the bug
(demo_vllm) demo@dgx03:/raid/xinference/modelscope/hub/qwen/Qwen2-72B-Instruct/logs$ tail -f vllm_20240927.log
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5cf7ba9897 in /raid/demo/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1d2 (0x7f5cf8e82c62 in /raid/demo/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1a0 (0x7f5cf8e87a80 in /raid/demo/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f5cf8e88dcc in /raid/demo/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xdbbf4 (0x7f5d44931bf4 in /raid/demo/anaconda3/envs/vllm/bin/../lib/libstdc++.so.6)
frame #5: + 0x8609 (0x7f5d4618b609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f5d45f56353 in /lib/x86_64-linux-gnu/libc.so.6)
/raid/demo/anaconda3/envs/vllm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '