python3 api_server.py --model /hbox2dir/chatglm2-6b-32k --trust-remote-code --host 0.0.0.0 --port 7070 --tensor-parallel-size 2
2023-11-20 09:55:13,313 INFO worker.py:1642 -- Started a local Ray instance.
INFO: Started server process [278296]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7070 (Press CTRL+C to quit)
(RayWorker pid=281502) [2023-11-20 09:55:58,328 E 281502 281502] logging.cc:97: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
(RayWorker pid=281502) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorker pid=281502) For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
(RayWorker pid=281502) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(RayWorker pid=281502)
(RayWorker pid=281502) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
(RayWorker pid=281502) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5026eea617 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f5026ea598d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5026fa59f8 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #3: <unknown function> + 0x16746 (0x7f5026f6e746 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #4: <unknown function> + 0x1947d (0x7f5026f7147d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #5: <unknown function> + 0x1989d (0x7f5026f7189d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #6: <unknown function> + 0x510c46 (0x7f4faf33fc46 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
(RayWorker pid=281502) frame #7: <unknown function> + 0x55ca7 (0x7f5026ecfca7 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #8: c10::TensorImpl::~TensorImpl() + 0x1e3 (0x7f5026ec7cb3 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #9: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f5026ec7e49 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #10: <unknown function> + 0x7c1708 (0x7f4faf5f0708 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
(RayWorker pid=281502) frame #11: THPVariable_subclass_dealloc(_object*) + 0x325 (0x7f4faf5f0ab5 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
(RayWorker pid=281502) frame #12: ray::RayWorker.execute_method() [0x4e0970]
(RayWorker pid=281502) frame #13: ray::RayWorker.execute_method() [0x4f1828]
(RayWorker pid=281502) frame #14: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #15: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #16: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #17: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #18: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #19: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #20: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #21: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #22: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #23: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #24: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #25: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #26: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #27: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #28: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #29: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #30: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #31: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #32: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #33: ray::RayWorker.execute_method() [0x4f1811]
(RayWorker pid=281502) frame #34: <unknown function> + 0x644015 (0x7f503140e015 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #35: std::_Function_handler<ray::Status (ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool), ray::Status (*)(ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string, std::string, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool)>::_M_invoke(std::_Any_data const&, ray::rpc::Address const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::string*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&, bool&&, bool&&, bool&&) + 0x157 (0x7f503134a547 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #36: ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*) + 0xc1e (0x7f5031534e5e in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #37: std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>, std::_Placeholder<6>, std::_Placeholder<7>, std::_Placeholder<8>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&, std::string*&&) + 0x58 (0x7f50314697d8 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #38: <unknown function> + 0x793684 (0x7f503155d684 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #39: <unknown function> + 0x79498a (0x7f503155e98a in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #40: <unknown function> + 0x7ac04e (0x7f503157604e in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #41: ray::core::ActorSchedulingQueue::AcceptRequestOrRejectIfCanceled(ray::TaskID, ray::core::InboundRequest&) + 0x10c (0x7f503157735c in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #42: <unknown function> + 0x7b02cb (0x7f503157a2cb in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #43: ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) + 0x400 (0x7f503157bda0 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #44: ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) + 0x1216 (0x7f503155d016 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #45: <unknown function> + 0x735e25 (0x7f50314ffe25 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #46: <unknown function> + 0xa59886 (0x7f5031823886 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #47: <unknown function> + 0xa4b55e (0x7f503181555e in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #48: <unknown function> + 0xa4bab6 (0x7f5031815ab6 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #49: <unknown function> + 0x102fdbb (0x7f5031df9dbb in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #50: <unknown function> + 0x1031d99 (0x7f5031dfbd99 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #51: <unknown function> + 0x10324a2 (0x7f5031dfc4a2 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #52: ray::core::CoreWorker::RunTaskExecutionLoop() + 0x1c (0x7f50314fea8c in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #53: ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() + 0x8c (0x7f503154025c in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #54: ray::core::CoreWorkerProcess::RunTaskExecutionLoop() + 0x1d (0x7f503154040d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #55: <unknown function> + 0x57b5d7 (0x7f50313455d7 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so)
(RayWorker pid=281502) frame #56: ray::RayWorker.execute_method() [0x4ecb84]
(RayWorker pid=281502) frame #57: _PyEval_EvalFrameDefault + 0x6b2 (0x4d87c2 in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #58: _PyFunction_Vectorcall + 0x106 (0x4e81a6 in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #59: _PyEval_EvalFrameDefault + 0x6b2 (0x4d87c2 in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #60: _PyEval_EvalCodeWithName + 0x2f1 (0x4d70d1 in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #61: PyEval_EvalCodeEx + 0x39 (0x585e29 in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #62: PyEval_EvalCode + 0x1b (0x585deb in ray::RayWorker.execute_method)
(RayWorker pid=281502) frame #63: ray::RayWorker.execute_method() [0x5a5bd1]
(RayWorker pid=281502)
(RayWorker pid=281502) [E ProcessGroupNCCL.cpp:915] [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
(RayWorker pid=281502) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorker pid=281502) For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
(RayWorker pid=281502) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(RayWorker pid=281502)
(RayWorker pid=281502) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
(RayWorker pid=281502) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5026eea617 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f5026ea598d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5026fa59f8 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f44659ddaf0 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f44659e1918 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x24b (0x7f44659f815b in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f44659f8468 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #7: <unknown function> + 0xdbbf4 (0x7f5030c81bf4 in /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6)
(RayWorker pid=281502) frame #8: <unknown function> + 0x8609 (0x7f5032c5a609 in /lib/x86_64-linux-gnu/libpthread.so.0)
(RayWorker pid=281502) frame #9: clone + 0x43 (0x7f5032a25133 in /lib/x86_64-linux-gnu/libc.so.6)
(RayWorker pid=281502)
(RayWorker pid=281502) [2023-11-20 09:55:58,356 E 281502 281738] logging.cc:97: Unhandled exception: St13runtime_error. what(): [Rank 1] NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
(RayWorker pid=281502) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(RayWorker pid=281502) For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
(RayWorker pid=281502) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(RayWorker pid=281502)
(RayWorker pid=281502) Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
(RayWorker pid=281502) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f5026eea617 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f5026ea598d in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10.so)
(RayWorker pid=281502) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f5026fa59f8 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
(RayWorker pid=281502) frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7f44659ddaf0 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7f44659e1918 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x24b (0x7f44659f815b in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x78 (0x7f44659f8468 in /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
(RayWorker pid=281502) frame #7: <unknown function> + 0xdbbf4 (0x7f5030c81bf4 in /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6)
(RayWorker pid=281502) frame #8: <unknown function> + 0x8609 (0x7f5032c5a609 in /lib/x86_64-linux-gnu/libpthread.so.0)
(RayWorker pid=281502) frame #9: clone + 0x43 (0x7f5032a25133 in /lib/x86_64-linux-gnu/libc.so.6)
(RayWorker pid=281502)
(RayWorker pid=281502) [2023-11-20 09:55:58,369 E 281502 281738] logging.cc:104: Stack trace:
(RayWorker pid=281502) /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so(+0xf2e81a) [0x7f5031cf881a] ray::operator<<()
(RayWorker pid=281502) /root/miniconda3/envs/vllm/lib/python3.8/site-packages/ray/_raylet.so(+0xf30fd8) [0x7f5031cfafd8] ray::TerminateHandler()
(RayWorker pid=281502) /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6(+0xb135a) [0x7f5030c5735a] __cxxabiv1::__terminate()
(RayWorker pid=281502) /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7f5030c573c5]
(RayWorker pid=281502) /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6(+0xb134f) [0x7f5030c5734f]
(RayWorker pid=281502) /root/miniconda3/envs/vllm/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so(+0xc86dc5) [0x7f4465763dc5] c10d::ProcessGroupNCCL::ncclCommWatchdog()
(RayWorker pid=281502) /root/miniconda3/envs/vllm/bin/../lib/libstdc++.so.6(+0xdbbf4) [0x7f5030c81bf4] execute_native_thread_routine
(RayWorker pid=281502) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f5032c5a609] start_thread
(RayWorker pid=281502) /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f5032a25133] __clone
(RayWorker pid=281502)
(RayWorker pid=281502) *** SIGABRT received at time=1700474158 on cpu 37 ***
(RayWorker pid=281502) PC: @ 0x7f503294900b (unknown) raise
(RayWorker pid=281502) @ 0x7f5032c66420 4048 (unknown)
(RayWorker pid=281502) @ 0x7f5030c5735a (unknown) __cxxabiv1::__terminate()
(RayWorker pid=281502) @ 0x7f5030c57070 (unknown) (unknown)
(RayWorker pid=281502) [2023-11-20 09:55:58,370 E 281502 281738] logging.cc:361: *** SIGABRT received at time=1700474158 on cpu 37 ***
(RayWorker pid=281502) [2023-11-20 09:55:58,370 E 281502 281738] logging.cc:361: PC: @ 0x7f503294900b (unknown) raise
(RayWorker pid=281502) [2023-11-20 09:55:58,370 E 281502 281738] logging.cc:361: @ 0x7f5032c66420 4048 (unknown)
(RayWorker pid=281502) [2023-11-20 09:55:58,370 E 281502 281738] logging.cc:361: @ 0x7f5030c5735a (unknown) __cxxabiv1::__terminate()
(RayWorker pid=281502) [2023-11-20 09:55:58,370 E 281502 281738] logging.cc:361: @ 0x7f5030c57070 (unknown) (unknown)
(RayWorker pid=281502) Fatal Python error: Aborted
(RayWorker pid=281502)
2023-11-20 09:55:59,311 WARNING worker.py:2058 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffcfd63a7a512721ad3a92ab3f01000000 Worker ID: 759bd1d9c4374681899225cd8a5c6cecd4f171ef93854ae32ad3913d Node ID: 6509b19bea0fda8e704988e22d2253a24f7c73b62f35af462915cc7a Worker IP address: 10.178.166.6 Worker port: 35669 Worker PID: 281501 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
ERROR:asyncio:Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7ff838639940>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7ff7b41c2820>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7ff838639940>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7ff7b41c2820>)>
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
task.result()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 350, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 329, in engine_step
request_outputs = await self.engine.step_async()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 191, in step_async
output = await self._run_workers_async(
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 219, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/root/miniconda3/envs/vllm/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: RayWorker
actor_id: cfd63a7a512721ad3a92ab3f01000000
pid: 281501
namespace: 10668c9a-16f3-4e63-88cf-73ed6147602d
ip: 10.178.166.6
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
raise exc
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
task.result()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 350, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 329, in engine_step
request_outputs = await self.engine.step_async()
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 191, in step_async
output = await self._run_workers_async(
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 219, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/root/miniconda3/envs/vllm/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: RayWorker
actor_id: cfd63a7a512721ad3a92ab3f01000000
pid: 281501
namespace: 10668c9a-16f3-4e63-88cf-73ed6147602d
ip: 10.178.166.6
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/fastapi/applications.py", line 292, in __call__
await super().__call__(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/fastapi/routing.py", line 273, in app
raw_response = await run_endpoint_function(
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
return await dependant.call(**values)
File "api_server.py", line 523, in create_completion
async for res in result_generator:
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 435, in generate
raise e
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 429, in generate
async for request_output in stream:
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 70, in __anext__
raise result
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
raise exc
File "/root/miniconda3/envs/vllm/lib/python3.8/site-packages/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
2023-11-20 09:56:06,456 WARNING worker.py:2058 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff76970464e1639f53b2da3a3e01000000 Worker ID: c3af2451891da03970065e8a4a1faf9c7bcc1b6c181be98b922f6158 Node ID: 6509b19bea0fda8e704988e22d2253a24f7c73b62f35af462915cc7a Worker IP address: 10.178.166.6 Worker port: 39467 Worker PID: 281502 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Strangely, inference also fails when run on 8 GPUs, whereas the Hugging Face version of the same model works fine on a 2-GPU setup.
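For what it's worth, here is a minimal offline reproduction sketch (my assumption of how to hit the same code path without the API server; the model path and tensor-parallel size mirror the failing command above, and the CUDA_LAUNCH_BLOCKING hint comes from the log itself):

```python
# Minimal sketch (assumption: the same crash is reachable through the
# offline LLM API with the same settings as the api_server.py command).
# Export CUDA_LAUNCH_BLOCKING=1 in the shell before running so it reaches
# the Ray tensor-parallel workers and the stack trace points closer to the
# failing kernel, as the log above suggests.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/hbox2dir/chatglm2-6b-32k",  # same local model path as above
    trust_remote_code=True,
    tensor_parallel_size=2,             # the crash was also seen with 8 GPUs
)

outputs = llm.generate(
    ["Hello, please introduce yourself."],  # example prompt
    SamplingParams(temperature=0.0, max_tokens=64),
)
for out in outputs:
    print(out.outputs[0].text)
```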