Description
🐛 Describe the bug
Since version 0.6.2 (it also happens in 0.6.3.post1), after the server dies (due to an exception/crash or hitting Ctrl-C), it fails to start again for about a minute with:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 585, in <module>
uvloop.run(run_server(args))
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 544, in run_server
sock.bind(("", args.port))
OSError: [Errno 98] Address already in use
This prolongs recovery from crashes. For example, upon a crash Kubernetes immediately restarts the container; previously it would immediately start loading the model again, but now it goes through several crash/restart loops until the port is freed.
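For context, the standard way a server avoids this one-minute lockout is to set SO_REUSEADDR on the listening socket before calling bind(), which tells the kernel to ignore lingering TIME_WAIT entries on the port. A minimal sketch (the helper name is hypothetical, not vllm's actual code):

```python
import socket

def make_server_socket(port: int) -> socket.socket:
    """Create a listening socket that can rebind a port still held
    by TIME_WAIT connections from a previous process.

    SO_REUSEADDR must be set *before* bind(); setting it afterwards
    has no effect on the bind check.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.listen()
    return sock
```

If the flag is missing (or set only after bind), a restart within the ~60 s TIME_WAIT window fails exactly as in the traceback above.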
Verified it also happens with --disable-frontend-multiprocessing.
To reproduce it, start vllm with default args, for example:
python -m vllm.entrypoints.openai.api_server --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
and then send at least one chat or completion request to it (without this it won't reproduce).
Then hit Ctrl-C to kill the server.
Starting vllm again should throw the "Address already in use" error.
This doesn't happen with vllm <= 0.6.1.
I tried to see why the port is busy, and interestingly the vllm process is dead during this ~1 minute and no other process listens on it. However, I noticed that there are sockets open on port 8000. They can be seen via:
netstat | grep ':8000'
which would show something like:
tcp 0 0 localhost:8000 localhost:40452 TIME_WAIT -
tcp 0 0 localhost:8000 localhost:56324 TIME_WAIT -
tcp 0 0 localhost:8000 localhost:40466 TIME_WAIT -
After a minute these entries disappear, and then vllm also manages to start.
I couldn't attribute these to a PID, not even with various netstat or lsof flags. Maybe they remain open in the kernel due to an unclean process exit?