Description
🐛 Describe the bug
Since version 0.6.2 (it also happens in 0.6.3.post1), after the server dies (due to an exception/crash or hitting Ctrl-C), it fails to start again for about a minute with:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 585, in <module>
uvloop.run(run_server(args))
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/home/user/code/debug/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 544, in run_server
sock.bind(("", args.port))
OSError: [Errno 98] Address already in use
This prolongs recovery from crashes. For example, upon a crash Kubernetes immediately restarts the container; previously it would immediately start loading the model again, but now it goes through several crash/restart loops until the port is freed.
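For context, the standard way a server avoids this one-minute lockout is to set SO_REUSEADDR on the listening socket before calling bind(), which tells the kernel to ignore lingering TIME_WAIT entries on the port. A minimal sketch (the helper name is hypothetical, not vllm's actual code):

```python
import socket

def make_server_socket(port: int) -> socket.socket:
    """Create a listening socket that can rebind a port still held
    by TIME_WAIT connections from a previous process.

    SO_REUSEADDR must be set *before* bind(); setting it afterwards
    has no effect on the bind check.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.listen()
    return sock
```

If the flag is missing (or set only after bind), a restart within the ~60 s TIME_WAIT window fails exactly as in the traceback above.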
Verified it also happens with --disable-frontend-multiprocessing.
To reproduce it, start vllm with default args, for example:
python -m vllm.entrypoints.openai.api_server --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
and then send at least one chat or completion request to it (without this it won't reproduce).
Then hit Ctrl-C to kill the server.
Starting vllm again should throw the "Address already in use" error.
This doesn't happen with vllm <= 0.6.1.
I tried to see why the port is busy, and interestingly the vllm process is dead during this ~1 minute and no other process listens on it. However, I noticed that there are sockets open on port 8000. They can be seen via:
netstat | grep ':8000'
which would show something like:
tcp 0 0 localhost:8000 localhost:40452 TIME_WAIT -
tcp 0 0 localhost:8000 localhost:56324 TIME_WAIT -
tcp 0 0 localhost:8000 localhost:40466 TIME_WAIT -
After a minute these entries disappear, and then vllm also manages to start.
I couldn't attribute these to a PID, not even with various netstat or lsof flags. Maybe they remain open in the kernel due to an unclean process exit?