Fix timeout error in the FastAPI frontend #34
Merged
Conversation
@WoosukKwon Can you approve this small PR?
WoosukKwon approved these changes on May 19, 2023
LGTM!
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request on May 24, 2024

[CI/Build] use existing shadow-utils in Dockerfile.ubi
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request on Jun 5, 2024

* Fix HPU auto-detection in setup.py
* Update setup.py
joerunde pushed a commit to joerunde/vllm that referenced this pull request on Jun 17, 2024

… TP case (vllm-project#34)

We are seeing Mixtral pods with TP>1 failing with errors like:

```
FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/c926ad2ef143810ed738a313c473c7b2/fused_moe_kernel.cubin.tmp.pid_72_945989'
```

It seems there is some conflict in the Triton cache directories when using multiprocessing. This has actually been [fixed](triton-lang/triton#3544) upstream in Triton, but the fix hasn't made it into Triton v2.3.0, which is what vLLM currently uses. This change essentially applies the same fix that has made it into the Triton main branch inside our container.

---------

Signed-off-by: Thomas Parnell <[email protected]>
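For context, the kind of fix described above boils down to avoiding races between processes that write the same cache file. Below is a minimal sketch of that atomic-write pattern under stated assumptions: it is illustrative only, the function name is hypothetical, and it is not taken from Triton or vLLM.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to `path` without racing with other processes.

    Each writer uses its own uniquely named temporary file in the target
    directory and then atomically renames it into place, so concurrent
    writers never see (or delete) each other's half-written temp files.
    """
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # mkstemp guarantees a unique temp file per writer, unlike a fixed
    # ".tmp.pid_<pid>" suffix, which can still collide after fork.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```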
joerunde
added a commit
to joerunde/vllm
that referenced
this pull request
Jun 17, 2024
…ch instead. (vllm-project#35) I tested the previous fix for the Triton cache collision issue (see: vllm-project#34) and it didn't work. I now see errors like: ``` FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/1feb415f3280ca46eea8c4407a58c23e/fused_moe_kernel.json.tmp.pid_72_c0a0033e-6147-4520-ae3a-3847d02598f8' ``` which now shows the `uuid` instead of a random integer, but problem remains. This PR implements a different workaround, proposed by @cyang49, that tells Triton to use a custom cache manager which assigns a different directory based on the process id. This time I have tested it and it seems to work. --------- Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Chih-Chieh-Yang <[email protected]> Co-authored-by: Joe Runde <[email protected]> Co-authored-by: Nick Hill <[email protected]>
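To illustrate the workaround described in that commit message, here is a minimal sketch of a per-process Triton cache manager. It assumes Triton exposes `FileCacheManager` in `triton.runtime.cache` and honors the `TRITON_CACHE_MANAGER` environment variable; the module and class names below are hypothetical and this is not the actual vLLM implementation.

```python
# custom_cache_manager.py -- illustrative sketch only, not the vLLM code.
import os

from triton.runtime.cache import FileCacheManager


class PerProcessCacheManager(FileCacheManager):
    """Cache manager that gives every process its own cache directory.

    Appending the pid to the cache path means concurrent workers (e.g.
    tensor-parallel ranks launched via multiprocessing) never write into
    the same directory, avoiding the tmp-file collisions seen when
    compiling fused_moe_kernel.
    """

    def __init__(self, key):
        super().__init__(key)
        # Redirect this process's cache into a pid-specific directory.
        self.cache_dir = f"{self.cache_dir}_{os.getpid()}"
        self.lock_path = os.path.join(self.cache_dir, "lock")
        os.makedirs(self.cache_dir, exist_ok=True)


# Before any Triton kernel is compiled, point Triton at the custom manager,
# e.g.:
#   os.environ["TRITON_CACHE_MANAGER"] = (
#       "custom_cache_manager:PerProcessCacheManager")
```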
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request on Jul 31, 2024

Enable GPTQ and autoround (INC) 4bits quantization and minor fix for moe awq tp
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request on Aug 2, 2024
pi314ever pushed a commit to pi314ever/vllm that referenced this pull request on Jan 17, 2025

remove expert_max hard code (vllm-project#47)
vLLM-Ext: Full enabling of ALiBi (vllm-project#34)
Add version inference via setuptools-scm (vllm-project#58)
Revert "vLLM-Ext: Full enabling of ALiBi (vllm-project#34)" (vllm-project#59)
Remove punica_hpu.py from vllm_hpu_extension (vllm-project#66)
Removed previous (not-pipelined) pa implementation (vllm-project#72)
Add flag to enable running softmax in fp32 (vllm-project#71)
Update calibration readme link (vllm-project#73)
allow lm_head quantization in calibration process (vllm-project#65)
Pad to bmin if value is less (vllm-project#67)
Update pyproject.toml (HabanaAI#75)

---------

Co-authored-by: Michał Kuligowski <[email protected]>
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request on Mar 27, 2025

### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <[email protected]>
zyongye added a commit to zyongye/vllm that referenced this pull request on Aug 5, 2025

Signed-off-by: Yongye Zhu <[email protected]>
zyongye added a commit to zyongye/vllm that referenced this pull request on Aug 6, 2025

Signed-off-by: Yongye Zhu <[email protected]>
Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request on Sep 25, 2025

Signed-off-by: dengyunyang <[email protected]>
Co-authored-by: dengyunyang <[email protected]>
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request on Sep 29, 2025

…epgemm-integration Fix `paged_mqa_logits` clear_logits True