
Conversation

zhuohan123 (Member)

No description provided.

@zhuohan123 (Member, Author)

@WoosukKwon Can you approve this small PR?

@zhuohan123 zhuohan123 requested a review from WoosukKwon May 19, 2023 19:31
@WoosukKwon (Collaborator) left a comment

LGTM!

@zhuohan123 zhuohan123 merged commit b7955ef into main May 19, 2023
@zhuohan123 zhuohan123 deleted the fix-timeout-error branch May 24, 2023 04:40
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 24, 2024
[CI/Build] use existing shadow-utils in Dockerfile.ubi
tianyil1 pushed a commit to tianyil1/vllm that referenced this pull request Jun 5, 2024
* Fix HPU auto-detection in setup.py

* Update setup.py
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
… TP case (vllm-project#34)

We are seeing Mixtral pods with TP>1 failing with errors like:
```
FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/c926ad2ef143810ed738a313c473c7b2/fused_moe_kernel.cubin.tmp.pid_72_945989'
```
It seems there is a conflict in the Triton cache directories when using multi-processing. This has actually been [fixed](triton-lang/triton#3544) upstream in Triton, but the fix hasn't made it into Triton v2.3.0, which is what vLLM is currently using.

This change essentially applies, inside our container, the same fix that has already landed on the Triton main branch (sketched below).

---------

Signed-off-by: Thomas Parnell <[email protected]>
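For reference, the upstream fix follows the usual atomic-write pattern: the compiled artifact is written to a temporary file whose name is unique to the writer, then renamed into place, so concurrent processes never observe (or delete) each other's half-written files. The snippet below is only an illustrative sketch of that pattern, not the actual Triton code; the function name and arguments are made up for the example.

```python
import os
import uuid


def atomic_cache_write(cache_dir: str, filename: str, data: bytes) -> str:
    """Write `data` to `cache_dir/filename` without racing other processes."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, filename)
    # Unique suffix (pid + uuid) so concurrent writers never share a temp file.
    tmp_path = f"{final_path}.tmp.pid_{os.getpid()}_{uuid.uuid4()}"
    with open(tmp_path, "wb") as f:
        f.write(data)
    # os.replace() is atomic on POSIX: readers see either the old file or the
    # new one, never a partial write, and the last writer simply wins.
    os.replace(tmp_path, final_path)
    return final_path
```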
joerunde added a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
…ch instead. (vllm-project#35)

I tested the previous fix for the Triton cache collision issue (see:
vllm-project#34) and it didn't work.

I now see errors like:
```
FileNotFoundError: [Errno 2] No such file or directory: '/home/vllm/.triton/cache/1feb415f3280ca46eea8c4407a58c23e/fused_moe_kernel.json.tmp.pid_72_c0a0033e-6147-4520-ae3a-3847d02598f8'
```
which now shows a `uuid` instead of a random integer, but the problem remains.

This PR implements a different workaround, proposed by @cyang49, that tells Triton to use a custom cache manager that assigns a different cache directory based on the process ID (sketched after this message).

This time I have tested it and it seems to work.

---------

Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Chih-Chieh-Yang <[email protected]>
Co-authored-by: Joe Runde <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
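A rough sketch of the workaround described above: a custom Triton cache manager that gives each process its own cache directory, selected via the `TRITON_CACHE_MANAGER` environment variable (which takes a `module.path:ClassName` string). The class and module names here are illustrative, and the `FileCacheManager` constructor signature and the location of `default_cache_dir()` can differ between Triton releases, so treat this as an assumption-laden example rather than the exact vLLM code.

```python
import os

from triton.runtime.cache import FileCacheManager, default_cache_dir


class PerProcessCacheManager(FileCacheManager):
    """Keep each process's compiled kernels in a private cache directory."""

    def __init__(self, key):
        self.key = key
        base_dir = os.environ.get("TRITON_CACHE_DIR", default_cache_dir())
        # Appending the pid means tensor-parallel workers compiling the same
        # kernel never write into the same directory, avoiding the tmp-file
        # collisions shown in the tracebacks above.
        self.cache_dir = os.path.join(f"{base_dir}_{os.getpid()}", self.key)
        self.lock_path = os.path.join(self.cache_dir, "lock")
        os.makedirs(self.cache_dir, exist_ok=True)


# Must be set before any Triton kernel is compiled, e.g. at worker start-up
# (the module path here is hypothetical):
os.environ.setdefault(
    "TRITON_CACHE_MANAGER", "my_pkg.triton_cache:PerProcessCacheManager"
)
```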
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024
Enable GPTQ and autoround (INC) 4bits quantization and minor fix for moe awq tp
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Aug 2, 2024
pi314ever pushed a commit to pi314ever/vllm that referenced this pull request Jan 17, 2025
remove expert_max hard code (vllm-project#47)
vLLM-Ext: Full enabling of ALiBi (vllm-project#34)
Add version inference via setuptools-scm (vllm-project#58)
Revert "vLLM-Ext: Full enabling of ALiBi (vllm-project#34)" (vllm-project#59)
Remove punica_hpu.py from vllm_hpu_extension (vllm-project#66)
Removed previous (not-pipelined) pa implementation (vllm-project#72)
Add flag to enable running softmax in fp32 (vllm-project#71)
Update calibration readme link (vllm-project#73)
allow lm_head quantization in calibration process (vllm-project#65)
Pad to bmin if value is less (vllm-project#67)
Update pyproject.toml (HabanaAI#75)

---------

Co-authored-by: Michał Kuligowski <[email protected]>
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <[email protected]>
zyongye added a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye added a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: dengyunyang <[email protected]>
Co-authored-by: dengyunyang <[email protected]>
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
…epgemm-integration

Fix `paged_mqa_logits` clear_logits True