[ci][distributed] fix phi-3v test failure #5990

youkaichao · 2024-06-30T00:08:38Z

Looks like we initialize the HF config class too early, and that class is not picklable. We need to use the fork method to start multiprocessing.
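For readers unfamiliar with the constraint: with the spawn start method, Python pickles the objects handed to the child process, whereas fork copies the parent's memory and skips that step. Below is a minimal sketch of the failure mode, assuming the problem is a class that pickle cannot look up by name. `DynamicConfig`, `make_dynamic_config`, and `can_pickle` are hypothetical stand-ins for illustration, not the real HF config class or any vLLM helper.

```python
import pickle


def make_dynamic_config():
    """Return an instance of a class that pickle cannot locate by name.

    DynamicConfig is a hypothetical stand-in, not the real HF config class.
    """

    class DynamicConfig:  # defined inside a function, so not importable
        def __init__(self):
            self.hidden_size = 8

    return DynamicConfig()


def can_pickle(obj) -> bool:
    """Report whether obj survives pickle.dumps, which spawn relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies across Python versions.
        return False


if __name__ == "__main__":
    print(can_pickle({"hidden_size": 8}))      # plain dict: picklable
    print(can_pickle(make_dynamic_config()))   # local class: not picklable
```

An object that fails this check cannot be passed to a spawned worker, which is consistent with the suggestion to use fork here.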

DarkLight1337 · 2024-06-30T00:31:26Z

Hmm, but I thought this line meant that we must use spawn to test the models:

vllm/.buildkite/test-pipeline.yaml

Lines 48 to 50 in 54331fc

    
    # FIXIT: find out which code initialize cuda before running the test
    # before the fix, we need to use spawn to test it
    - export VLLM_WORKER_MULTIPROC_METHOD=spawn

Also, not sure why this didn't get caught in the pre-merge.

youkaichao · 2024-06-30T00:35:01Z

That line, export VLLM_WORKER_MULTIPROC_METHOD=spawn, is intended for pytest -v -s distributed/test_basic_distributed_correctness.py. It turns out that pytest -v -s distributed/test_multimodal_broadcast.py has different requirements.

I don't know why this does not show up in pre-merge CI.

I hope I can figure out all of this stuff someday, but I haven't had the bandwidth recently :(
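As a rough illustration of how an environment variable like this can switch the start method per test run, here is a sketch using only the standard library. The helper name `get_worker_context` and the "fork" default are assumptions made for this example; this is not vLLM's actual code.

```python
import multiprocessing as mp
import os


def get_worker_context(default: str = "fork"):
    """Pick a multiprocessing context from VLLM_WORKER_MULTIPROC_METHOD.

    Hypothetical helper: the default value is an assumption for this sketch.
    """
    method = os.environ.get("VLLM_WORKER_MULTIPROC_METHOD", default)
    if method not in mp.get_all_start_methods():
        raise ValueError(f"unsupported start method on this platform: {method!r}")
    # get_context returns an object with the same API as the multiprocessing
    # module, bound to the chosen start method.
    return mp.get_context(method)


if __name__ == "__main__":
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    ctx = get_worker_context()
    print(ctx.get_start_method())
```

With a setup like this, each test command can export the value it needs before running, so the two test files do not have to share one start method.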

DarkLight1337 · 2024-06-30T03:41:45Z

FYI, the AMD distributed tests are timing out now (perhaps most of the time is spent on model download, the same issue as with the vision language models test), so we may have to split them out. Either that, or the test just straight up hangs.

cc @mawong-amd

DarkLight1337 · 2024-06-30T07:57:08Z

Closing as superseded by #5991 which fixes the root cause.

youkaichao added 2 commits June 29, 2024 17:06

temp fix test bug

cd8d0f5

restore

54331fc

youkaichao requested a review from ywang96 June 30, 2024 00:09

adjust run order with hf and vllm

01d2165

DarkLight1337 mentioned this pull request Jun 30, 2024

[CI/Build] Temporarily Remove Phi3-Vision from TP Test #5989

Merged

DarkLight1337 closed this Jun 30, 2024

youkaichao deleted the fix_phi3_test branch June 30, 2024 08:14
