
youkaichao
Member

Looks like we initialize the HF config class too early, and that class is not picklable. We need to use the fork method to start multiprocessing.
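For readers following along, here is a minimal sketch of why an unpicklable object matters for the start method. With spawn, the arguments of a new process are pickled and sent to a fresh interpreter; with fork, the child inherits the parent's memory, so nothing is pickled. FakeConfig, can_pickle, worker, and run_with_fork are hypothetical names made up for this illustration; this is not the actual HF config class or vLLM code.

```python
import multiprocessing as mp
import pickle


class FakeConfig:
    """Hypothetical stand-in for a config object that cannot be pickled."""

    def __init__(self):
        # A lambda stored on the instance makes the instance unpicklable.
        self.scale = lambda x: x * 2


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, which spawn relies on."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def worker(cfg, queue):
    # Runs in the child process and reports a value computed from cfg.
    queue.put(cfg.scale(21))


def run_with_fork(cfg):
    """Start a child with the fork method; cfg is inherited, not pickled."""
    ctx = mp.get_context("fork")  # fork is only available on Unix-like systems
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(cfg, queue))
    proc.start()
    result = queue.get(timeout=30)
    proc.join()
    return result


if __name__ == "__main__":
    cfg = FakeConfig()
    print("picklable:", can_pickle(cfg))
    if "fork" in mp.get_all_start_methods():
        print("fork result:", run_with_fork(cfg))
```

Passing the same FakeConfig instance as a process argument under the spawn method would fail at start(), because the instance has to be pickled first. Creating the object inside the worker instead of the parent is another way around this, at the cost of restructuring the test.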

@youkaichao youkaichao requested a review from ywang96 June 30, 2024 00:09
@DarkLight1337
Member

DarkLight1337 commented Jun 30, 2024

Hmm, but I thought this line meant that we must use spawn to test the models:

# FIXIT: find out which code initialize cuda before running the test
# before the fix, we need to use spawn to test it
- export VLLM_WORKER_MULTIPROC_METHOD=spawn

Also, not sure why this didn't get caught in the pre-merge.

@youkaichao
Member Author

The line export VLLM_WORKER_MULTIPROC_METHOD=spawn is there for pytest -v -s distributed/test_basic_distributed_correctness.py. It turns out pytest -v -s distributed/test_multimodal_broadcast.py has different requirements.

I don't know why this did not show up in pre-merge CI.

I hope I can figure out all of this stuff someday, but I haven't had the bandwidth recently :(
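As a rough illustration of how one environment variable can steer the start method per test command, here is a small sketch. The variable name VLLM_WORKER_MULTIPROC_METHOD comes from the CI script quoted above; pick_start_method and the "fork" default are assumptions for this example, not vLLM's actual implementation.

```python
import multiprocessing as mp
import os

ENV_VAR = "VLLM_WORKER_MULTIPROC_METHOD"


def pick_start_method(environ=None, default="fork"):
    """Hypothetical helper: read the start method from the environment.

    The "fork" default is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    method = environ.get(ENV_VAR, default)
    if method not in mp.get_all_start_methods():
        raise ValueError(f"unsupported start method: {method!r}")
    return method


if __name__ == "__main__":
    # Simulates `export VLLM_WORKER_MULTIPROC_METHOD=spawn` for one test run.
    method = pick_start_method({ENV_VAR: "spawn"})
    ctx = mp.get_context(method)
    print("using start method:", ctx.get_start_method())
```

With a helper like this, each CI step can export a different value before its pytest command, which matches the situation here where two test files need different start methods.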

@DarkLight1337
Member

DarkLight1337 commented Jun 30, 2024

FYI the AMD distributed tests are timing out now (perhaps most of the time is spent on model download, same issue as vision language models test), so we may have to split them out. Either that or the test just straight up hangs.

cc @mawong-amd

@DarkLight1337
Member

DarkLight1337 commented Jun 30, 2024

Closing as superseded by #5991 which fixes the root cause.

@youkaichao youkaichao deleted the fix_phi3_test branch June 30, 2024 08:14

2 participants