Conversation
@Isotr0py Isotr0py commented May 26, 2025

  • Split generation and pooling tests in Language Models Test (Extended), so that we don't need to run unnecessary extended generation tests, which cost over 1 hour for embedding PRs.
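The idea behind the split can be sketched as selecting only the relevant test subset per PR. A minimal shell sketch, assuming hypothetical test file paths (the real layout and CI pipeline config in the vLLM repo may differ):

```shell
# Hypothetical test file layout; the actual paths used by vLLM's CI may differ.
tests="tests/models/language/generation/test_common.py
tests/models/language/pooling/test_embedding.py
tests/models/language/pooling/test_classification.py"

# For an embedding-focused PR, select only the pooling tests
# instead of also running the slow extended generation suite:
pooling_tests=$(printf '%s\n' "$tests" | grep '/pooling/')
printf '%s\n' "$pooling_tests"
```

In the actual CI, this corresponds to defining two separate pipeline steps so that only the affected step runs for a given change.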

@Isotr0py Isotr0py requested a review from DarkLight1337 May 26, 2025 09:20
@mergify mergify bot added the ci/build label May 26, 2025
Signed-off-by: Isotr0py <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 26, 2025 09:24
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label May 26, 2025
@vllm-bot vllm-bot merged commit 0877750 into vllm-project:main May 26, 2025
76 of 96 checks passed
@Isotr0py Isotr0py deleted the split-ci-test branch May 26, 2025 11:03
gshtras added a commit to ROCm/vllm that referenced this pull request May 27, 2025
* Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup (vllm-project#18337)

* [Misc] Fix typo (vllm-project#18330)

* Neuron up mistral (vllm-project#18222)

Signed-off-by: Satyajith Chilappagari <[email protected]>

* fix CUDA_check redefinition in vllm-project#17918 (vllm-project#18287)

Signed-off-by: Lucia Fang <[email protected]>
Co-authored-by: Lucia (Lu) Fang <[email protected]>

* [neuron] fix authorization issue (vllm-project#18364)

Signed-off-by: Liangfu Chen <[email protected]>

* [Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (vllm-project#18358)

Signed-off-by: Isotr0py <[email protected]>

* [Core] [Bugfix]: tensor parallel with prompt embeds (vllm-project#18171)

Signed-off-by: Nan2018 <[email protected]>
Co-authored-by: Andrew Sansom <[email protected]>

* [release] Change dockerhub username for TPU release (vllm-project#18389)

* [Bugfix] fix adding bias twice in ipex GPTQ quantization (vllm-project#18363)

Signed-off-by: rand-fly <[email protected]>

* [doc] update env variable export (vllm-project#18391)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Misc] Add LoRA code owner (vllm-project#18387)

Signed-off-by: Jee Jee Li <[email protected]>

* Update cpu.txt (vllm-project#18398)

Signed-off-by: 汪志鹏 <[email protected]>

* [CI] Add mteb testing to test the accuracy of the embedding model (vllm-project#17175)

* [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407)

Co-authored-by: 松灵 <[email protected]>

* [Misc] refactor prompt embedding examples (vllm-project#18405)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Minor] Rename quantization nvfp4 to modelopt_fp4 (vllm-project#18356)

Signed-off-by: mgoin <[email protected]>

* [Model] use AutoWeightsLoader for bloom (vllm-project#18300)

Signed-off-by: calvin chen <[email protected]>

* [Kernel] update comment for KV shape in unified triton attn (vllm-project#18099)

Signed-off-by: haochengxia <[email protected]>

* fix: Build torch wheel inline rather than picking from nightly (vllm-project#18351)

Signed-off-by: Dilip Gowda Bhagavan <[email protected]>

* [TPU] Re-enable the Pallas MoE kernel (vllm-project#18025)

Signed-off-by: Michael Goin <[email protected]>

* [Bugfix] config.head_dim is now explicitly set to None (vllm-project#18432)

Signed-off-by: Gregory Shtrasberg <[email protected]>

* [Bug] Fix moe_sum signature (vllm-project#18440)

Signed-off-by: Bill Nell <[email protected]>

* Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18407)" (vllm-project#18456)

Signed-off-by: DarkLight1337 <[email protected]>

* [Bugfix][Failing Test] Fix nixl connector test when prompt size < block size (vllm-project#18429)

Signed-off-by: wwl2755 <[email protected]>

* [Misc] MultiConnector._connectors type (vllm-project#18423)

Signed-off-by: nicklucche <[email protected]>

* [Frontend] deprecate `--device` arg (vllm-project#18399)

Signed-off-by: Kebe <[email protected]>

* [V1] Fix general plugins not loaded in engine for multiproc (vllm-project#18326)

Signed-off-by: Yong Hoon Shin <[email protected]>

* [Misc] refactor disaggregated-prefill-v1 example (vllm-project#18474)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Bugfix][Failing Test] Fix test_events.py (vllm-project#18460)

Signed-off-by: rabi <[email protected]>

* [MODEL] FalconH1 (vllm-project#18406)

Signed-off-by: dhia.rhaiem <[email protected]>
Co-authored-by: younesbelkada <[email protected]>
Co-authored-by: Ilyas Chahed <[email protected]>
Co-authored-by: Jingwei Zuo <[email protected]>

* [Doc] fix arg docstring in linear layers (vllm-project#18410)

Signed-off-by: giantcroc <[email protected]>

* [Bugfix] Reduce moe_sum test size to avoid OOM (vllm-project#18484)

Signed-off-by: Bill Nell <[email protected]>

* [Build] fix Dockerfile shell (vllm-project#18402)

* [Misc] Update deprecation message for `--enable-reasoning` (vllm-project#18404)

* [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (vllm-project#17004)

Signed-off-by: Hosang Yoon <[email protected]>

* Remove incorrect env value

* Revert "[v1] Support multiple KV cache groups in GPU model runner (vllm-project#17945)" (vllm-project#18459)

Signed-off-by: Mark McLoughlin <[email protected]>

* [FEAT][ROCm] Upgrade AITER MLA v1 backend (vllm-project#18338)

Signed-off-by: vllmellm <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>

* [Bugfix] Consistent ascii handling in tool parsers (vllm-project#17704)

Signed-off-by: Sebastian Schönnenbeck <[email protected]>

* [FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (vllm-project#18500)

Signed-off-by: dhia.rhaiem <[email protected]>
Co-authored-by: younesbelkada <[email protected]>
Co-authored-by: Ilyas Chahed <[email protected]>
Co-authored-by: Jingwei Zuo <[email protected]>

* [MISC] update project urls in pyproject.toml (vllm-project#18519)

Signed-off-by: Andy Xie <[email protected]>

* [CI] Fix race condition with StatelessProcessGroup.barrier (vllm-project#18506)

Signed-off-by: Russell Bryant <[email protected]>

* Initialize io_thread_pool attribute in the beginning. (vllm-project#18331)

Signed-off-by: rabi <[email protected]>

* [Bugfix] Inconsistent token calculation compared to HF in llava family (vllm-project#18479)

Signed-off-by: jaycha <[email protected]>

* [BugFix][DP] Send DP wave completion only from `dp_rank==0` (vllm-project#18502)

Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: kourosh hakhamaneshi <[email protected]>

* [Bugfix][Model] Make Olmo2Model weight loading return loaded weights (vllm-project#18504)

Signed-off-by: Shane A <[email protected]>

* [Bugfix] Fix LoRA test (vllm-project#18518)

Signed-off-by: Jee Jee Li <[email protected]>

* [Doc] Fix invalid JSON in example args (vllm-project#18527)

Signed-off-by: DarkLight1337 <[email protected]>

* [Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (vllm-project#18512)

Signed-off-by: Satyajith Chilappagari <[email protected]>

* Update default neuron config for speculation (vllm-project#18274)

Signed-off-by: Elaine Zhao <[email protected]>
Co-authored-by: Shashwat Srijan <[email protected]>
Co-authored-by: Aakash Shetty <[email protected]>

* Order sequence ids + config update to support specifying custom quantization layers (vllm-project#18279)

Signed-off-by: Elaine Zhao <[email protected]>
Co-authored-by: Tailin Pan <[email protected]>
Co-authored-by: Rishabh Rajesh <[email protected]>
Co-authored-by: Yishan McNabb <[email protected]>
Co-authored-by: Patrick Lange <[email protected]>
Co-authored-by: Maxwell Goldberg <[email protected]>
Co-authored-by: Aakash Shetty <[email protected]>

* [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (vllm-project#18526)

Co-authored-by: 松灵 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>

* [Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (vllm-project#18513)

Signed-off-by: Linkun <[email protected]>

* [CI/Build] Update bamba test model location (vllm-project#18544)

Signed-off-by: Harry Mellor <[email protected]>

* [Doc] Support --stream arg in openai_completion_client.py script (vllm-project#18388)

Signed-off-by: googs1025 <[email protected]>

* [Bugfix] Use random hidden states in dummy sampler run (vllm-project#18543)

Signed-off-by: Bowen Wang <[email protected]>

* [Doc] Add stream flag for chat completion example (vllm-project#18524)

Signed-off-by: calvin chen <[email protected]>

* [BugFix][CPU] Fix x86 SHM distributed module initialization (vllm-project#18536)

Signed-off-by: jiang.li <[email protected]>

* [Misc] improve Automatic Prefix Caching example (vllm-project#18554)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Misc] Call `ndarray.tobytes()` directly instead of `ndarray.data.tobytes()` (vllm-project#18347)

Signed-off-by: Lukas Geiger <[email protected]>

* [Bugfix] make `test_openai_schema.py` pass (vllm-project#18224)

Signed-off-by: David Xia <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>

* [Platform] Move platform check to right place (vllm-project#18470)

Signed-off-by: wangxiyuan <[email protected]>

* [Compile][Platform] Make PiecewiseBackend pluggable and extendable (vllm-project#18076)

Signed-off-by: Mengqing Cao <[email protected]>
Co-authored-by: youkaichao <[email protected]>

* [Build/CI] Fix CUDA 11.8 build (vllm-project#17679)

Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>

* [Tool] Add NIXL installation script (vllm-project#18172)

Signed-off-by: Linkun <[email protected]>

* [V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (vllm-project#18290)

* [Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (vllm-project#17917)

Signed-off-by: Kai Wu <[email protected]>

* [Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (vllm-project#17926)

Signed-off-by: Sanger Steel <[email protected]>

* [AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (vllm-project#18568)

Signed-off-by: Randall Smith <[email protected]>

* Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (vllm-project#18569)

Signed-off-by: Chenheli Hua <[email protected]>

* [V1][Spec Decoding] Use model_loader.get_model() to load models (vllm-project#18273)

Signed-off-by: Mark McLoughlin <[email protected]>

* Enable hybrid attention models for Transformers backend (vllm-project#18494)

Signed-off-by: Harry Mellor <[email protected]>

* [Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (vllm-project#18482)

Signed-off-by: googs1025 <[email protected]>

* [BugFix] Increase TP execute_model timeout (vllm-project#18558)

Signed-off-by: Nick Hill <[email protected]>

* [Bugfix] Set `KVTransferConfig.engine_id` in post_init (vllm-project#18576)

Signed-off-by: Linkun Chen <[email protected]>

* [Spec Decode] Make EAGLE3 draft token ID mapping optional (vllm-project#18488)

Signed-off-by: Benjamin Chislett <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>

* [Neuron] Remove bypass on EAGLEConfig and add a test (vllm-project#18514)

Signed-off-by: Elaine Zhao <[email protected]>

* [Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (vllm-project#17291)

Signed-off-by: Teruaki Ishizaki <[email protected]>

* [Misc] Replace `cuda` hard code with `current_platform` (vllm-project#16983)

Signed-off-by: shen-shanshan <[email protected]>

* [Hardware] correct method signatures for HPU,ROCm,XPU (vllm-project#18551)

Signed-off-by: Andy Xie <[email protected]>

* [V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (vllm-project#18034)

Signed-off-by: Ronald Xu <[email protected]>

* [Feature]Add async tensor parallelism using compilation pass (vllm-project#17882)

Signed-off-by: cascade812 <[email protected]>

* [Doc] Update quickstart and install for cu128 using `--torch-backend=auto` (vllm-project#18505)

Signed-off-by: mgoin <[email protected]>

* [Feature][V1]: supports cached_tokens in response usage (vllm-project#18149)

Co-authored-by: simon-mo <[email protected]>

* [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (vllm-project#18430)

Signed-off-by: Yuqi Zhang <[email protected]>
Co-authored-by: Yuqi Zhang <[email protected]>

* Migrate docs from Sphinx to MkDocs (vllm-project#18145)

Signed-off-by: Harry Mellor <[email protected]>

* Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (vllm-project#18034)" (vllm-project#18600)

Signed-off-by: DarkLight1337 <[email protected]>

* [Bugfix][Model] Fix baichuan model loader for tp (vllm-project#18597)

Signed-off-by: Mengqing Cao <[email protected]>

* [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (vllm-project#17731)

Signed-off-by: Madeesh Kannan <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>

* Add myself as docs code owner (vllm-project#18605)

Signed-off-by: Harry Mellor <[email protected]>

* [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt`  (vllm-project#18542)

Signed-off-by: Kay Yan <[email protected]>

* [CI] fix kv_cache_type argument (vllm-project#18594)

Signed-off-by: Andy Xie <[email protected]>

* [Doc] Fix indent of contributing to vllm (vllm-project#18611)

Signed-off-by: Zerohertz <[email protected]>

* Replace `{func}` with mkdocs style links (vllm-project#18610)

Signed-off-by: Harry Mellor <[email protected]>

* [CI/Build] Fix V1 flag being set in entrypoints tests (vllm-project#18598)

Signed-off-by: DarkLight1337 <[email protected]>

* Fix examples with code blocks in docs (vllm-project#18609)

Signed-off-by: Harry Mellor <[email protected]>

* [Bugfix] Fix transformers model impl ignored for mixtral quant (vllm-project#18602)

Signed-off-by: Tristan Leclercq <[email protected]>

* Include private attributes in API documentation (vllm-project#18614)

Signed-off-by: Harry Mellor <[email protected]>

* [Misc] add Haystack integration (vllm-project#18601)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (vllm-project#18579)

* [Doc] Fix markdown list indentation for MkDocs rendering (vllm-project#18620)

Signed-off-by: Zerohertz <[email protected]>

* [Doc] Use a different color for the announcement (vllm-project#18616)

Signed-off-by: DarkLight1337 <[email protected]>

* Refactor pplx init logic to make it modular (prepare for deepep) (vllm-project#18200)

Signed-off-by: youkaichao <[email protected]>

* Fix figures in design doc (vllm-project#18612)

Signed-off-by: Harry Mellor <[email protected]>

* [Docs] Change mkdocs to not use directory urls (vllm-project#18622)

Signed-off-by: mgoin <[email protected]>

* [v1] Redo "Support multiple KV cache groups in GPU model runner (vllm-project#17945)" (vllm-project#18593)

Signed-off-by: Chen Zhang <[email protected]>

* [Doc] fix list formatting (vllm-project#18624)

Signed-off-by: David Xia <[email protected]>

* [Doc] Fix top-level API links/docs (vllm-project#18621)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Avoid documenting dynamic / internal modules (vllm-project#18626)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (vllm-project#18627)

Signed-off-by: DarkLight1337 <[email protected]>

* [V1] Support Deepseek MTP (vllm-project#18435)

Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: YaoJiayi <[email protected]>
Co-authored-by: Rui Qiao <[email protected]>

* Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (vllm-project#18537)

Signed-off-by: Huy Do <[email protected]>

* [CI] Enable test_initialization to run on V1 (vllm-project#16736)

Signed-off-by: mgoin <[email protected]>

* [Doc] Update references to doc files (vllm-project#18637)

Signed-off-by: DarkLight1337 <[email protected]>

* [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (vllm-project#18160)

Signed-off-by: Pavani Majety <[email protected]>

* [Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (vllm-project#18454)

Signed-off-by: Crucifixion-Fxl <[email protected]>
Co-authored-by: Crucifixion-Fxl <[email protected]>

* [Bugfix][Nixl] Fix Preemption Bug (vllm-project#18631)

Signed-off-by: [email protected] <[email protected]>

* config.py: Clarify that only local GGUF checkpoints are supported. (vllm-project#18623)

Signed-off-by: Mathieu Bordere <[email protected]>

* Fix MoE issue in AutoRound format (vllm-project#18586)

Signed-off-by: wenhuach21 <[email protected]>

* [V1][Spec Decode] Small refactors to improve eagle bookkeeping performance (vllm-project#18424)

Signed-off-by: qizixi <[email protected]>

* [Frontend] improve vllm serve --help display (vllm-project#18643)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (vllm-project#18647)

* [V1][Spec Decode] Support multi-layer eagle draft model (vllm-project#18030)

Signed-off-by: qizixi <[email protected]>

* [Doc] Update README links, mark external links (vllm-project#18635)

Signed-off-by: DarkLight1337 <[email protected]>

* [MISC][pre-commit] Add pre-commit check for triton import (vllm-project#17716)

Signed-off-by: Mengqing Cao <[email protected]>

* [Doc] Fix indentation problems in V0 Paged Attention docs (vllm-project#18659)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Add community links (vllm-project#18657)

Signed-off-by: DarkLight1337 <[email protected]>

* [Model] use AutoWeightsLoader for gpt2 (vllm-project#18625)

Signed-off-by: zt2370 <[email protected]>

* [Doc] Reorganize user guide (vllm-project#18661)

Signed-off-by: DarkLight1337 <[email protected]>

* [CI/Build] `chmod +x` to `cleanup_pr_body.sh` (vllm-project#18650)

Signed-off-by: DarkLight1337 <[email protected]>

* [MISC] typo fix and clean import (vllm-project#18664)

Signed-off-by: Andy Xie <[email protected]>

* [BugFix] Fix import error for fused_moe (vllm-project#18642)

Signed-off-by: wangxiyuan <[email protected]>

* [CI] enforce import regex instead of re (vllm-project#18665)

Signed-off-by: Aaron Pham <[email protected]>

* fix(regression): clone from reference items (vllm-project#18662)

Signed-off-by: Aaron Pham <[email protected]>

* [CI/Build] fix permission denied issue (vllm-project#18645)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding (vllm-project#18668)

Signed-off-by: Woosuk Kwon <[email protected]>

* [V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... (vllm-project#18640)

Signed-off-by: Seiji Eicher <[email protected]>

* [MISC] correct signature for LoaderFunction (vllm-project#18670)

Signed-off-by: Andy Xie <[email protected]>

* [Misc]Replace `cuda` hard code with `current_platform` in Ray (vllm-project#14668)

Signed-off-by: noemotiovon <[email protected]>

* [Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (vllm-project#18655)

Signed-off-by: Mengqing Cao <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>

* [VLM] Initialize video input support for InternVL models (vllm-project#18499)

Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>

* Speed up the `kernels/quantization/` tests (vllm-project#18669)

Signed-off-by: mgoin <[email protected]>

* [BUGFIX] catch subclass first for try...except (vllm-project#18672)

Signed-off-by: Andy Xie <[email protected]>

* [Misc] Reduce logs on startup (vllm-project#18649)

Signed-off-by: DarkLight1337 <[email protected]>

* [doc] fix broken links (vllm-project#18671)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [doc] improve readability (vllm-project#18675)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment (vllm-project#18674)

Signed-off-by: zzzyq <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>

* [CI/build] fix no regex (vllm-project#18676)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Misc] small improve (vllm-project#18680)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Bugfix] Fix profiling dummy data for Pixtral (vllm-project#18677)

Signed-off-by: DarkLight1337 <[email protected]>

* [Core][Multimodal] Convert PIL Image to array without data copy when hashing (vllm-project#18682)

Signed-off-by: Lukas Geiger <[email protected]>

* [CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (vllm-project#18683)

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>

* [Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example (vllm-project#18644)

Signed-off-by: zhaohaidao <[email protected]>
Signed-off-by: zhaohaiyuan <[email protected]>
Co-authored-by: zhaohaiyuan <[email protected]>

* refactor: simplify request handler, use positive condition check for handler assignment (vllm-project#18690)

Signed-off-by: googs1025 <[email protected]>

* [Bugfix] Fix the lm_head in gpt_bigcode in lora mode (vllm-project#6357)

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>

* [CI] add missing argument (vllm-project#18694)

Signed-off-by: Andy Xie <[email protected]>

* [GH] Add issue template for reporting CI failures (vllm-project#18696)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Fix issue template format (vllm-project#18699)

Signed-off-by: DarkLight1337 <[email protected]>

* [Bugfix] Fix Mistral-format models with sliding window (vllm-project#18693)

Signed-off-by: DarkLight1337 <[email protected]>

* [CI/Build] Replace `math.isclose` with `pytest.approx` (vllm-project#18703)

Signed-off-by: DarkLight1337 <[email protected]>

* [CI] fix dump_input for str type (vllm-project#18697)

Signed-off-by: Andy Xie <[email protected]>

* [Model] Add support for YARN in NemotronNAS models (vllm-project#18427)

Signed-off-by: Nave Assaf <[email protected]>

* [CI/Build] Split pooling and generation extended language models tests in CI (vllm-project#18705)

Signed-off-by: Isotr0py <[email protected]>

* [Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI (vllm-project#18709)

Signed-off-by: Lukasz Durejko <[email protected]>

* [Misc] add AutoGen integration (vllm-project#18712)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>

* [Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM (vllm-project#18701)

* [Doc] Improve API docs (vllm-project#18713)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Move examples and further reorganize user guide (vllm-project#18666)

Signed-off-by: DarkLight1337 <[email protected]>

* [Bugfix] Fix Llama GGUF initialization (vllm-project#18717)

Signed-off-by: DarkLight1337 <[email protected]>

* [V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs (vllm-project#18608)

* Convert `examples` to `ruff-format` (vllm-project#18400)

Signed-off-by: Harry Mellor <[email protected]>

* [Model][Gemma3] Simplify image input validation (vllm-project#18710)

Signed-off-by: Lukas Geiger <[email protected]>

* [Misc] improve web section group title display (vllm-project#18684)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [V1][Quantization] Add CUDA graph compatible v1 GGUF support (vllm-project#18646)

Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>

* [Model][Gemma3] Cast image pixel values already on CPU (vllm-project#18732)

Signed-off-by: Lukas Geiger <[email protected]>

* [FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (vllm-project#18271)

Signed-off-by: vllmellm <[email protected]>

* [Doc] Update OOT model docs (vllm-project#18742)

Signed-off-by: DarkLight1337 <[email protected]>

* [Doc] Update reproducibility doc and example (vllm-project#18741)

Signed-off-by: DarkLight1337 <[email protected]>

* [Misc] improve docs (vllm-project#18734)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* feat(rocm-support): support mamba2 on rocm (vllm-project#18565)

Signed-off-by: Islam Almersawi <[email protected]>
Co-authored-by: Islam Almersawi <[email protected]>

* [Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh (vllm-project#18752)

Signed-off-by: Lukasz Durejko <[email protected]>

* [Doc] cleanup deprecated flag for doc (vllm-project#18715)

Signed-off-by: calvin chen <[email protected]>

* Minor fix about MooncakeStoreConnector (vllm-project#18721)

Signed-off-by: baoloongmao <[email protected]>

* [Build] fix cpu build missing libtbbmalloc.so (vllm-project#18744)

Signed-off-by: Kebe <[email protected]>

* [BUG FIX] minicpm (vllm-project#18739)

Signed-off-by: huangyuxiang03 <[email protected]>
Co-authored-by: huangyuxiang03 <[email protected]>

* [Doc]  Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (vllm-project#18663)

Signed-off-by: Zerohertz <[email protected]>

* [CI/Build] Remove imports of built-in `re` (vllm-project#18750)

Signed-off-by: DarkLight1337 <[email protected]>

* [V1][Metrics] Add API for accessing in-memory Prometheus metrics (vllm-project#17010)

Signed-off-by: Mark McLoughlin <[email protected]>

* Disable prefix cache by default for benchmark (vllm-project#18639)

Signed-off-by: cascade812 <[email protected]>

* optimize get_kv_cache_torch_dtype (vllm-project#18531)

Signed-off-by: idellzheng <[email protected]>

* [Core] Automatically cast multi-modal input dtype (vllm-project#18756)

Signed-off-by: DarkLight1337 <[email protected]>

* [Bugfix] Mistral tool calling when content is list (vllm-project#18729)

Signed-off-by: mgoin <[email protected]>

---------

Signed-off-by: Satyajith Chilappagari <[email protected]>
Signed-off-by: Lucia Fang <[email protected]>
Signed-off-by: Liangfu Chen <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Nan2018 <[email protected]>
Signed-off-by: rand-fly <[email protected]>
Signed-off-by: reidliu41 <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: 汪志鹏 <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: calvin chen <[email protected]>
Signed-off-by: haochengxia <[email protected]>
Signed-off-by: Dilip Gowda Bhagavan <[email protected]>
Signed-off-by: Michael Goin <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: wwl2755 <[email protected]>
Signed-off-by: nicklucche <[email protected]>
Signed-off-by: Kebe <[email protected]>
Signed-off-by: Yong Hoon Shin <[email protected]>
Signed-off-by: rabi <[email protected]>
Signed-off-by: dhia.rhaiem <[email protected]>
Signed-off-by: giantcroc <[email protected]>
Signed-off-by: Hosang Yoon <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: vllmellm <[email protected]>
Signed-off-by: Sebastian Schönnenbeck <[email protected]>
Signed-off-by: Andy Xie <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: jaycha <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Shane A <[email protected]>
Signed-off-by: Elaine Zhao <[email protected]>
Signed-off-by: Linkun <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: googs1025 <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
Signed-off-by: jiang.li <[email protected]>
Signed-off-by: Lukas Geiger <[email protected]>
Signed-off-by: David Xia <[email protected]>
Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: Mengqing Cao <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Kai Wu <[email protected]>
Signed-off-by: Sanger Steel <[email protected]>
Signed-off-by: Randall Smith <[email protected]>
Signed-off-by: Chenheli Hua <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Benjamin Chislett <[email protected]>
Signed-off-by: Teruaki Ishizaki <[email protected]>
Signed-off-by: shen-shanshan <[email protected]>
Signed-off-by: Ronald Xu <[email protected]>
Signed-off-by: cascade812 <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
Signed-off-by: Madeesh Kannan <[email protected]>
Signed-off-by: Kay Yan <[email protected]>
Signed-off-by: Zerohertz <[email protected]>
Signed-off-by: Tristan Leclercq <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Signed-off-by: YaoJiayi <[email protected]>
Signed-off-by: Huy Do <[email protected]>
Signed-off-by: Pavani Majety <[email protected]>
Signed-off-by: Crucifixion-Fxl <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Mathieu Bordere <[email protected]>
Signed-off-by: wenhuach21 <[email protected]>
Signed-off-by: qizixi <[email protected]>
Signed-off-by: zt2370 <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: noemotiovon <[email protected]>
Signed-off-by: zzzyq <[email protected]>
Signed-off-by: zhaohaidao <[email protected]>
Signed-off-by: zhaohaiyuan <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Nave Assaf <[email protected]>
Signed-off-by: Lukasz Durejko <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Islam Almersawi <[email protected]>
Signed-off-by: baoloongmao <[email protected]>
Signed-off-by: huangyuxiang03 <[email protected]>
Signed-off-by: idellzheng <[email protected]>
Co-authored-by: sunyicode0012 <[email protected]>
Co-authored-by: Gong Shufan <[email protected]>
Co-authored-by: Satyajith Chilappagari <[email protected]>
Co-authored-by: Lucia Fang <[email protected]>
Co-authored-by: Lucia (Lu) Fang <[email protected]>
Co-authored-by: Liangfu Chen <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Nan Qin <[email protected]>
Co-authored-by: Andrew Sansom <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Random Fly <[email protected]>
Co-authored-by: Reid <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: 汪志鹏 <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Co-authored-by: 燃 <[email protected]>
Co-authored-by: 松灵 <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Calvin Chen <[email protected]>
Co-authored-by: Percy <[email protected]>
Co-authored-by: Dilip Gowda Bhagavan <[email protected]>
Co-authored-by: bnellnm <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: wwl2755 <[email protected]>
Co-authored-by: Nicolò Lucchesi <[email protected]>
Co-authored-by: Kebe <[email protected]>
Co-authored-by: Yong Hoon Shin <[email protected]>
Co-authored-by: Rabi Mishra <[email protected]>
Co-authored-by: Dhia Eddine Rhaiem <[email protected]>
Co-authored-by: younesbelkada <[email protected]>
Co-authored-by: Ilyas Chahed <[email protected]>
Co-authored-by: Jingwei Zuo <[email protected]>
Co-authored-by: GiantCroc <[email protected]>
Co-authored-by: Hyogeun Oh (오효근) <[email protected]>
Co-authored-by: Hosang <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: vllmellm <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Co-authored-by: Sebastian Schoennenbeck <[email protected]>
Co-authored-by: Ning Xie <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: youngrok cha <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: kourosh hakhamaneshi <[email protected]>
Co-authored-by: Shane A <[email protected]>
Co-authored-by: aws-elaineyz <[email protected]>
Co-authored-by: Shashwat Srijan <[email protected]>
Co-authored-by: Aakash Shetty <[email protected]>
Co-authored-by: Tailin Pan <[email protected]>
Co-authored-by: Rishabh Rajesh <[email protected]>
Co-authored-by: Yishan McNabb <[email protected]>
Co-authored-by: Patrick Lange <[email protected]>
Co-authored-by: Maxwell Goldberg <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: lkchen <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: CYJiang <[email protected]>
Co-authored-by: Bowen Wang <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Lukas Geiger <[email protected]>
Co-authored-by: David Xia <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
Co-authored-by: Mengqing Cao <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Ekagra Ranjan <[email protected]>
Co-authored-by: Kai Wu <[email protected]>
Co-authored-by: Sanger Steel <[email protected]>
Co-authored-by: rasmith <[email protected]>
Co-authored-by: Chenheli Hua <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Teruaki Ishizaki <[email protected]>
Co-authored-by: Shanshan Shen <[email protected]>
Co-authored-by: RonaldBXu <[email protected]>
Co-authored-by: cascade <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: Yuqi Zhang <[email protected]>
Co-authored-by: Yuqi Zhang <[email protected]>
Co-authored-by: Madeesh Kannan <[email protected]>
Co-authored-by: Kay Yan <[email protected]>
Co-authored-by: Tristan Leclercq <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: Jiayi Yao <[email protected]>
Co-authored-by: Rui Qiao <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: Pavani Majety <[email protected]>
Co-authored-by: Feng XiaoLong <[email protected]>
Co-authored-by: Crucifixion-Fxl <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Mathieu Borderé <[email protected]>
Co-authored-by: Wenhua Cheng <[email protected]>
Co-authored-by: qizixi <[email protected]>
Co-authored-by: Yuanhao WU <[email protected]>
Co-authored-by: ztang2370 <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Co-authored-by: Seiji Eicher <[email protected]>
Co-authored-by: Chenguang Li <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: AlexZhao <[email protected]>
Co-authored-by: zhaohaiyuan <[email protected]>
Co-authored-by: Maximilien de Bayser <[email protected]>
Co-authored-by: Naveassaf <[email protected]>
Co-authored-by: Łukasz Durejko <[email protected]>
Co-authored-by: dylan <[email protected]>
Co-authored-by: almersawi <[email protected]>
Co-authored-by: Islam Almersawi <[email protected]>
Co-authored-by: Łukasz Durejko <[email protected]>
Co-authored-by: maobaolong <[email protected]>
Co-authored-by: Shawn Huang <[email protected]>
Co-authored-by: huangyuxiang03 <[email protected]>
Co-authored-by: chunxiaozheng <[email protected]>