
[Bug]: Whisper fails to load with transformers v4.53.0 #20224

@russellb

Your current environment

vllm main @ 7b1895e

🐛 Describe the bug

vllm serve openai/whisper-large-v3

This started failing once transformers v4.53.0 was released.
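
For anyone trying to confirm this without starting the server, here is a minimal sketch of a standalone reproducer. It assumes the trigger is vLLM's cached tokenizer wrapper (get_cached_tokenizer in vllm/transformers_utils/tokenizer.py), which swaps the tokenizer's class for a dynamically created subclass named Cached<OriginalClass>; the type(...) call below is a hypothetical stand-in for that wrapper, not vLLM's actual code.

from transformers import (
    WhisperFeatureExtractor,
    WhisperProcessor,
    WhisperTokenizerFast,
)

tokenizer = WhisperTokenizerFast.from_pretrained("openai/whisper-large-v3")
feature_extractor = WhisperFeatureExtractor.from_pretrained(
    "openai/whisper-large-v3")

# Mimic vLLM's wrapper: a dynamically created subclass whose name gains a
# "Cached" prefix, while remaining a true subclass of the original tokenizer.
tokenizer.__class__ = type(
    f"Cached{type(tokenizer).__name__}", (type(tokenizer),), {})

# With transformers v4.53.0 installed, this raises the same TypeError shown
# in the traceback below.
WhisperProcessor(feature_extractor, tokenizer)

Until this is sorted out, pinning transformers below v4.53.0 should avoid the failure, since the same command worked before that release.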

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/rbryant/vllm/vllm/engine/multiprocessing/engine.py", line 460, in run_mp_engine
    raise e from None
  File "/home/rbryant/vllm/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
  File "/home/rbryant/vllm/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
    return cls(
  File "/home/rbryant/vllm/vllm/engine/multiprocessing/engine.py", line 87, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/home/rbryant/vllm/vllm/engine/llm_engine.py", line 268, in __init__
    self._initialize_kv_caches()
  File "/home/rbryant/vllm/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/home/rbryant/vllm/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
    results = self.collective_rpc("determine_num_available_blocks")
  File "/home/rbryant/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/home/rbryant/vllm/vllm/utils.py", line 2687, in run_method
    return func(*args, **kwargs)
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/rbryant/vllm/vllm/worker/worker.py", line 256, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/rbryant/vllm/vllm/worker/enc_dec_model_runner.py", line 312, in profile_run
    max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
  File "/home/rbryant/vllm/vllm/multimodal/registry.py", line 183, in get_max_multimodal_tokens
    return sum(self.get_max_tokens_by_modality(model_config).values())
  File "/home/rbryant/vllm/vllm/multimodal/registry.py", line 170, in get_max_tokens_by_modality
    mm_limits = self.get_mm_limits_per_prompt(model_config)
  File "/home/rbryant/vllm/vllm/multimodal/registry.py", line 206, in get_mm_limits_per_prompt
    processor = self.create_processor(model_config, disable_cache=False)
  File "/home/rbryant/vllm/vllm/multimodal/registry.py", line 281, in create_processor
    return factories.build_processor(ctx, cache=cache)
  File "/home/rbryant/vllm/vllm/multimodal/registry.py", line 88, in build_processor
    return self.processor(info, dummy_inputs_builder, cache=cache)
  File "/home/rbryant/vllm/vllm/multimodal/processing.py", line 1152, in __init__
    self.data_parser = self._get_data_parser()
  File "/home/rbryant/vllm/vllm/model_executor/models/whisper.py", line 680, in _get_data_parser
    feature_extractor = self.info.get_feature_extractor()
  File "/home/rbryant/vllm/vllm/model_executor/models/whisper.py", line 643, in get_feature_extractor
    hf_processor = self.get_hf_processor()
  File "/home/rbryant/vllm/vllm/model_executor/models/whisper.py", line 637, in get_hf_processor
    return self.ctx.get_hf_processor(WhisperProcessor)
  File "/home/rbryant/vllm/vllm/inputs/registry.py", line 131, in get_hf_processor
    return super().get_hf_processor(
  File "/home/rbryant/vllm/vllm/inputs/registry.py", line 94, in get_hf_processor
    return cached_processor_from_config(
  File "/home/rbryant/vllm/vllm/transformers_utils/processor.py", line 110, in cached_processor_from_config
    return cached_get_processor(
  File "/home/rbryant/vllm/vllm/transformers_utils/processor.py", line 72, in get_processor
    processor = processor_factory.from_pretrained(
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1304, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **kwargs)
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 1105, in from_args_and_dict
    processor = cls(*args, **valid_kwargs)
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/transformers/models/whisper/processing_whisper.py", line 41, in __init__
    super().__init__(feature_extractor, tokenizer)
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 551, in __init__
    self.check_argument_for_proper_class(attribute_name, arg)
  File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/transformers/processing_utils.py", line 569, in check_argument_for_proper_class
    raise TypeError(
TypeError: Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
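
The failing check is new in transformers v4.53.0: per the traceback, ProcessorMixin.__init__ now calls check_argument_for_proper_class on each attribute. From the message above, the check appears to match on the class name rather than using isinstance, which is why vLLM's dynamically renamed subclass is rejected even though it is a genuine subclass of the expected tokenizer. A small sketch of that mismatch (CachedWhisperTokenizerFast here is again a stand-in for the class vLLM creates at runtime):

from transformers import WhisperTokenizerFast

# A stand-in for the subclass vLLM creates when caching the tokenizer.
CachedWhisperTokenizerFast = type(
    "CachedWhisperTokenizerFast", (WhisperTokenizerFast,), {})

# The subclass relationship holds, so an isinstance/issubclass check passes:
print(issubclass(CachedWhisperTokenizerFast, WhisperTokenizerFast))  # True

# ...but the class name no longer matches what the processor expects:
print(CachedWhisperTokenizerFast.__name__)  # "CachedWhisperTokenizerFast"

If that reading is right, either relaxing the check in transformers to accept subclasses or having vLLM preserve the wrapped tokenizer's original class name would resolve the error.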

