
Qwen3: Error: Model backend is not healthy #629

@gardner

Description

System Info

$ docker compose run --rm -it --entrypoint "" tei nvidia-smi
[+] Running 1/1
 ✔ tei Pulled                                                                                                                  2.0s 
Tue Jun 10 19:26:48 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01             Driver Version: 535.247.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   32C    P8              11W / 170W |      3MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
tei-1  | 2025-06-10T19:13:20.835081Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 256, max_batch_tokens: 32768, max_batch_requests: Some(64), max_client_batch_size: 64, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "8ecd7eb580ab", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
tei-1  | 2025-06-10T19:13:20.911569Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
tei-1  | 2025-06-10T19:13:20.911579Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
tei-1  | 2025-06-10T19:13:23.031237Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
tei-1  | 2025-06-10T19:13:23.031271Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
tei-1  | 2025-06-10T19:13:23.031291Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
tei-1  | 2025-06-10T19:13:23.031327Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 2.119759504s
tei-1  | 2025-06-10T19:13:23.307438Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
tei-1  | 2025-06-10T19:13:23.307447Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
tei-1  | 2025-06-10T19:13:23.307548Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
tei-1  | 2025-06-10T19:13:23.654310Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
tei-1  | 2025-06-10T19:13:23.654929Z  INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
tei-1  | 2025-06-10T19:13:23.655079Z  INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 151.594µs
tei-1  | 2025-06-10T19:13:23.782867Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:462: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
tei-1  | Error: Model backend is not healthy
tei-1  | 
tei-1  | Caused by:
tei-1  |     shape mismatch in mul, lhs: [1, 1536], rhs: [1, 3072]
tei-1 exited with code 1
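For reference, the failing operation is an elementwise multiply of two tensors whose last dimensions disagree by exactly a factor of two. A minimal sketch of the shape rule (using NumPy rather than candle, purely for illustration; the candle backend enforces the same constraint):

```python
import numpy as np

# The backend fails multiplying a [1, 1536] tensor with a [1, 3072] tensor.
# Elementwise mul requires broadcast-compatible shapes; 1536 vs 3072 is not.
lhs = np.zeros((1, 1536))
rhs = np.zeros((1, 3072))
try:
    _ = lhs * rhs
except ValueError as e:
    print(f"shape mismatch: {e}")

# Note the rhs is exactly twice the lhs along the last axis:
print(rhs.shape[1] // lhs.shape[1])  # prints 2
```

The exact factor of two is only an observation, not a diagnosis; it may point at a doubled projection or dimension split somewhere in the new Qwen3 path.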

docker-compose.yml

  tei:
    # image: ghcr.io/huggingface/text-embeddings-inference:86-1.7
    # image: ghcr.io/huggingface/text-embeddings-inference:86-latest
    image: ghcr.io/huggingface/text-embeddings-inference:86-sha-11ffc60
    pull_policy: always
    restart: "no"
    ports:
      - "4242:80"
    volumes:
      - ./data:/data
    command:
      - --model-id
      - Qwen/Qwen3-Embedding-0.6B
      - --max-batch-tokens
      - "32768"
      - --max-batch-requests
      - "64"
      - --max-client-batch-size
      - "64"
      - --max-concurrent-requests
      - "256"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

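For anyone reproducing without Compose, an equivalent one-off invocation (assuming the same image tag and a local ./data cache directory; flags match the Args line in the log above) would be roughly:

```shell
# One-off run of the same TEI image with the same router flags as the
# compose file above; requires the NVIDIA container toolkit for --gpus.
docker run --rm --gpus all \
  -p 4242:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-embeddings-inference:86-sha-11ffc60 \
  --model-id Qwen/Qwen3-Embedding-0.6B \
  --max-batch-tokens 32768 \
  --max-batch-requests 64 \
  --max-client-batch-size 64 \
  --max-concurrent-requests 256
```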
Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the latest code, which includes PR Add Qwen3Model #627, with Qwen/Qwen3-Embedding-0.6B as the model
  2. Note the error while the server loads the model backend
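Once the backend loads, a request like the following (against the port mapping in the compose file above, using TEI's embed endpoint) should return an embedding instead of a connection failure:

```shell
# Sketch of the request that currently cannot be served because the
# container exits during startup; expects a running TEI server on :4242.
curl -s http://localhost:4242/embed \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?"}'
```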

Expected behavior

The model loads successfully and the server starts serving requests.
