Closed
Description
System Info
$ docker compose run --rm -it --entrypoint "" tei nvidia-smi
[+] Running 1/1
✔ tei Pulled 2.0s
Tue Jun 10 19:26:48 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01 Driver Version: 535.247.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 0% 32C P8 11W / 170W | 3MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
tei-1 | 2025-06-10T19:13:20.835081Z INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 256, max_batch_tokens: 32768, max_batch_requests: Some(64), max_client_batch_size: 64, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "8ecd7eb580ab", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
tei-1 | 2025-06-10T19:13:20.911569Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
tei-1 | 2025-06-10T19:13:20.911579Z INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
tei-1 | 2025-06-10T19:13:23.031237Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
tei-1 | 2025-06-10T19:13:23.031271Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
tei-1 | 2025-06-10T19:13:23.031291Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
tei-1 | 2025-06-10T19:13:23.031327Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 2.119759504s
tei-1 | 2025-06-10T19:13:23.307438Z WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
tei-1 | 2025-06-10T19:13:23.307447Z INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
tei-1 | 2025-06-10T19:13:23.307548Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
tei-1 | 2025-06-10T19:13:23.654310Z INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
tei-1 | 2025-06-10T19:13:23.654929Z INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
tei-1 | 2025-06-10T19:13:23.655079Z INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 151.594µs
tei-1 | 2025-06-10T19:13:23.782867Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:462: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
tei-1 | Error: Model backend is not healthy
tei-1 |
tei-1 | Caused by:
tei-1 | shape mismatch in mul, lhs: [1, 1536], rhs: [1, 3072]
tei-1 exited with code 1
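For context on the failure above: the backend aborts on an element-wise multiply whose operands differ in width by exactly a factor of two (3072 = 2 × 1536), which may indicate one tensor is half (or double) the width the kernel expects. The following is an illustrative NumPy sketch of that class of error, not TEI's actual Candle code:

```python
import numpy as np

# Illustrative only: an element-wise multiply of two tensors whose
# last dimensions differ by a factor of two fails to broadcast,
# analogous to "shape mismatch in mul, lhs: [1, 1536], rhs: [1, 3072]".
lhs = np.zeros((1, 1536))
rhs = np.zeros((1, 3072))  # exactly twice as wide as lhs

try:
    lhs * rhs  # broadcasting rejects 1536 vs 3072
except ValueError as e:
    print("shape mismatch:", e)
```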
docker-compose.yml
tei:
  # image: ghcr.io/huggingface/text-embeddings-inference:86-1.7
  # image: ghcr.io/huggingface/text-embeddings-inference:86-latest
  image: ghcr.io/huggingface/text-embeddings-inference:86-sha-11ffc60
  pull_policy: always
  restart: "no"
  ports:
    - "4242:80"
  volumes:
    - ./data:/data
  command:
    - --model-id
    - Qwen/Qwen3-Embedding-0.6B
    - --max-batch-tokens
    - "32768"
    - --max-batch-requests
    - "64"
    - --max-client-batch-size
    - "64"
    - --max-concurrent-requests
    - "256"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Run the latest code, which includes PR #627 (AddQwen3Model), with the model Qwen/Qwen3-Embedding-0.6B
- Note the error raised while the server is loading
Expected behavior
Successful loading
rochmad-saputra