Closed
Description
System Info
$ docker compose run --rm -it --entrypoint "" tei nvidia-smi
[+] Running 1/1
✔ tei Pulled 2.0s
Tue Jun 10 19:26:48 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01 Driver Version: 535.247.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 0% 32C P8 11W / 170W | 3MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
tei-1 | 2025-06-10T19:13:20.835081Z INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 256, max_batch_tokens: 32768, max_batch_requests: Some(64), max_client_batch_size: 64, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "8ecd7eb580ab", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
tei-1 | 2025-06-10T19:13:20.911569Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
tei-1 | 2025-06-10T19:13:20.911579Z INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
tei-1 | 2025-06-10T19:13:23.031237Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
tei-1 | 2025-06-10T19:13:23.031271Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
tei-1 | 2025-06-10T19:13:23.031291Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
tei-1 | 2025-06-10T19:13:23.031327Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 2.119759504s
tei-1 | 2025-06-10T19:13:23.307438Z WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
tei-1 | 2025-06-10T19:13:23.307447Z INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
tei-1 | 2025-06-10T19:13:23.307548Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
tei-1 | 2025-06-10T19:13:23.654310Z INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
tei-1 | 2025-06-10T19:13:23.654929Z INFO text_embeddings_backend: backends/src/lib.rs:493: Downloading `model.safetensors`
tei-1 | 2025-06-10T19:13:23.655079Z INFO text_embeddings_backend: backends/src/lib.rs:377: Model weights downloaded in 151.594µs
tei-1 | 2025-06-10T19:13:23.782867Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:462: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
tei-1 | Error: Model backend is not healthy
tei-1 |
tei-1 | Caused by:
tei-1 | shape mismatch in mul, lhs: [1, 1536], rhs: [1, 3072]
tei-1 exited with code 1
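For context on the failure above: the backend aborts on an element-wise multiply whose operands differ in width by exactly a factor of two (3072 = 2 × 1536), which may indicate one tensor is half (or double) the width the kernel expects. The following is an illustrative NumPy sketch of that class of error, not TEI's actual Candle code:

```python
import numpy as np

# Illustrative only: an element-wise multiply of two tensors whose
# last dimensions differ by a factor of two fails to broadcast,
# analogous to "shape mismatch in mul, lhs: [1, 1536], rhs: [1, 3072]".
lhs = np.zeros((1, 1536))
rhs = np.zeros((1, 3072))  # exactly twice as wide as lhs

try:
    lhs * rhs  # broadcasting rejects 1536 vs 3072
except ValueError as e:
    print("shape mismatch:", e)
```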
docker-compose.yml
tei:
  # image: ghcr.io/huggingface/text-embeddings-inference:86-1.7
  # image: ghcr.io/huggingface/text-embeddings-inference:86-latest
  image: ghcr.io/huggingface/text-embeddings-inference:86-sha-11ffc60
  pull_policy: always
  restart: "no"
  ports:
    - "4242:80"
  volumes:
    - ./data:/data
  command:
    - --model-id
    - Qwen/Qwen3-Embedding-0.6B
    - --max-batch-tokens
    - "32768"
    - --max-batch-requests
    - "64"
    - --max-client-batch-size
    - "64"
    - --max-concurrent-requests
    - "256"
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Run the latest code, which includes PR #627 (AddQwen3Model), with the model Qwen/Qwen3-Embedding-0.6B
- Note the error raised while the server is loading
Expected behavior
Successful loading
rochmad-saputra