System Info
I'm testing deployment on an HF endpoint (specifically a single L4 machine on AWS).
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Deploying jinaai/jina-embeddings-v2-small-en on an HF endpoint with TEI works fine.
Loading it with SentenceTransformers, pushing it back to the Hub, then deploying that copy on an HF endpoint with TEI doesn't work:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-small-en", trust_remote_code=True
)
model.push_to_hub("borgcollectivegmbh/testing-jina-stuff")
```
Deploying the model I pushed fails with:
```
[Server message] Endpoint failed to start
Exit code: 1. Reason: {"timestamp":"2025-04-03T23:18:41.962397Z","level":"INFO","message":"Args { model_id: \"/rep****ory\", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: \"r-borgcollectivegmbh-testing-jina-stuff-cvy-q6pyonpo-18c10-h6i8\", port: 80, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/repository/cache\"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
{"timestamp":"2025-04-03T23:18:41.971108Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":188}
{"timestamp":"2025-04-03T23:18:41.971307Z","level":"INFO","message":"Starting 7 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":28}
{"timestamp":"2025-04-03T23:18:41.997508Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":230}
{"timestamp":"2025-04-03T23:18:42.397739Z","level":"INFO","message":"Starting FlashBert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":258}
{"timestamp":"2025-04-03T23:18:42.398017Z","level":"ERROR","message":"Could not start Candle backend: Could not start backend: FlashBert only supports absolute position embeddings","target":"text_embeddings_backend","filename":"backends/src/lib.rs","line_number":255}
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend
```
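The ERROR line says TEI selected the FlashBert backend, which "only supports absolute position embeddings", so a plausible suspect is that push_to_hub changed backend-relevant keys in config.json between the original repo and my copy. A minimal sketch for diffing those keys (the key list and the example config values below are hypothetical, for illustration; in practice you would download each repo's config.json, e.g. with huggingface_hub, and json.load it):

```python
# Sketch: compare backend-relevant config keys between two model configs.
# The example dicts below are hypothetical placeholders, not the real
# configs of either repository.

KEYS = ("model_type", "position_embedding_type", "auto_map")


def diff_config(original: dict, pushed: dict, keys=KEYS) -> dict:
    """Return {key: (original_value, pushed_value)} for keys that differ."""
    return {
        k: (original.get(k), pushed.get(k))
        for k in keys
        if original.get(k) != pushed.get(k)
    }


# Hypothetical example values, for illustration only:
original_cfg = {"model_type": "bert", "position_embedding_type": "alibi"}
pushed_cfg = {"model_type": "bert", "position_embedding_type": "absolute"}

print(diff_config(original_cfg, pushed_cfg))
# -> {'position_embedding_type': ('alibi', 'absolute')}
```

If the diff is non-empty for a key like position_embedding_type, that would explain why TEI routes the pushed copy to a backend the original never hits.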
You can test this yourself; the model I pushed above is public.
Expected behavior
That deployment works with jinaai/jina-embeddings-v2-small-en even after fine-tuning.