Sentence-Transformers-finetuned jinaai/jina-embeddings-v2-small-en doesn't work #556

@deklanw

System Info

I'm testing deployment on an HF Inference Endpoint (specifically a single L4 machine on AWS).

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Deploying jinaai/jina-embeddings-v2-small-en on an HF endpoint with TEI works fine.

Loading the same model with SentenceTransformers and pushing it back to the Hub, then deploying that copy on an HF endpoint with TEI, does not work:

from sentence_transformers import SentenceTransformer

# trust_remote_code is required because the Jina v2 models ship a custom model implementation
model = SentenceTransformer(
    "jinaai/jina-embeddings-v2-small-en", trust_remote_code=True
)
model.push_to_hub("borgcollectivegmbh/testing-jina-stuff")

Deploying the model I pushed fails with:

[Server message]Endpoint failed to start
Exit code: 1. Reason: {"timestamp":"2025-04-03T23:18:41.962397Z","level":"INFO","message":"Args { model_id: \"/rep****ory\", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: \"r-borgcollectivegmbh-testing-jina-stuff-cvy-q6pyonpo-18c10-h6i8\", port: 80, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/repository/cache\"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
{"timestamp":"2025-04-03T23:18:41.971108Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":188}
{"timestamp":"2025-04-03T23:18:41.971307Z","level":"INFO","message":"Starting 7 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":28}
{"timestamp":"2025-04-03T23:18:41.997508Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":230}
{"timestamp":"2025-04-03T23:18:42.397739Z","level":"INFO","message":"Starting FlashBert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":258}
{"timestamp":"2025-04-03T23:18:42.398017Z","level":"ERROR","message":"Could not start Candle backend: Could not start backend: FlashBert only supports absolute position embeddings","target":"text_embeddings_backend","filename":"backends/src/lib.rs","line_number":255}
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend

You can test this yourself; the model I pushed above is public.
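
For what it is worth, a quick way to see what the re-save changed is to diff the config.json of the two repos. Below is a minimal sketch; the fields I would watch (position_embedding_type, auto_map, architectures) are only my guess at what TEI keys its backend choice on, since the original model uses ALiBi rather than absolute position embeddings.

import json
from huggingface_hub import hf_hub_download

def load_config(repo_id: str) -> dict:
    # Download config.json from the Hub and parse it.
    path = hf_hub_download(repo_id, "config.json")
    with open(path) as f:
        return json.load(f)

original = load_config("jinaai/jina-embeddings-v2-small-en")
pushed = load_config("borgcollectivegmbh/testing-jina-stuff")

# Print every top-level key whose value differs between the two configs.
for key in sorted(set(original) | set(pushed)):
    if original.get(key) != pushed.get(key):
        print(f"{key}: {original.get(key)!r} -> {pushed.get(key)!r}")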

Expected behavior

Deployment with TEI should keep working for jinaai/jina-embeddings-v2-small-en even after the model has been fine-tuned or simply re-saved with Sentence Transformers.
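
For context on what "works" means here: once the endpoint starts, a plain embed request like the sketch below should succeed against both the original and the re-pushed repo. The endpoint URL and token are placeholders, not real values.

import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "<hf_token>"  # placeholder

# TEI exposes an /embed route that takes {"inputs": ...} and returns embeddings.
response = requests.post(
    f"{ENDPOINT_URL}/embed",
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json={"inputs": "Hello world"},
)
response.raise_for_status()
print(response.json())  # a list containing one embedding vector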
