Description
System Info
Hello,
When attempting to deploy TEI 1.6.1 images on AWS SageMaker GPU endpoints (e.g. ml.g5.2xlarge), various errors led to failed deployments, as summarized by the following CloudWatch logs:
- ghcr.io/huggingface/text-embeddings-inference:cuda-1.6.1 and ghcr.io/huggingface/text-embeddings-inference:cuda-sha-7d4d9ec
./entrypoint.sh: line 10: [: -eq: unary operator expected
./entrypoint.sh: line 13: [: too many arguments
./entrypoint.sh: line 16: [: -eq: unary operator expected
cuda compute cap is not supported
- ghcr.io/huggingface/text-embeddings-inference:86-1.6.1
error: unexpected argument 'serve' found
Usage: text-embeddings-router [OPTIONS]
For more information, try '--help'.
Each deployment referenced model artifacts from jinaai/jina-embeddings-v2-small-en supplied as an S3 archive, using two deployment strategies: (1) HuggingFaceModel, and (2) a SageMaker Model and associated Endpoint Config created with a boto3 SageMaker client (a sketch of this second approach is given below). Deployment similarly failed when instructing the endpoint to fetch the model artifacts from the Hub (the HuggingFaceModel approach).
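For reference, a minimal sketch of the boto3-based strategy (2); the image URI, role ARN, S3 path, and resource names below are placeholders assumed for illustration, not the exact values used:

import boto3

sm_client = boto3.client("sagemaker")

# Hypothetical placeholders for illustration only
image_uri = "<tei_image_uri>"
model_data_url = "s3://<bucket>/<prefix>/model.tar.gz"
role_arn = "<execution_role_arn>"

# Register the model: the TEI image plus the S3 model artifacts
sm_client.create_model(
    ModelName="my-tei-model",
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        "Environment": {
            "HF_MODEL_ID": "/opt/ml/model",
            "HF_TASK": "feature-extraction",
        },
    },
)

# Endpoint config pinning the GPU instance type from the report
sm_client.create_endpoint_config(
    EndpointConfigName="my-tei-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-tei-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# Create the endpoint; this is the step that fails with the logs above
sm_client.create_endpoint(
    EndpointName="jina-embeddings-tei",
    EndpointConfigName="my-tei-endpoint-config",
)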
Remark: When deploying TEI 1.4.0 using the HuggingFaceModel approach, with the image URI retrieved by the following code:
from sagemaker.huggingface import get_huggingface_llm_image_uri
tei_image_uri = get_huggingface_llm_image_uri("huggingface-tei", version="1.4.0")
the process completes without errors as long as model artifacts are fetched from the Hub. When supplying model artifacts in an S3 archive, deployment fails because the incorrect backend is initialized, as discussed here and addressed in #559
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Deployment instructions using HuggingFaceModel:
- With model artifacts fetched from the Hub:
from sagemaker.huggingface import HuggingFaceModel

tei_image_uri = <image_uri>

emb_model = HuggingFaceModel(
    name="my-tei-model",
    role=role,
    # model_data=<s3_path_to_optional_model_artifacts>,
    sagemaker_session=<sm_session>,
    image_uri=tei_image_uri,
    env={
        "HF_TASK": "feature-extraction",
        "HF_MODEL_ID": "jinaai/jina-embeddings-v2-small-en",
    },
)
emb_predictor = emb_model.deploy(
initial_instance_count=1,
instance_type="ml.g5.2xlarge",
endpoint_name="jina-embeddings-tei"
)
- With model artifacts stored in S3:
Modify the above code such that (a sketch is given after this list):
- model_data points to an S3 tar.gz archive storing the model artifacts
- HF_MODEL_ID points to /opt/ml/model
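A minimal sketch of that variant, assuming a placeholder S3 path rather than the exact archive used:

from sagemaker.huggingface import HuggingFaceModel

# Hypothetical S3 path to the packaged jinaai/jina-embeddings-v2-small-en artifacts
model_data_s3 = "s3://<bucket>/<prefix>/model.tar.gz"

emb_model = HuggingFaceModel(
    name="my-tei-model",
    role=role,
    model_data=model_data_s3,
    sagemaker_session=<sm_session>,
    image_uri=tei_image_uri,
    env={
        "HF_TASK": "feature-extraction",
        # Point TEI at the artifacts unpacked inside the container
        "HF_MODEL_ID": "/opt/ml/model",
    },
)

emb_predictor = emb_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="jina-embeddings-tei",
)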
Expected behavior
Deployment completes without errors
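Once the endpoint is InService, it should return embeddings for a request such as the following (a sketch; the payload shown assumes the default JSON serializer of the returned predictor):

# Sketch of the expected working behavior after a successful deployment
response = emb_predictor.predict({
    "inputs": "Deep learning is a subset of machine learning.",
})
print(response)  # expected: the embedding vector(s) for the input text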