
Failing deployment on AWS Sagemaker endpoints #569

@CoolFish88

Description


System Info

Hello,

When attempting to deploy TEI 1.6.1 images on AWS Sagemaker GPU endpoints (e.g., ml.g5.2xlarge), various errors led to a failed deployment, as summarized by the following CloudWatch logs:

  • ghcr.io/huggingface/text-embeddings-inference:cuda-1.6.1 and ghcr.io/huggingface/text-embeddings-inference:cuda-sha-7d4d9ec

./entrypoint.sh: line 10: [: -eq: unary operator expected
./entrypoint.sh: line 13: [: too many arguments
./entrypoint.sh: line 16: [: -eq: unary operator expected
cuda compute cap is not supported

  • ghcr.io/huggingface/text-embeddings-inference:86-1.6.1

error: unexpected argument 'serve' found
Usage: text-embeddings-router [OPTIONS]
For more information, try '--help'.

Each deployment referenced model artifacts from jinaai/jina-embeddings-v2-small-en, supplied as an S3 archive, using two deployment strategies: (1) HuggingFaceModel and (2) a Sagemaker Model and associated Endpoint Config created with a boto3 Sagemaker client (sketched below). The deployment likewise failed when instructing the endpoint to fetch the model artifacts from the hub (HuggingFaceModel approach).
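For reference, a minimal sketch of approach (2), assuming a boto3 Sagemaker client; the model/endpoint names, role ARN, and S3 path are placeholders rather than the exact values used:

import boto3

sm_client = boto3.client("sagemaker")

# Register the model with the TEI container and the S3 model artifacts
sm_client.create_model(
    ModelName="my-tei-model",
    ExecutionRoleArn="<role_arn>",
    PrimaryContainer={
        "Image": "<tei_image_uri>",
        "ModelDataUrl": "s3://<bucket>/jina-embeddings-v2-small-en/model.tar.gz",
        "Environment": {
            "HF_TASK": "feature-extraction",
            "HF_MODEL_ID": "/opt/ml/model",
        },
    },
)

# Endpoint configuration pointing at a single ml.g5.2xlarge instance
sm_client.create_endpoint_config(
    EndpointConfigName="my-tei-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-tei-model",
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# Create the endpoint itself
sm_client.create_endpoint(
    EndpointName="jina-embeddings-tei",
    EndpointConfigName="my-tei-endpoint-config",
)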

Remark: When deploying TEI 1.4.0 using the HuggingFaceModel approach, with the image retrieved by the following code:

from sagemaker.huggingface import get_huggingface_llm_image_uri
tei_image_uri = get_huggingface_llm_image_uri("huggingface-tei", version="1.4.0")

the process completes without errors as long as the model artifacts are fetched from the hub. When supplying model artifacts in an S3 archive, deployment fails due to the incorrect backend being initialized, as discussed here and addressed in #559.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Deployment instructions using HuggingFaceModel:

  • With model artifacts fetched from the hub:
from sagemaker.huggingface import HuggingFaceModel

tei_image_uri = <image_uri>
emb_model = HuggingFaceModel(
    name="my-tei-model",
    role=role,
    # model_data=<s3_path_to_optional_model_artifacts>,
    sagemaker_session=<sm_session>,
    image_uri=tei_image_uri,
    env={
        "HF_TASK": "feature-extraction",
        "HF_MODEL_ID": "jinaai/jina-embeddings-v2-small-en",
    },
)

emb_predictor = emb_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name="jina-embeddings-tei",
)
  • With model artifacts stored in S3:
Modify the above code such that:
- model_data points to an S3 tar.gz archive storing the model artifacts
- HF_MODEL_ID points to /opt/ml/model
(a sketch follows below)
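As a rough sketch of that modification (the S3 path below is a placeholder, not the exact archive used):

from sagemaker.huggingface import HuggingFaceModel

emb_model = HuggingFaceModel(
    name="my-tei-model",
    role=role,
    # S3 archive containing the model artifacts (placeholder path)
    model_data="s3://<bucket>/jina-embeddings-v2-small-en/model.tar.gz",
    sagemaker_session=<sm_session>,
    image_uri=tei_image_uri,
    env={
        "HF_TASK": "feature-extraction",
        # point TEI at the local path where Sagemaker unpacks the archive
        "HF_MODEL_ID": "/opt/ml/model",
    },
)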

Expected behavior

Deployment completes without errors
