Conversation

JGalego
Contributor

@JGalego JGalego commented Dec 4, 2023

What does this PR do?

Adds support for Amazon SageMaker compatible images. Similar to huggingface/text-generation-inference#147, but for TEI.
Mostly CI work and a few hacks, since the Amazon SageMaker routes are already implemented.

Implementation:

  • Added sagemaker target to Dockerfile-cuda and custom entrypoint
  • Added build-and-push-sagemaker-image step to build_* workflows

Who can review?

@OlivierDehaene

Joao Galego and others added 2 commits December 4, 2023 10:12
+ Added sagemaker target to Dockerfile and custom entrypoint
+ Added build-and-push-sagemaker-image step to build_* workflows
@austinmw

austinmw commented Dec 6, 2023

Hi, do you happen to know if this supports BAAI/bge-reranker-base?

@JGalego
Contributor Author

JGalego commented Dec 7, 2023

Hi, do you happen to know if this supports BAAI/bge-reranker-base?

It does (refs/pr/5 as described in https://github.com/huggingface/text-embeddings-inference#sequence-classification-and-re-ranking).

Running the example request from https://github.com/huggingface/text-embeddings-inference#using-re-rankers-models:
[screenshot of the re-ranking response]
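
For anyone who wants to reproduce it, here is a minimal Python sketch of that request (assuming a TEI server for BAAI/bge-reranker-base is listening on 127.0.0.1:8080; the payload shape follows the README example):

import requests

# Re-rank candidate passages against a query.
# Assumes a local TEI server serving BAAI/bge-reranker-base on port 8080.
response = requests.post(
    "http://127.0.0.1:8080/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": ["Deep Learning is...", "cheese is made of milk"],
    },
)
print(response.json())  # a list of {"index": ..., "score": ...} entries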

@superlaut

Hi team, is there an expected date for merging this PR? I would like to use these images :)

@andrewrreed

FYI - here's a working notebook to deploy TEI on SageMaker in the meantime: https://github.com/andrewrreed/hf-notebooks/blob/main/deploy-tei-sagemaker/tei-sagemaker-inference.ipynb
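
For illustration, the core of such a deployment with the SageMaker Python SDK looks roughly like this (a sketch only; the image URI, model, and instance type below are placeholders, not official values):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Placeholder image URI - replace with an actual TEI SageMaker image.
image_uri = "<account>.dkr.ecr.<region>.amazonaws.com/tei:latest"

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={"HF_MODEL_ID": "BAAI/bge-base-en-v1.5"},  # model is pulled from the Hub
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # example instance type
)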

@philschmid
Contributor

Hey @JGalego,

Thank you for starting this work. We would need to make some changes to match the environment variables, similar to TGI: https://github.com/huggingface/text-generation-inference/blob/main/sagemaker-entrypoint.sh
Do you want to make those changes? If you are busy, we can also contribute directly to your branch.

Also, due to the complexity of SageMaker, we will build one GPU container containing the binaries for all targets (https://github.com/huggingface/text-embeddings-inference/blob/main/Dockerfile-cuda-all).
Could you make the changes to Dockerfile-cuda-all?

@JGalego
Contributor Author

JGalego commented Apr 2, 2024

Hey @JGalego,

Thank you for starting this work. We would need to make some changes to match the environment variables, similar to TGI: https://github.com/huggingface/text-generation-inference/blob/main/sagemaker-entrypoint.sh Do you want to make those changes? If you are busy, we can also contribute directly to your branch.

Also, due to the complexity of SageMaker, we will build one GPU container containing the binaries for all targets (https://github.com/huggingface/text-embeddings-inference/blob/main/Dockerfile-cuda-all). Could you make the changes to Dockerfile-cuda-all?

Hey @philschmid,

I can add the env var mappings for HF_MODEL_ID --> MODEL_ID and HF_MODEL_REVISION --> REVISION, just as in TGI. Anything else?

Regarding the `all` image (#186), should I 1/ add a separate target for SM with a custom sagemaker-cuda-all-entrypoint.sh (basically, cuda-all-entrypoint.sh + env var mappings), 2/ change the existing entrypoint, or 3/ move this discussion to a different PR?

Contributor

@philschmid philschmid left a comment


Let's start with the "all" versions only.

Contributor

@philschmid philschmid left a comment


Thank you for working on this! As soon as this is merged, we'll work with the SageMaker team to make it officially available inside the sagemaker-sdk.

@OlivierDehaene OlivierDehaene changed the base branch from main to dev April 11, 2024 17:01
@OlivierDehaene OlivierDehaene changed the base branch from dev to main April 11, 2024 17:01
Contributor

@OlivierDehaene OlivierDehaene left a comment


Thank you for this and sorry for the delay!

@OlivierDehaene OlivierDehaene merged commit 432448c into huggingface:main Apr 11, 2024
@austinmw

austinmw commented Jun 7, 2024

@JGalego Hi, I have a question. Does this work when using an S3 path for model artifacts instead of HF Hub, as in HuggingFaceModel(model_data="<s3_path>", ...)?

I get an endpoint error if I don't specify HF_MODEL_ID as an env config; however, I thought model_data would be ignored if I specified this?

Appreciate any advice, thanks!

@JGalego
Contributor Author

JGalego commented Jun 17, 2024

@JGalego Hi, I have a question. Does this work when using an S3 path for model artifacts instead of HF Hub, as in HuggingFaceModel(model_data="<s3_path>", ...)?

I get an endpoint error if I don't specify HF_MODEL_ID as an env config; however, I thought model_data would be ignored if I specified this?

Appreciate any advice, thanks!

For the SM images, HF_MODEL_ID is mandatory (https://github.com/huggingface/text-embeddings-inference/blob/main/sagemaker-entrypoint-cuda-all.sh#L25). Checking the code (https://github.com/huggingface/text-embeddings-inference/blob/main/router/src/lib.rs#L95) and the docs (https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker), it only supports model artifacts from the Hub or a local directory:

Options:
      --model-id <MODEL_ID>
          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `thenlper/gte-base`.
          Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of
          transformers

          [env: MODEL_ID=]
          [default: thenlper/gte-base]

@austinmw

@JGalego Would it be possible to update this so AWS customers can alternatively use S3 for model storage? I can submit a feature request if needed.

@philschmid
Contributor

Hello @austinmw @JGalego,

This works already. Just set HF_MODEL_ID to /opt/ml/model. SageMaker loads your model from S3 to /opt/ml/model and TEI starts correctly.
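
A minimal sketch of that setup (the role, image, and S3 path below are placeholders carried over from the earlier snippets, not exact values):

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role=role,        # your SageMaker execution role
    image_uri=image,  # the TEI SageMaker image
    model_data="s3://my-bucket/models/bge-base-en-v1.5/model.tar.gz",  # placeholder S3 path
    # Point TEI at the directory where SageMaker unpacks model_data.
    env={"HF_MODEL_ID": "/opt/ml/model"},
)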

@austinmw

Hi @philschmid,

I set HF_MODEL_ID="/opt/ml/model", but got this deployment error:
[screenshot of the deployment error]

Here's my code:

[screenshots of the deployment code]

@philschmid
Contributor

@austinmw This looks more like your artifact not being created correctly. Can you check https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment?

@austinmw

@philschmid Just tried with new artifacts using the provided steps and got the exact same error:

# delete my fine-tuned model from s3
aws s3 rm s3://sagemaker-us-east-1-197237853439/models/bge-base-en-v1.5/model.tar.gz

# upload fresh artifacts according to steps
git lfs install
git clone https://huggingface.co/BAAI/bge-base-en-v1.5
cd bge-base-en-v1.5/
tar zcvf model.tar.gz *
aws s3 cp model.tar.gz s3://sagemaker-us-east-1-197237853439/models/bge-base-en-v1.5/model.tar.gz

[screenshot of the same deployment error]

Is it possible this feature is broken for TEI?

@philschmid
Contributor

philschmid commented Jun 20, 2024

Nope, it should work. I am working on a SageMaker end-to-end example with training and deployment, and it works there. Can you try uploading your model unzipped and then deploying it using the "uncompressed" setting?

from sagemaker.huggingface import HuggingFaceModel

s3_path = "s3://path/to/model"  # prefix containing the unpacked model files

model = HuggingFaceModel(
    role=role,
    # Path to the S3 prefix with the model; we are not using a compressed model.
    model_data={
        "S3DataSource": {
            "S3Uri": s3_path + "/",
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    },
    image_uri=image,
    env=config,
)
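
Then deploy and query as usual; roughly (a sketch; the instance type and request payload follow the TEI README, not this PR):

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # example instance type
)

# Embedding request; payload shape per the TEI README.
print(predictor.predict({"inputs": "What is Deep Learning?"}))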

@austinmw

That worked, thanks a ton for your help!

MasakiMu319 pushed a commit to MasakiMu319/text-embeddings-inference that referenced this pull request Nov 27, 2024
aagnone3 pushed a commit to StratisLLC/hf-text-embeddings-inference that referenced this pull request Dec 11, 2024