Conversation

JGalego
Contributor

@JGalego JGalego commented Dec 4, 2023

What does this PR do?

Adds support for Amazon SageMaker compatible images. Similar to huggingface/text-generation-inference#147, but for TEI.
Mostly CI work and a few hacks, since the Amazon SageMaker routes are already implemented.

Implementation:

  • Added sagemaker target to Dockerfile-cuda and custom entrypoint
  • Added build-and-push-sagemaker-image step to build_* workflows

Who can review?

@OlivierDehaene

Joao Galego and others added 2 commits December 4, 2023 10:12
+ Added sagemaker target to Dockerfile and custom entrypoint
+ Added build-and-push-sagemaker-image step to build_* workflows
@austinmw

austinmw commented Dec 6, 2023

Hi, do you happen to know if this supports BAAI/bge-reranker-base?

@JGalego
Contributor Author

JGalego commented Dec 7, 2023

Hi, do you happen to know if this supports BAAI/bge-reranker-base?

It does (refs/pr/5 as described in https://github.com/huggingface/text-embeddings-inference#sequence-classification-and-re-ranking).

Running the example request from https://github.com/huggingface/text-embeddings-inference#using-re-rankers-models:
[screenshot of the re-ranking response]
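
For anyone who wants to reproduce it, here is a minimal Python sketch of that request (assuming a TEI server for BAAI/bge-reranker-base is listening on 127.0.0.1:8080; the payload shape follows the README example):

import requests

# Re-rank candidate passages against a query.
# Assumes a local TEI server serving BAAI/bge-reranker-base on port 8080.
response = requests.post(
    "http://127.0.0.1:8080/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": ["Deep Learning is...", "cheese is made of milk"],
    },
)
print(response.json())  # a list of {"index": ..., "score": ...} entries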

@superlaut

Hi team, is there an expected date for merging this PR? I would like to use these images :)

@andrewrreed

FYI - here's a working notebook to deploy TEI on SageMaker in the meantime: https://github.com/andrewrreed/hf-notebooks/blob/main/deploy-tei-sagemaker/tei-sagemaker-inference.ipynb
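
For illustration, the core of such a deployment with the SageMaker Python SDK looks roughly like this (a sketch only; the image URI, model, and instance type below are placeholders, not official values):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Placeholder image URI - replace with an actual TEI SageMaker image.
image_uri = "<account>.dkr.ecr.<region>.amazonaws.com/tei:latest"

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={"HF_MODEL_ID": "BAAI/bge-base-en-v1.5"},  # model is pulled from the Hub
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # example instance type
)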

@philschmid
Contributor

Hey @JGalego,

Thank you for starting this work. We would need to make some changes to match the environment variables, similar to TGI: https://github.com/huggingface/text-generation-inference/blob/main/sagemaker-entrypoint.sh
Do you want to make those changes? If you are busy, we can also contribute directly to your branch.

Also, due to the complexity of SageMaker, we will build one GPU container containing the binaries for all targets (https://github.com/huggingface/text-embeddings-inference/blob/main/Dockerfile-cuda-all).
Could you make the changes to Dockerfile-cuda-all?

@JGalego
Contributor Author

JGalego commented Apr 2, 2024

Hey @JGalego,

Thank you for starting this work. We would need to make some changes to match the environment variables, similar to TGI: https://github.com/huggingface/text-generation-inference/blob/main/sagemaker-entrypoint.sh Do you want to make those changes? If you are busy, we can also contribute directly to your branch.

Also, due to the complexity of SageMaker, we will build one GPU container containing the binaries for all targets (https://github.com/huggingface/text-embeddings-inference/blob/main/Dockerfile-cuda-all). Could you make the changes to Dockerfile-cuda-all?

Hey @philschmid,

I can add the env var mappings for HF_MODEL_ID --> MODEL_ID and HF_MODEL_REVISION --> REVISION, just as in TGI. Anything else?

Regarding the `all` image (#186), should I 1/ add a separate target for SM with a custom sagemaker-cuda-all-entrypoint.sh (basically, cuda-all-entrypoint.sh + env var mappings), 2/ change the existing entrypoint, or 3/ move this discussion to a different PR?

Contributor

@philschmid philschmid left a comment


Let's start with the "all" versions only.

Contributor

@philschmid philschmid left a comment


Thank you for working on this! As soon as this is merged, we'll work with the SageMaker team to make it officially available inside the sagemaker-sdk.

@OlivierDehaene OlivierDehaene changed the base branch from main to dev April 11, 2024 17:01
@OlivierDehaene OlivierDehaene changed the base branch from dev to main April 11, 2024 17:01
Contributor

@OlivierDehaene OlivierDehaene left a comment


Thank you for this and sorry for the delay!

@OlivierDehaene OlivierDehaene merged commit 432448c into huggingface:main Apr 11, 2024
@austinmw

austinmw commented Jun 7, 2024

@JGalego Hi, I have a question. Does this work when using an S3 path for model artifacts instead of HF Hub, as in HuggingFaceModel(model_data="<s3_path>", ...)?

I get an endpoint error if I don't specify HF_MODEL_ID as an env config; however, I thought model_data would be ignored if I specified this?

Appreciate any advice, thanks!

@JGalego
Contributor Author

JGalego commented Jun 17, 2024

@JGalego Hi, I have a question. Does this work when using an S3 path for model artifacts instead of HF Hub, as in HuggingFaceModel(model_data="<s3_path>", ...)?

I get an endpoint error if I don't specify HF_MODEL_ID as an env config; however, I thought model_data would be ignored if I specified this?

Appreciate any advice, thanks!

For the SM images, HF_MODEL_ID is mandatory (https://github.com/huggingface/text-embeddings-inference/blob/main/sagemaker-entrypoint-cuda-all.sh#L25). Checking the code (https://github.com/huggingface/text-embeddings-inference/blob/main/router/src/lib.rs#L95) and the docs (https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker), it only supports model artifacts from the Hub or a local directory:

Options:
      --model-id <MODEL_ID>
          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `thenlper/gte-base`.
          Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of
          transformers

          [env: MODEL_ID=]
          [default: thenlper/gte-base]

@austinmw

@JGalego Would it be possible to update this so AWS customers can alternatively use S3 for model storage? I can submit a feature request if needed.

@philschmid
Contributor

Hello @austinmw @JGalego,

This works already. Just set HF_MODEL_ID to /opt/ml/model. SageMaker loads your model from S3 to /opt/ml/model and TEI starts correctly.
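
A minimal sketch of that setup (the role, image, and S3 path below are placeholders carried over from the earlier snippets, not exact values):

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role=role,        # your SageMaker execution role
    image_uri=image,  # the TEI SageMaker image
    model_data="s3://my-bucket/models/bge-base-en-v1.5/model.tar.gz",  # placeholder S3 path
    # Point TEI at the directory where SageMaker unpacks model_data.
    env={"HF_MODEL_ID": "/opt/ml/model"},
)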

@austinmw

Hi @philschmid,

I set HF_MODEL_ID="/opt/ml/model", but got this deployment error:
[screenshot of the deployment error]

Here's my code:

[screenshots of the deployment code]

@philschmid
Contributor

@austinmw This looks more like your artifact not being created correctly. Can you check https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment?

@austinmw

@philschmid Just tried with new artifacts using the provided steps and got the exact same error:

# delete my fine-tuned model from s3
aws s3 rm s3://sagemaker-us-east-1-197237853439/models/bge-base-en-v1.5/model.tar.gz

# upload fresh artifacts according to steps
git lfs install
git clone https://huggingface.co/BAAI/bge-base-en-v1.5
cd bge-base-en-v1.5/
tar zcvf model.tar.gz *
aws s3 cp model.tar.gz s3://sagemaker-us-east-1-197237853439/models/bge-base-en-v1.5/model.tar.gz

[screenshot of the same deployment error]

Is it possible this feature is broken for TEI?

@philschmid
Contributor

philschmid commented Jun 20, 2024

Nope, it should work. I am working on a SageMaker end-to-end example with training and deployment, and it works there. Can you try uploading your model unzipped and then deploying it using the "uncompressed" setting?

from sagemaker.huggingface import HuggingFaceModel

s3_path = "s3://path/to/model"  # prefix containing the unpacked model files

model = HuggingFaceModel(
    role=role,
    # Path to the S3 prefix with the model; we are not using a compressed model.
    model_data={
        "S3DataSource": {
            "S3Uri": s3_path + "/",
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    },
    image_uri=image,
    env=config,
)
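
Then deploy and query as usual; roughly (a sketch; the instance type and request payload follow the TEI README, not this PR):

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # example instance type
)

# Embedding request; payload shape per the TEI README.
print(predictor.predict({"inputs": "What is Deep Learning?"}))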

@austinmw

That worked, thanks a ton for your help!

MasakiMu319 pushed a commit to MasakiMu319/text-embeddings-inference that referenced this pull request Nov 27, 2024
aagnone3 pushed a commit to StratisLLC/hf-text-embeddings-inference that referenced this pull request Dec 11, 2024