feat: support HF_ENDPOINT environment when downloading model #505


Merged: 3 commits merged into huggingface:main from l8ng/feat-support-hf-endpoint on Mar 26, 2025

Conversation

StrayDragon (Contributor)

What does this PR do?

Hey! Big thanks for this awesome project. Because of network problems, I've been manually downloading models and mounting them, which is a real pain, especially when I want to try out new models. So I thought I'd make some changes to make things easier for users like me.

This PR bumps the hf-hub version (release notes) and tweaks the related code to support the HF_ENDPOINT environment variable. This way, downloads can go through a mirror, which should be faster and more reliable, especially in poor network environments.

By the way, fix #416

Maybe the documentation should also be updated, but I'm not sure whether my changes are the right basis to build on...
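For context, here is a minimal sketch of how the router side can consume the override once hf-hub is bumped. This is not the PR's actual diff: the `with_endpoint` setter is assumed from the hf-hub 0.4 release notes (0.4 can reportedly also pick the variable up itself via `ApiBuilder::from_env()`), so check the crate docs for the exact names and signatures:

use hf_hub::api::tokio::ApiBuilder;

// Sketch: honor HF_ENDPOINT when constructing the hub client so that
// `resolve/...` downloads are sent to the configured mirror.
fn build_hub_api() -> hf_hub::api::tokio::Api {
    let mut builder = ApiBuilder::new();
    // `with_endpoint` is assumed per the hf-hub 0.4 release notes;
    // verify the exact signature against the published docs.
    if let Ok(endpoint) = std::env::var("HF_ENDPOINT") {
        builder = builder.with_endpoint(endpoint);
    }
    builder.build().expect("failed to build hf-hub API client")
}

With something like that in place, starting the router with HF_ENDPOINT=https://hf-mirror.com (as in the test logs below) routes all model downloads through the mirror.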

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Could you help review this PR when you're available?

@OlivierDehaene @Narsil

@StrayDragon changed the title from "L8ng/feat support hf endpoint" to "feat: support HF_ENDPOINT environment when downloading model" on Feb 26, 2025
StrayDragon (Contributor, Author)

@alvarobartt @Narsil @OlivierDehaene
Hey, I found you via the recent commits. Would either of you have time to review this PR?

I really need this convenience feature. Thank you :)

alvarobartt (Member) left a comment

Thanks for the PR @StrayDragon, it LGTM, but I'll let @Narsil confirm 🤗

Just note that the latest version of hf-hub is 0.4.2, not 0.4.1 (see https://crates.io/crates/hf-hub/0.4.2).

Narsil (Collaborator) commented Mar 26, 2025

Thanks, this is a great change. I think it mostly just needs a rebase.

I prepared the fixes for the rebase in case you want to go faster: #536. (Ideally we'd prefer to merge your PR; it doesn't change attribution at all, but it's nicer :) )

Cheers.

StrayDragon (Contributor, Author) commented Mar 26, 2025

@Narsil Cool, this quick-fix branch looks exactly like the result of my rebase... I was focused on manually testing some cases, and since I'm a beginner in Rust I didn't expect the update to be done already. Thanks to the contributor and for your quick response!

Some context for the changes:

1. Upgrade to hf-hub 0.4.2 using cargo-edit:
$ cargo upgrade -p [email protected]
    Checking virtual workspace's dependencies
name   old req compatible latest new req
====   ======= ========== ====== =======
hf-hub 0.4.1   0.4.2      0.4.2  0.4.2
    Checking backend-grpc-client's dependencies
    Checking grpc-metadata's dependencies
    Checking text-embeddings-backend's dependencies
    Checking text-embeddings-backend-candle's dependencies
    Checking text-embeddings-backend-core's dependencies
    Checking text-embeddings-backend-ort's dependencies
    Checking text-embeddings-backend-python's dependencies
    Checking text-embeddings-core's dependencies
    Checking text-embeddings-router's dependencies
   Upgrading git dependencies
     Locking 0 packages to latest compatible versions
note: pass `--verbose` to see 80 unchanged dependencies behind latest
   Upgrading recursive dependencies
note: Re-run with `--verbose` to show more dependencies
  excluded: 68 packages
2. Build the release target (only CPU is available here):
$ cargo clean
$ cargo build --release --bin text-embeddings-router -F candle
3. Run the server using this PR's new change:
$ rm -rf ~/.cache/huggingface/hub/models--thenlper--gte-small/  # remove the HF cache so the new download path is exercised
$ # or use
$ # huggingface-cli delete-cache
  • embed
HF_ENDPOINT=https://hf-mirror.com $EXEC_PATH/text-embeddings-router --model-id thenlper/gte-small --port 8080
Logs:
2025-03-26T15:05:14.246471Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "the*****/***-**all", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-26T15:05:14.263368Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-26T15:05:14.263379Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-26T15:05:16.249305Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-26T15:05:16.579237Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://hf-mirror.com/thenlper/gte-small/resolve/main/config_sentence_transformers.json)
2025-03-26T15:05:16.579271Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-26T15:05:17.223200Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-26T15:05:19.172297Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 4.908927296s
2025-03-26T15:05:19.187533Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 512
2025-03-26T15:05:19.187694Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 16 tokenization workers
2025-03-26T15:05:19.244116Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-26T15:05:19.244897Z  INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-26T15:06:19.734710Z  INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 60.489809071s
2025-03-26T15:06:19.737613Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:199: Starting Bert model on Cpu
2025-03-26T15:06:19.765132Z  WARN text_embeddings_router: router/src/lib.rs:258: Backend does not support a batch size > 4
2025-03-26T15:06:19.765148Z  WARN text_embeddings_router: router/src/lib.rs:259: forcing `max_batch_requests=4`
2025-03-26T15:06:19.768459Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1804: Starting HTTP server: 0.0.0.0:8080
2025-03-26T15:06:19.768469Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1805: Ready
2025-03-26T15:09:06.706124Z  INFO embed{total_time="21.256997ms" tokenization_time="195.734µs" queue_time="226.822µs" inference_time="20.771174ms"}: text_embeddings_router::http::server: router/src/http/server.rs:712: Success
4. Run the server normally (models already downloaded locally under .idea/):

  • embed

$EXEC_PATH/text-embeddings-router --model-id .idea/bge-small-en-v1.5 --port 8080

  • rerank

$EXEC_PATH/text-embeddings-router --model-id .idea/bge-reranker-base/ --port 8081
5. Test the servers from steps 3 and 4 with curl:
curl 127.0.0.1:8081/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'

Narsil (Collaborator) left a comment

LGTM, thanks for this.

Narsil merged commit 2743296 into huggingface:main on Mar 26, 2025
1 of 13 checks passed
StrayDragon deleted the l8ng/feat-support-hf-endpoint branch on Mar 28, 2025
cybermanhao commented May 20, 2025

This doesn't seem to work now:

import os

# Imports and the enclosing class are reconstructed for context; the original
# snippet began mid-method. HuggingFaceEmbeddings is langchain's wrapper
# (langchain_huggingface, or langchain_community.embeddings in older setups).
from langchain_huggingface import HuggingFaceEmbeddings


class EmbeddingLoader:
    def load(self):
        try:
            raise Exception("Simulating official loading failure")  # simulate a failure against the official endpoint
        except Exception as hf_error:
            print(f"[WARNING] Unable to load model from Hugging Face official: {type(hf_error).__name__}, {hf_error}")
            print("[DEBUG] Switching to backup mirror for loading...")

            # NOTE: huggingface_hub reads HF_ENDPOINT once at import time, so
            # setting it here, after the library has been imported, may have
            # no effect; export HF_ENDPOINT before the process starts instead.
            os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
            try:
                self.embeddings = HuggingFaceEmbeddings(
                    model_name="BAAI/bge-small-zh",
                    encode_kwargs={"normalize_embeddings": True},
                )
            except Exception as mirror_error:
                print(f"[CRITICAL] Unable to load model from backup mirror: {type(mirror_error).__name__}, {mirror_error}")
                raise RuntimeError(f"Model loading failed: {mirror_error}")


Linked issue: Support env HF_ENDPOINT? (#416)