feat: support HF_ENDPOINT environment when downloading model #505


Merged: 3 commits merged into huggingface:main from l8ng/feat-support-hf-endpoint on Mar 26, 2025

Conversation

StrayDragon (Contributor)

What does this PR do?

Hey! Big thanks for this awesome project. Because of network problems, I've been manually downloading models and mounting them, which is a real pain, especially when I want to try out new models. So I thought I'd make some changes to make things easier for users like me.

This PR bumps the hf-hub version (release notes) and tweaks the related code to support the HF_ENDPOINT environment variable. This way, downloads can go through a mirror, which should be faster and more reliable, especially in poor network environments.

By the way, fix #416

Maybe the documentation should also be updated, but I'm not sure whether my changes are the right basis to build on...
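For context, here is a minimal sketch of how the router side can consume the override once hf-hub is bumped. This is not the PR's actual diff: the `with_endpoint` setter is assumed from the hf-hub 0.4 release notes (0.4 can reportedly also pick the variable up itself via `ApiBuilder::from_env()`), so check the crate docs for the exact names and signatures:

use hf_hub::api::tokio::ApiBuilder;

// Sketch: honor HF_ENDPOINT when constructing the hub client so that
// `resolve/...` downloads are sent to the configured mirror.
fn build_hub_api() -> hf_hub::api::tokio::Api {
    let mut builder = ApiBuilder::new();
    // `with_endpoint` is assumed per the hf-hub 0.4 release notes;
    // verify the exact signature against the published docs.
    if let Ok(endpoint) = std::env::var("HF_ENDPOINT") {
        builder = builder.with_endpoint(endpoint);
    }
    builder.build().expect("failed to build hf-hub API client")
}

With something like that in place, starting the router with HF_ENDPOINT=https://hf-mirror.com (as in the test logs below) routes all model downloads through the mirror.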

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Could you help review this PR when you're available?

@OlivierDehaene @Narsil

@StrayDragon changed the title from "L8ng/feat support hf endpoint" to "feat: support HF_ENDPOINT environment when downloading model" on Feb 26, 2025
StrayDragon (Contributor, Author)

@alvarobartt @Narsil @OlivierDehaene
Hey, I found you via the recent commits. Would either of you have time to review this PR?

I really need this convenience feature. Thank you :)

alvarobartt (Member) left a comment

Thanks for the PR @StrayDragon, it LGTM, but I'll let @Narsil confirm 🤗

Just note that the latest version of hf-hub is 0.4.2, not 0.4.1 (see https://crates.io/crates/hf-hub/0.4.2).

Narsil (Collaborator) commented Mar 26, 2025

Thanks, this is a great change. I think it mostly just needs a rebase.

I prepared the fixes for the rebase in case you want to go faster: #536. (Ideally we'd prefer to merge your PR; it doesn't change attribution at all, but it's nicer :) )

Cheers.

StrayDragon (Contributor, Author) commented Mar 26, 2025

@Narsil Cool, this quick-fix branch looks exactly like the result of my rebase... I was focused on manually testing some cases, and since I'm a beginner in Rust I didn't expect the update to be done already. Thanks to the contributor and for your quick response!

Some context for the changes:

1. Upgrade to hf-hub 0.4.2 using cargo-edit:
$ cargo upgrade -p [email protected]
    Checking virtual workspace's dependencies
name   old req compatible latest new req
====   ======= ========== ====== =======
hf-hub 0.4.1   0.4.2      0.4.2  0.4.2
    Checking backend-grpc-client's dependencies
    Checking grpc-metadata's dependencies
    Checking text-embeddings-backend's dependencies
    Checking text-embeddings-backend-candle's dependencies
    Checking text-embeddings-backend-core's dependencies
    Checking text-embeddings-backend-ort's dependencies
    Checking text-embeddings-backend-python's dependencies
    Checking text-embeddings-core's dependencies
    Checking text-embeddings-router's dependencies
   Upgrading git dependencies
     Locking 0 packages to latest compatible versions
note: pass `--verbose` to see 80 unchanged dependencies behind latest
   Upgrading recursive dependencies
note: Re-run with `--verbose` to show more dependencies
  excluded: 68 packages
2. Build the release target (only CPU is available here):
$ cargo clean
$ cargo build --release --bin text-embeddings-router -F candle
3. Run the server using this PR's new change:
$ rm -rf ~/.cache/huggingface/hub/models--thenlper--gte-small/  # remove the HF cache so the new download path is exercised
$ # or use
$ # huggingface-cli delete-cache
  • embed
HF_ENDPOINT=https://hf-mirror.com $EXEC_PATH/text-embeddings-router --model-id thenlper/gte-small --port 8080
Logs:
2025-03-26T15:05:14.246471Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "the*****/***-**all", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-26T15:05:14.263368Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-26T15:05:14.263379Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-26T15:05:16.249305Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-26T15:05:16.579237Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://hf-mirror.com/thenlper/gte-small/resolve/main/config_sentence_transformers.json)
2025-03-26T15:05:16.579271Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-26T15:05:17.223200Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-26T15:05:19.172297Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 4.908927296s
2025-03-26T15:05:19.187533Z  INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 512
2025-03-26T15:05:19.187694Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 16 tokenization workers
2025-03-26T15:05:19.244116Z  INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-26T15:05:19.244897Z  INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-26T15:06:19.734710Z  INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 60.489809071s
2025-03-26T15:06:19.737613Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:199: Starting Bert model on Cpu
2025-03-26T15:06:19.765132Z  WARN text_embeddings_router: router/src/lib.rs:258: Backend does not support a batch size > 4
2025-03-26T15:06:19.765148Z  WARN text_embeddings_router: router/src/lib.rs:259: forcing `max_batch_requests=4`
2025-03-26T15:06:19.768459Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1804: Starting HTTP server: 0.0.0.0:8080
2025-03-26T15:06:19.768469Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1805: Ready
2025-03-26T15:09:06.706124Z  INFO embed{total_time="21.256997ms" tokenization_time="195.734µs" queue_time="226.822µs" inference_time="20.771174ms"}: text_embeddings_router::http::server: router/src/http/server.rs:712: Success
4. Run the server normally (models already downloaded locally under .idea/):

  • embed

$EXEC_PATH/text-embeddings-router --model-id .idea/bge-small-en-v1.5 --port 8080

  • rerank

$EXEC_PATH/text-embeddings-router --model-id .idea/bge-reranker-base/ --port 8081
5. Test the servers from steps 3 and 4 with curl:
curl 127.0.0.1:8081/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'

Narsil (Collaborator) left a comment

LGTM, thanks for this.

Narsil merged commit 2743296 into huggingface:main on Mar 26, 2025
1 of 13 checks passed
StrayDragon deleted the l8ng/feat-support-hf-endpoint branch on Mar 28, 2025
cybermanhao commented May 20, 2025

This doesn't seem to work now:

import os

# Imports and the enclosing class are reconstructed for context; the original
# snippet began mid-method. HuggingFaceEmbeddings is langchain's wrapper
# (langchain_huggingface, or langchain_community.embeddings in older setups).
from langchain_huggingface import HuggingFaceEmbeddings


class EmbeddingLoader:
    def load(self):
        try:
            raise Exception("Simulating official loading failure")  # simulate a failure against the official endpoint
        except Exception as hf_error:
            print(f"[WARNING] Unable to load model from Hugging Face official: {type(hf_error).__name__}, {hf_error}")
            print("[DEBUG] Switching to backup mirror for loading...")

            # NOTE: huggingface_hub reads HF_ENDPOINT once at import time, so
            # setting it here, after the library has been imported, may have
            # no effect; export HF_ENDPOINT before the process starts instead.
            os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
            try:
                self.embeddings = HuggingFaceEmbeddings(
                    model_name="BAAI/bge-small-zh",
                    encode_kwargs={"normalize_embeddings": True},
                )
            except Exception as mirror_error:
                print(f"[CRITICAL] Unable to load model from backup mirror: {type(mirror_error).__name__}, {mirror_error}")
                raise RuntimeError(f"Model loading failed: {mirror_error}")


Linked issue: Support env HF_ENDPOINT? (#416)