feat: support HF_ENDPOINT environment when downloading model #505
Conversation
@alvarobartt @Narsil @OlivierDehaene I really need this convenient feature. Thank you :)
Thanks for the PR @StrayDragon, LGTM, but I'll let @Narsil confirm 🤗
Just note that the latest version of hf-hub is 0.4.2, not 0.4.1 (see https://crates.io/crates/hf-hub/0.4.2).
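For reference, the behavior this PR enables boils down to preferring the `HF_ENDPOINT` environment variable over the default Hub URL. A minimal sketch of that resolution logic (the function name and constant are illustrative; the real hf-hub crate handles this when building its API client):

```rust
use std::env;

/// Default Hugging Face Hub endpoint used when HF_ENDPOINT is not set.
const DEFAULT_ENDPOINT: &str = "https://huggingface.co";

/// Illustrative helper: prefer the HF_ENDPOINT environment variable,
/// falling back to the public Hub endpoint.
fn resolve_endpoint() -> String {
    env::var("HF_ENDPOINT").unwrap_or_else(|_| DEFAULT_ENDPOINT.to_string())
}

fn main() {
    // With a mirror configured, the mirror wins.
    env::set_var("HF_ENDPOINT", "https://hf-mirror.com");
    assert_eq!(resolve_endpoint(), "https://hf-mirror.com");

    // Without it, we fall back to the public Hub.
    env::remove_var("HF_ENDPOINT");
    assert_eq!(resolve_endpoint(), "https://huggingface.co");

    println!("endpoint resolution ok");
}
```

This is the same pattern the mirror test below relies on: setting `HF_ENDPOINT=https://hf-mirror.com` before launching the router redirects all downloads.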
Thanks, this is a great change. I think it's mostly up for a rebase. I prepared the fixes for the rebase if you want to go faster: #536. (Ideally we prefer merging your PR; it doesn't change attribution at all, but it's nicer :) ). Cheers.
@Narsil Cool, this quick-fix branch seems to be exactly the same as the result after my rebase... I was just focused on manually testing some cases. Since I'm a beginner in Rust, I didn't expect the update to be completed already. Thanks to the contributor and for your quick response!

Some context for the changes:
$ cargo upgrade -p [email protected]
Checking virtual workspace's dependencies
name old req compatible latest new req
==== ======= ========== ====== =======
hf-hub 0.4.1 0.4.2 0.4.2 0.4.2
Checking backend-grpc-client's dependencies
Checking grpc-metadata's dependencies
Checking text-embeddings-backend's dependencies
Checking text-embeddings-backend-candle's dependencies
Checking text-embeddings-backend-core's dependencies
Checking text-embeddings-backend-ort's dependencies
Checking text-embeddings-backend-python's dependencies
Checking text-embeddings-core's dependencies
Checking text-embeddings-router's dependencies
Upgrading git dependencies
Locking 0 packages to latest compatible versions
note: pass `--verbose` to see 80 unchanged dependencies behind latest
Upgrading recursive dependencies
note: Re-run with `--verbose` to show more dependencies
excluded: 68 packages
$ cargo clean
$ cargo build --release --bin text-embeddings-router -F candle
$ rm -rf ~/.cache/huggingface/hub/models--thenlper--gte-small/ # remove the hf cache to verify the new feature
$ # or use
$ # huggingface-cli delete-cache
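As an aside, the cache folder removed above follows the Hub's naming convention: the model id has `models--` prefixed and every `/` replaced with `--`. A small sketch of that mapping (the helper name is mine, not part of hf-hub's public API):

```rust
/// Map a model id like "thenlper/gte-small" to the directory name used
/// under ~/.cache/huggingface/hub (naming convention only; this helper
/// is illustrative, not part of hf-hub's public API).
fn model_cache_dirname(model_id: &str) -> String {
    format!("models--{}", model_id.replace('/', "--"))
}

fn main() {
    assert_eq!(
        model_cache_dirname("thenlper/gte-small"),
        "models--thenlper--gte-small"
    );
    println!("cache dirname matches the folder removed above");
}
```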
HF_ENDPOINT=https://hf-mirror.com $EXEC_PATH/text-embeddings-router --model-id thenlper/gte-small --port 8080

See logs:

2025-03-26T15:05:14.246471Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "the*****/***-**all", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2025-03-26T15:05:14.263368Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-03-26T15:05:14.263379Z INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-03-26T15:05:16.249305Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-03-26T15:05:16.579237Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://hf-mirror.com/thenlper/gte-small/resolve/main/config_sentence_transformers.json)
2025-03-26T15:05:16.579271Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-03-26T15:05:17.223200Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-03-26T15:05:19.172297Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 4.908927296s
2025-03-26T15:05:19.187533Z INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 512
2025-03-26T15:05:19.187694Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 16 tokenization workers
2025-03-26T15:05:19.244116Z INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
2025-03-26T15:05:19.244897Z INFO text_embeddings_backend: backends/src/lib.rs:486: Downloading `model.safetensors`
2025-03-26T15:06:19.734710Z INFO text_embeddings_backend: backends/src/lib.rs:370: Model weights downloaded in 60.489809071s
2025-03-26T15:06:19.737613Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:199: Starting Bert model on Cpu
2025-03-26T15:06:19.765132Z WARN text_embeddings_router: router/src/lib.rs:258: Backend does not support a batch size > 4
2025-03-26T15:06:19.765148Z WARN text_embeddings_router: router/src/lib.rs:259: forcing `max_batch_requests=4`
2025-03-26T15:06:19.768459Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1804: Starting HTTP server: 0.0.0.0:8080
2025-03-26T15:06:19.768469Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1805: Ready
2025-03-26T15:09:06.706124Z INFO embed{total_time="21.256997ms" tokenization_time="195.734µs" queue_time="226.822µs" inference_time="20.771174ms"}: text_embeddings_router::http::server: router/src/http/server.rs:712: Success
Already installed models:

$EXEC_PATH/text-embeddings-router --model-id .idea/bge-small-en-v1.5 --port 8080

# rerank
$EXEC_PATH/text-embeddings-router --model-id .idea/bge-reranker-base/ --port 8081
curl 127.0.0.1:8081/rerank \
-X POST \
-d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
curl 127.0.0.1:8080/embed \
-X POST \
-d '{"inputs":"What is Deep Learning?"}' \
-H 'Content-Type: application/json'
LGTM, thanks for this.
Seems like it doesn't work now.
What does this PR do?
Hey! Big thanks for this awesome project. Because of network problems, I've been manually downloading models and mounting them, which is a real pain, especially when I want to try out new models. So I thought I'd make some changes to make things easier for users like me.
This PR bumps the hf-hub version (release note) and tweaks the related code to support the HF_ENDPOINT environment variable. This way, we can use mirrors for downloading, which should speed things up and make it more reliable, especially in poor network environments. By the way, this fixes #416.
Maybe the documentation should also be updated, but I'm not sure whether my changes are appropriate as they stand...
Who can review?
Could you help review this PR when you're available?
@OlivierDehaene @Narsil