@angt (Collaborator) commented Sep 22, 2025

This is a draft that uses cpp-httplib for downloads, mostly copied from the existing cURL implementation.
To test it, build with -DLLAMA_CURL=OFF.
Some features might be missing for now, but it's a starting point.
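
For reference, a typical configure/build sequence to exercise the httplib path (standard CMake usage; only the LLAMA_CURL flag is specific to this test):

# Configure without cURL so downloads go through cpp-httplib, then build.
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release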

@angt angt requested a review from ggerganov as a code owner September 22, 2025 23:17
@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch 3 times, most recently from 0201e99 to e1f545f on September 23, 2025 10:49
@angt (Collaborator, Author) commented Sep 24, 2025

This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:

error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."

@ggerganov (Member):

> This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:
>
> error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."

It shouldn't be too difficult to add a LLAMA_HTTPLIB option and do the same thing we currently do on master when LLAMA_CURL=OFF?
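
Roughly, the option could look like this on the CMake side (the option and macro names here are illustrative, not the final code):

# Sketch: make cpp-httplib optional, mirroring what LLAMA_CURL does today.
option(LLAMA_HTTPLIB "llama: use cpp-httplib for model downloads" ON)

if (LLAMA_HTTPLIB)
    target_compile_definitions(common PRIVATE LLAMA_USE_HTTPLIB)
endif()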

@ggerganov (Member) left a comment:

Looks good. The biggest unknown for me is the Windows workflow - building and releases. I suppose whatever we currently do to provide libcurl we have to do for libssl.

If you plan to bring this to completion, feel free to add yourself to the CODEOWNERS. I see the following TODOs:

  • Extract file downloading implementation from common/arg.cpp to common/download.cpp
  • Remove CURL dependency (+ figure out how to build on Windows)
  • Remove json dependency from common/download.cpp
  • Add CMake option to build without httplib for old Windows support

Comment on lines +699 to +710
static void write_metadata(const std::string & path,
                           const std::string & url,
                           const common_file_metadata & metadata) {
    nlohmann::json metadata_json = {
        { "url",          url },
        { "etag",         metadata.etag },
        { "lastModified", metadata.last_modified }
    };

    write_file(path, metadata_json.dump(4));
    LOG_DBG("%s: file metadata saved: %s\n", __func__, path.c_str());
}
@ggerganov (Member):

Same comment as in the previous PR about the json stuff: I hope eventually we will avoid using json for this component - it's a pity we started doing it in the first place.

@angt (Collaborator, Author):

We can definitely remove json here; in fact, just reading/writing the etag is enough.
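
For illustration, a minimal etag-only sketch (the helper names are hypothetical, not the final implementation):

// Sketch: persist only the etag as plain text next to the model file,
// removing the need for nlohmann::json in this component.
#include <fstream>
#include <string>

static void write_etag(const std::string & path, const std::string & etag) {
    std::ofstream out(path, std::ios::trunc);
    out << etag << '\n';
}

static std::string read_etag(const std::string & path) {
    std::ifstream in(path);
    std::string etag;
    std::getline(in, etag);
    return etag; // empty if the file is missing or unreadable
}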

@angt (Collaborator, Author) commented Sep 24, 2025

> > This is the one that concerns me, since cpp-httplib is currently a required dependency of llama.cpp:
> >
> > error: "cpp-httplib doesn't support Windows 8 or lower. Please use Windows 10 or later."
>
> It shouldn't be too difficult to add a LLAMA_HTTPLIB option and do the same thing we currently do on master when LLAMA_CURL=OFF?

The Windows issue comes from updating httplib (upstream PR yhirose/cpp-httplib#2177).
I don't think keeping the old version would be a good idea, and I don't believe it's reasonable to support Windows 8 without llama-server, is it?

Possible solutions could be either patching httplib to restore Windows 8 compatibility, or switching to another HTTP library.

@ggerganov (Member) commented Sep 24, 2025

> I don't think keeping the old version would be a good idea

Yes, we should stick with the latest version of httplib.

> I don't believe it's reasonable to support Windows 8 without llama-server, is it?

The idea is that with LLAMA_HTTPLIB=OFF we build stub download functions that simply print an error saying downloading is not supported. Windows 8 can still run llama-server; it just won't be able to download models.
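
A minimal sketch of that idea (the guard and function names are illustrative, not the actual code; LOG_ERR is llama.cpp's logging macro):

// Sketch of the stub path when building with LLAMA_HTTPLIB=OFF.
#include <string>

#ifdef LLAMA_USE_HTTPLIB
// ... real cpp-httplib-based download implementation ...
#else
static bool common_download_file(const std::string & url, const std::string & path) {
    (void) url;
    (void) path;
    LOG_ERR("%s: built without libcurl or cpp-httplib, downloading models is not supported\n", __func__);
    return false;
}
#endif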

I suspect that these failing CI workflows are currently happening only for the msys/mingw toolchain. Likely there is a simple fix by tuning the WIN32 preprocessor macros to make httplib happy. Note that the runners are not actually using Windows 8, so it's some sort of mis-detection. Worst case, I think we can safely disable downloading capabilities for these specific builds.

@angt angt requested a review from CISC as a code owner September 25, 2025 07:15
@github-actions github-actions bot added the devops improvements to build systems and github actions label Sep 25, 2025
@ggerganov (Member):

Regarding 547fa26 - I suppose this is temporary? We want to keep the upstream version unchanged, so any modifications should first be upstreamed to the original repo.

@angt (Collaborator, Author) commented Sep 25, 2025

> Regarding 547fa26 - I suppose this is temporary? We want to keep the upstream version unchanged, so any modifications should first be upstreamed to the original repo.

Yes, this was only to confirm that everything builds correctly with it.

@angt (Collaborator, Author) commented Sep 25, 2025

Since cpp-httplib is mandatory for llama-server (with or without the model downloader), we can bump _WIN32_WINNT to 0x0A00 to align with the current restriction.
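
In CMake terms, that could be as simple as the following sketch (the actual change may differ):

# Sketch: target Windows 10 (_WIN32_WINNT=0x0A00), matching cpp-httplib's minimum.
if (WIN32)
    add_compile_definitions(_WIN32_WINNT=0x0A00)
endif()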

@ggerganov (Member):

> Since cpp-httplib is mandatory for llama-server

Oh right, I missed that when I wrote the comment earlier.

> we can bump _WIN32_WINNT to 0x0A00 to align with the current restriction.

Yes, let's give this a try.

@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from 1f97bec to aad19ef on September 25, 2025 09:42
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Sep 25, 2025
@angt (Collaborator, Author) commented Sep 25, 2025

Note @ggerganov: I've tested the version including commit 547fa26 (cpp-httplib: allow _WIN32_WINNT >= 0x0602) and it works fine under Wine. There should be no issue retargeting Windows 8 if needed.

$ wine build/bin/llama-server.exe -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_0
it looks like wine32 is missing, you should install it.
multiarch needs to be enabled first.  as root, please
execute "dpkg --add-architecture i386 && apt-get update &&
apt-get install wine32:i386"
0048:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
0048:err:winediag:nodrv_CreateWindow L"The explorer process failed to start."
0048:err:systray:initialize_systray Could not create tray window
common_download_file_single_online: no previous model file found C:\users\angt\AppData\Local\llama.cpp\unsloth_Qwen3-4B-Instruct-2507-GGUF_Qwen3-4B-Instruct-2507-Q4_0.gguf
common_download_file_single_online: trying to download model from https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/resolve/main/Qwen3-4B-Instruct-2507-Q4_0.gguf to C:\users\angt\AppData\Local\llama.cpp\unsloth_Qwen3-4B-Instruct-2507-GGUF_Qwen3-4B-Instruct-2507-Q4_0.gguf.downloadInProgress (server_etag:"aaf2d7f5827dd5d918cc73fefac1d96c704f6b6cd3d2c36d1e9f5c3ac675d94f", server_last_modified:)...
[>                              ^C                  ]   0%  (15 MB / 2265 MB)

And no issues when linking libssl statically on Windows :)

$ peldd build/bin/llama-server.exe
Dependencies
    ADVAPI32.dll
    bcrypt.dll
    CRYPT32.dll
    KERNEL32.dll
    msvcrt.dll
    WS2_32.dll

@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from aad19ef to e7b5f55 on September 25, 2025 13:51
@ggerganov (Member):

Hm, the address sanitizer is acting up. Not sure if related though.

@angt (Collaborator, Author) commented Sep 25, 2025

> Hm, the address sanitizer is acting up. Not sure if related though.

I think the binaries were compiled with flags that are not supported by the emulator. All the errors are ILLEGAL:

The following tests FAILED:
	  1 - test-tokenizer-0-bert-bge (ILLEGAL)               main
	  2 - test-tokenizer-0-command-r (ILLEGAL)              main
	  3 - test-tokenizer-0-deepseek-coder (ILLEGAL)         main
	  4 - test-tokenizer-0-deepseek-llm (ILLEGAL)           main
	  5 - test-tokenizer-0-falcon (ILLEGAL)                 main
	  6 - test-tokenizer-0-gpt-2 (ILLEGAL)                  main
	  7 - test-tokenizer-0-llama-bpe (ILLEGAL)              main
	  8 - test-tokenizer-0-llama-spm (ILLEGAL)              main
	  9 - test-tokenizer-0-mpt (ILLEGAL)                    main
	 10 - test-tokenizer-0-phi-3 (ILLEGAL)                  main
	 11 - test-tokenizer-0-qwen2 (ILLEGAL)                  main
	 12 - test-tokenizer-0-refact (ILLEGAL)                 main
	 13 - test-tokenizer-0-starcoder (ILLEGAL)              main
	 21 - test-tokenizer-1-llama-spm (ILLEGAL)              main
	 27 - test-thread-safety (ILLEGAL)                      main
	 28 - test-arg-parser (ILLEGAL)                         main
	 29 - test-gguf (ILLEGAL)                               main
	 30 - test-backend-ops (ILLEGAL)                        main
	 33 - test-barrier (ILLEGAL)                            main
	 34 - test-quantize-fns (ILLEGAL)                       main
	 35 - test-quantize-perf (ILLEGAL)                      main
	 36 - test-rope (ILLEGAL)                               main
Errors while running CTest
Output from these tests are in: /home/runner/work/llama.cpp/llama.cpp/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

I guess it’s just random hardware selection. I can dig into that later.

@ggerganov (Member):

I restarted the workflows. If CI is green, I think we are good to merge, correct?

@angt (Collaborator, Author) commented Sep 25, 2025

> I restarted the workflows. If CI is green, I think we are good to merge, correct?

Yes! I can do the refactor (downloader.cpp) and remove the json dependency right after.

@angt (Collaborator, Author) commented Sep 25, 2025

I think I can fix https://github.com/ggml-org/llama.cpp/actions/runs/18009748371/job/51252952728?pr=16185

@ggerganov (Member):

> I think I can fix https://github.com/ggml-org/llama.cpp/actions/runs/18009748371/job/51252952728?pr=16185

Yup, this should be fixed before merging.

@slaren (Member) commented Sep 25, 2025

I cleared the ccache cache of the sanitizer test before you re-ran the CI; I suspect that was the cause.

Comment on lines 219 to 221
-DCMAKE_SYSTEM_NAME=Linux \
-DGGML_CCACHE=OFF \
-DGGML_NATIVE=OFF \
@ggerganov (Member):


I don't think I understand how this change fixed the CI for ubuntu-cpu-make?

@angt (Collaborator, Author):

I was a bit extreme on this one and tried everything I could think of to make it work.

The tricky part is CMAKE_SYSTEM_NAME=Linux, which makes CMake believe we are cross-compiling (it sets CMAKE_CROSSCOMPILING) without breaking everything else. With that flag, I can disable GGML_NATIVE_DEFAULT:

if (CMAKE_CROSSCOMPILING OR DEFINED ENV{SOURCE_DATE_EPOCH})
    message(STATUS "Setting GGML_NATIVE_DEFAULT to OFF")
    set(GGML_NATIVE_DEFAULT OFF)
else()
    set(GGML_NATIVE_DEFAULT ON)
endif()

Then, disabling GGML_NATIVE ensures INS_ENB stays off:

if (GGML_NATIVE OR NOT GGML_NATIVE_DEFAULT)
    set(INS_ENB OFF)
else()
    set(INS_ENB ON)
endif()

This way we get the lowest CPU mode. But honestly, -DGGML_NATIVE=OFF should be enough, and I also disabled ccache to increase my chances :)

@angt (Collaborator, Author) commented Sep 26, 2025

@ggerganov I believe we’re good now, right?

@ggerganov (Member):

The ubuntu-cpu-make workflow used to be successful without GGML_NATIVE=OFF, so not sure this last commit is needed.

Is it possible that the ccache clearing also affected this workflow and we simply had to rerun it in the first place?

@angt (Collaborator, Author) commented Sep 26, 2025

> The ubuntu-cpu-make workflow used to be successful without GGML_NATIVE=OFF, so not sure this last commit is needed.
>
> Is it possible that the ccache clearing also affected this workflow and we simply had to rerun it in the first place?

I believe disabling GGML_NATIVE in CI/CD should be the standard approach to ensure reproducibility when that feature is not explicitly tested.
I can remove the commit, but I expect the workflow will still be randomly green/red as before.

@angt (Collaborator, Author) commented Sep 26, 2025

We can merge #16257 first so I can rebase and clean up this PR.

@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from 1e653fa to 1c92441 on September 26, 2025 10:44
@angt angt requested a review from danbev as a code owner September 26, 2025 10:44
@angt angt force-pushed the use-cpp-httplib-as-a-curl-alternative-for-downloads branch from 1c92441 to 6c53aef on September 26, 2025 10:45
@angt (Collaborator, Author) commented Sep 26, 2025

Should I make a dedicated PR to disable GGML_NATIVE in the CI to avoid this kind of issue?

@ggerganov (Member):

Yes, let's push this change in a separate PR.

I tried to reproduce these failures locally on my Ubuntu machine, but I can't. And I don't think this workflow has ever failed before like this. So I am still confused why it started happening now.

If it is related to ccache, I wonder why it is not happening to other workflows that do not have GGML_NATIVE=OFF. Do we need to disable native builds on all workflows too? That does not seem like a good solution.

@ggerganov ggerganov merged commit b995a10 into ggml-org:master Sep 26, 2025
59 of 63 checks passed
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
…g#16185)

* vendor : update httplib

Signed-off-by: Adrien Gallouët <[email protected]>

* common : use cpp-httplib as a cURL alternative for downloads

The existing cURL implementation is intentionally left untouched to
prevent any regressions and to allow for safe, side-by-side testing by
toggling the `LLAMA_CURL` CMake option.

Signed-off-by: Adrien Gallouët <[email protected]>

* ggml : Bump to Windows 10

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>