Implement progress bar and multi-connection downloads #16115
Conversation
Force-pushed 8411e06 to 46373ef
@npopov-vst @ngxson PTAL
Force-pushed 46373ef to f1c50af
Force-pushed f1c50af to 9919c85
Force-pushed 9919c85 to 0d9c634
For llama-server pulling
Signed-off-by: Eric Curtin <[email protected]>
Force-pushed 0d9c634 to 077b475
Very nice feature!
Btw, I think we should move all code related to downloading into a new file, download.cpp, as arg.cpp has recently become quite big.
    std::string etag;
    std::string last_modified;
    std::string accept_ranges;
    long long content_length = -1;
Maybe better to use int64_t so it blends in with existing code style?
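A minimal sketch of the suggested tweak (the struct name is hypothetical; only the fields come from the excerpt above):

#include <cstdint>
#include <string>

struct url_head_info {            // hypothetical name for the struct holding these fields
    std::string etag;
    std::string last_modified;
    std::string accept_ranges;
    int64_t     content_length = -1;  // int64_t instead of long long, matching existing code style
};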
    }

private:
    void display_progress(long long total_downloaded) {
I think we should have a way to throttle this function. The user may redirect stdout/stderr to a file, in which case this function may clog up the log.
Yeah it's called by 4 different tasks asynchronously:
https://curl.se/libcurl/c/CURLOPT_XFERINFOFUNCTION.html
I don't think curl has a way of controlling the rate at which the progress meter callback is called. But you can kinda fake it by keeping track of the time we last displayed and just doing nothing if the callback has been called within the last 200ms (or whatever interval we decide is appropriate).
Yes, I think a delay of 500ms should work. Just a simple check current_time - last_time > 500ms should be enough.
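A minimal sketch of that throttle, assuming a 500 ms interval and a steady clock (names are illustrative, and thread synchronization is left out):

#include <chrono>

void display_progress_throttled(long long total_downloaded) {
    using clock = std::chrono::steady_clock;
    static clock::time_point last_draw;                     // default-constructed to the clock's epoch
    const clock::time_point now = clock::now();
    if (now - last_draw < std::chrono::milliseconds(500)) {
        return;                                             // called again too soon - skip the redraw
    }
    last_draw = now;
    // ... render the progress bar for total_downloaded here ...
    (void) total_downloaded;
}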
    std::string progress_bar;
    const long pos = (percentage * progress_bar_width) / 100;
    for (int i = 0; i < progress_bar_width; ++i) {
        progress_bar.append((i < pos) ? "█" : " ");
Suggest this to increase the visibility, though not very important:
progress_bar.append((i < pos) ? "█" : "_");
i.e. use "_" instead of " " for the unfilled part of the bar.
For reference, Docker uses the [ ==> ] style.
Could try the Docker style; maybe then the Windows-specific code (ConsoleOutputCP) can go away.
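A rough sketch of a Docker-style ASCII bar, which would avoid UTF-8 output entirely (width and formatting are just illustrative):

#include <string>

// Build a "[=====>         ]" style bar for a percentage in [0, 100].
static std::string ascii_progress_bar(int percentage, int width = 40) {
    const int filled = (percentage * width) / 100;
    std::string bar = "[";
    for (int i = 0; i < width; ++i) {
        if (i < filled) {
            bar += '=';
        } else if (i == filled) {
            bar += '>';
        } else {
            bar += ' ';
        }
    }
    bar += ']';
    return bar;
}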
#include <windows.h>
#endif

#define JSON_ASSERT GGML_ASSERT
For visibility: this define should always be followed by #include <nlohmann/json.hpp>, as it is specific to json.hpp.
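For illustration, the intended ordering:

// JSON_ASSERT must be defined before the header is included so that
// nlohmann::json uses GGML_ASSERT instead of the default assert().
#define JSON_ASSERT GGML_ASSERT
#include <nlohmann/json.hpp>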
    LOG_DBG("%s: downloading chunk %zu range %s\n", __func__, chunk_idx, range_str.c_str());
} else {
    // Chunk already completed
    chunk.completed = true;
When I resume a previously cancelled download with some chunks already completed, I get an error:
common_download_file_multiconn: combining 4 chunks into final file
remove: The process cannot access the file because it is being used by another process.: "C:\Users\nick1\AppData\Local\llama.cpp\ggml-org_gemma-3-4b-it-qat-GGUF_gemma-3-4b-it-qat-Q4_0.gguf.downloadInProgress.chunk0"
Looks like there needs to be a chunk.file.reset(); before the return.
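A minimal sketch of the suggested fix, following the excerpt above (the surrounding control flow and return value are assumptions):

} else {
    // Chunk already completed: close our handle to the chunk file so the
    // later combine/remove step is not blocked by an open handle on Windows.
    chunk.completed = true;
    chunk.file.reset();
    return;  // or return the appropriate value for an already-completed chunk
}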
auto format_size = [](long long bytes) -> std::string {
    const char * units[] = { "B", "KB", "MB", "GB" };
    double size = bytes;
    int unit_idx = 0;
    while (size >= 1024.0 && unit_idx < 3) {
        size /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", size, units[unit_idx]);
};

// Format speed display
auto format_speed = [&](double bytes_per_sec) -> std::string {
    const char * units[] = { "B/s", "KB/s", "MB/s", "GB/s" };
    double speed = bytes_per_sec;
    int unit_idx = 0;
    while (speed >= 1024.0 && unit_idx < 3) {
        speed /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", speed, units[unit_idx]);
};
Maybe merge these 2 functions into one?

auto format_unit = [&](double bytes_per_sec, std::array<const char *, 4> units) -> std::string {
    double speed = bytes_per_sec;
    size_t unit_idx = 0;
    while (speed >= 1024.0 && unit_idx < units.size() - 1) {  // stop at the last unit to stay in bounds
        speed /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", speed, units[unit_idx]);
};
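As a self-contained sketch of that merge (using snprintf in place of llama.cpp's string_format helper; names are illustrative):

#include <array>
#include <cstdio>
#include <string>

static std::string format_unit(double value, const std::array<const char *, 4> & units) {
    size_t unit_idx = 0;
    while (value >= 1024.0 && unit_idx + 1 < units.size()) {
        value /= 1024.0;
        unit_idx++;
    }
    char buf[64];
    snprintf(buf, sizeof(buf), "%.1f%s", value, units[unit_idx]);
    return buf;
}

// usage:
//   format_unit(total_downloaded, {"B", "KB", "MB", "GB"});
//   format_unit(bytes_per_sec,    {"B/s", "KB/s", "MB/s", "GB/s"});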
Thanks @ericcurtin! One more issue that I have found on Windows: rename will fail if the file already exists, so you need to check for and delete an existing file in the write_file function before renaming, something like the check sketched below.
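A minimal illustration of such a check (an assumption of what write_file might do, not the PR's actual code):

#include <filesystem>
#include <string>
#include <system_error>

// Remove any existing destination first, since rename on Windows
// fails when the target file already exists.
static bool move_into_place(const std::string & tmp_path, const std::string & final_path) {
    std::error_code ec;
    if (std::filesystem::exists(final_path, ec)) {
        std::filesystem::remove(final_path, ec);
    }
    std::filesystem::rename(tmp_path, final_path, ec);
    return !ec;
}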
It's tempting to also create a llama-pull binary for when someone only wants to pull. I actually added exit(0)'s to the code at one point because I wanted to measure just the total pull time without starting the inference server.
@cunnie Forgive me if I am missing something, since I am new to this project (maybe similar functionality already exists somewhere), but it would be great if you could create a binary which can download a model from any source and in any format (hf, ollama, etc.) and automatically convert it to gguf with a specified quantization (using parallel and resumable downloads).
@npopov-vst: I think you meant @ericcurtin, not me.
@cunnie Sorry!
There is an Ollama puller implementation in llama.cpp, but I'm not sure about it anymore; Ollama's latest models tend not to be compatible with llama.cpp.
Can't push to this branch/PR anymore because I lost access to this branch, @ggerganov. I could reopen a brand new PR as ericcurtin/
@ericcurtin Yes, please open a separate PR from a fork.
Closing this PR; not sure I'll be able to delete the branch. Addressed some review comments here: More to do...