Conversation

ericcurtin (Collaborator)

For llama-server pulling

@ericcurtin force-pushed the multicon-progress branch 2 times, most recently from 8411e06 to 46373ef on September 19, 2025 at 16:48
@ericcurtin (Collaborator, Author)

@npopov-vst @ngxson PTAL

For llama-server pulling

Signed-off-by: Eric Curtin <[email protected]>
@ngxson (Collaborator) left a comment:

Very nice feature!

Btw, I think we should move all code related to downloading into a new file, download.cpp, as arg.cpp has recently become quite big.

std::string etag;
std::string last_modified;
std::string accept_ranges;
long long content_length = -1;
ngxson (Collaborator):

Maybe better to use int64_t so it blends in with existing code style?
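
A sketch of the suggested change (same field as in the hunk above):

    int64_t content_length = -1;  // int64_t rather than long long, matching the surrounding code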

}

private:
void display_progress(long long total_downloaded) {
ngxson (Collaborator):

I think we should have a way to throttle this function. The user may redirect stdout/stderr to a file, in which case this function may clog up the log.

ericcurtin (Collaborator, Author):

Yeah it's called by 4 different tasks asynchronously:

https://curl.se/libcurl/c/CURLOPT_XFERINFOFUNCTION.html

I don't think curl has a way of controlling the rate at which the progress meter callback is called. But you can kind of fake it by keeping track of the time we last displayed and doing nothing if the callback has been called within the last 200ms (or whatever interval we decide is appropriate).

ngxson (Collaborator):

Yes I think a delay of 500ms should work. Just a simple check current_time - last_time > 500ms should be enough.
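
A minimal sketch of that check (helper name hypothetical; an atomic timestamp keeps it correct when several transfer tasks fire the callback concurrently):

    #include <atomic>
    #include <chrono>
    #include <cstdint>

    static std::atomic<int64_t> last_display_ms{0};

    // Returns true at most once per 500 ms, even across concurrent callers.
    static bool should_display_now() {
        const int64_t now_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count();
        int64_t last = last_display_ms.load(std::memory_order_relaxed);
        if (now_ms - last < 500) {
            return false;
        }
        // Only one of the racing callers wins the right to redraw.
        return last_display_ms.compare_exchange_strong(last, now_ms);
    }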

std::string progress_bar;
const long pos = (percentage * progress_bar_width) / 100;
for (int i = 0; i < progress_bar_width; ++i) {
progress_bar.append((i < pos) ? "█" : " ");
ngxson (Collaborator):

Suggesting this to increase visibility, though it's not very important:

Suggested change:
- progress_bar.append((i < pos) ? "█" : " ");
+ progress_bar.append((i < pos) ? "█" : "_");

For reference, Docker uses the [ ==> ] style

@ericcurtin (Collaborator, Author) commented Sep 20, 2025:

Could try the Docker style; maybe then the Windows-specific code (ConsoleOutputCP) can go away.
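
A hypothetical sketch of such a bar, ASCII-only so it needs no console code page changes:

    #include <string>

    // Builds a Docker-style bar such as "[=========>          ]".
    static std::string make_progress_bar(int percentage, int width) {
        const int pos = (percentage * width) / 100;
        std::string bar = "[";
        for (int i = 0; i < width; ++i) {
            if (i < pos) {
                bar += '=';
            } else if (i == pos) {
                bar += '>';
            } else {
                bar += ' ';
            }
        }
        bar += ']';
        return bar;
    }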

#include <windows.h>
#endif

#define JSON_ASSERT GGML_ASSERT
ngxson (Collaborator):

For visibility: this define should always be immediately followed by #include <nlohmann/json.hpp>; it is specific to json.hpp.
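
That is, the pair should always travel together:

    #define JSON_ASSERT GGML_ASSERT  // configures json.hpp, so it must come right before the include
    #include <nlohmann/json.hpp>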

LOG_DBG("%s: downloading chunk %zu range %s\n", __func__, chunk_idx, range_str.c_str());
} else {
// Chunk already completed
chunk.completed = true;
npopov-vst (Contributor):

When I resume a previously cancelled download with some chunks already completed, I get an error:

common_download_file_multiconn: combining 4 chunks into final file
remove: The process cannot access the file because it is being used by another process.: "C:\Users\nick1\AppData\Local\llama.cpp\ggml-org_gemma-3-4b-it-qat-GGUF_gemma-3-4b-it-qat-Q4_0.gguf.downloadInProgress.chunk0"

Looks like there needs to be:

chunk.file.reset();

before the return.
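
The shape of that fix, assuming chunk.file is an RAII file handle as in the hunk above:

    } else {
        // Chunk already completed: release our handle now, otherwise the
        // final combine step cannot remove the chunk file on Windows.
        chunk.file.reset();
        chunk.completed = true;
    }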

Comment on lines +444 to +465
auto format_size = [](long long bytes) -> std::string {
const char * units[] = { "B", "KB", "MB", "GB" };
double size = bytes;
int unit_idx = 0;
while (size >= 1024.0 && unit_idx < 3) {
size /= 1024.0;
unit_idx++;
}
return string_format("%.1f%s", size, units[unit_idx]);
};

// Format speed display
auto format_speed = [&](double bytes_per_sec) -> std::string {
const char * units[] = { "B/s", "KB/s", "MB/s", "GB/s" };
double speed = bytes_per_sec;
int unit_idx = 0;
while (speed >= 1024.0 && unit_idx < 3) {
speed /= 1024.0;
unit_idx++;
}
return string_format("%.1f%s", speed, units[unit_idx]);
};
ngxson (Collaborator):

Maybe merge these two functions into one?

auto format_unit = [&](double value, std::array<const char *, 4> units) -> std::string {
    double size     = value;
    size_t unit_idx = 0;
    // Stop at the last unit so units[unit_idx] stays in bounds.
    while (size >= 1024.0 && unit_idx < units.size() - 1) {
        size /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", size, units[unit_idx]);
};
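
It could then cover both call sites (variable names hypothetical):

    const std::string size_str  = format_unit(total_downloaded, {"B", "KB", "MB", "GB"});
    const std::string speed_str = format_unit(bytes_per_sec, {"B/s", "KB/s", "MB/s", "GB/s"});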

@npopov-vst (Contributor)

Thanks @ericcurtin! One more issue that I found on Windows:

write_file: unable to rename file: C:\Users\nick1\AppData\Local\llama.cpp\ggml-org_gemma-3-4b-it-qat-GGUF_gemma-3-4b-it-qat-Q4_0.gguf.json.tmp to C:\Users\nick1\AppData\Local\llama.cpp\ggml-org_gemma-3-4b-it-qat-GGUF_gemma-3-4b-it-qat-Q4_0.gguf.json

Rename will fail if the file already exists, so you need to check for and delete an existing file in the write_file function before renaming, something like:

        if (std::filesystem::exists(fname)) {
            if (remove(fname.c_str()) != 0) {
                LOG_ERR("%s: unable to delete file: %s\n", __func__, fname.c_str());
            }
        }

        // Makes write atomic
        if (rename(fname_tmp.c_str(), fname.c_str()) != 0) {
            LOG_ERR("%s: unable to rename file: %s to %s\n", __func__, fname_tmp.c_str(), fname.c_str());
            // If rename fails, try to delete the temporary file
            if (remove(fname_tmp.c_str()) != 0) {
                LOG_ERR("%s: unable to delete temporary file: %s\n", __func__, fname_tmp.c_str());
            }
        }
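
An alternative sketch, assuming C++17 std::filesystem is acceptable here: std::filesystem::rename follows POSIX semantics and replaces an existing destination file, which sidesteps the Windows-specific failure of C rename():

    #include <filesystem>

    std::error_code ec;
    std::filesystem::rename(fname_tmp, fname, ec);  // replaces fname if it already exists
    if (ec) {
        LOG_ERR("%s: unable to rename file: %s to %s (%s)\n",
                __func__, fname_tmp.c_str(), fname.c_str(), ec.message().c_str());
        std::filesystem::remove(fname_tmp, ec);  // best-effort cleanup of the temp file
    }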

@ericcurtin (Collaborator, Author) commented Sep 20, 2025:

> Very nice feature!
>
> Btw, I think we should move all code related to downloading into a new file, download.cpp, as arg.cpp has recently become quite big.

It's tempting to create a llama-pull binary sometimes, for when someone only wants to pull. I actually added exit(0)'s to the code at one point because I wanted to measure just the total pull time without starting the inference server.

@npopov-vst (Contributor)

> It's tempting to create a llama-pull binary sometimes, for when someone only wants to pull.

@cunnie Forgive me if I am missing something, since I am new to this project (maybe similar functionality already exists somewhere), but it would be great if you could create a binary which can download a model from any source and in any format (hf, ollama, etc.) and automatically convert it to gguf with a specified quantization (using parallel and resumable download).

@cunnie (Contributor) commented Sep 20, 2025:

@npopov-vst: I think you meant @ericcurtin, not me.

@npopov-vst (Contributor)

@cunnie Sorry)

@ericcurtin (Collaborator, Author)

> It's tempting to create a llama-pull binary sometimes, for when someone only wants to pull.
>
> @cunnie Forgive me if I am missing something, since I am new to this project (maybe similar functionality already exists somewhere), but it would be great if you could create a binary which can download a model from any source and in any format (hf, ollama, etc.) and automatically convert it to gguf with a specified quantization (using parallel and resumable download).

There is an Ollama puller implementation in llama.cpp, but I'm not sure about it anymore; Ollama's latest models tend not to be compatible with llama.cpp.

@ericcurtin (Collaborator, Author)

I can't push to this branch/PR anymore because I lost access to the branch, @ggerganov. I could reopen a brand-new PR as ericcurtin/

@ggerganov (Member)

@ericcurtin Yes, please open a separate PR from a fork.

@ericcurtin (Collaborator, Author)

Closing this PR; not sure I'll be able to delete the branch.

Addressed some review comments here:

#16196

More to do...

@ericcurtin closed this Sep 23, 2025
@ggerganov deleted the multicon-progress branch on September 23, 2025 at 09:49