Implement progress bar and multi-connection downloads #16115
Conversation
Force-pushed 8411e06 to 46373ef
@npopov-vst @ngxson PTAL
Force-pushed 46373ef to f1c50af
Force-pushed f1c50af to 9919c85
Force-pushed 9919c85 to 0d9c634
For llama-server pulling
Signed-off-by: Eric Curtin <[email protected]>
Force-pushed 0d9c634 to 077b475
Very nice feature!
Btw, I think we should move all code related to downloading into a new file, download.cpp, as arg.cpp has recently become quite big.
    std::string etag;
    std::string last_modified;
    std::string accept_ranges;
    long long content_length = -1;
Maybe better to use int64_t so it blends in with existing code style?
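A minimal sketch of the suggested tweak (the struct name is hypothetical; only the fields come from the excerpt above):

#include <cstdint>
#include <string>

struct url_head_info {            // hypothetical name for the struct holding these fields
    std::string etag;
    std::string last_modified;
    std::string accept_ranges;
    int64_t     content_length = -1;  // int64_t instead of long long, matching existing code style
};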
    }

private:
    void display_progress(long long total_downloaded) {
I think we should have a way to throttle this function. The user may redirect stdout/stderr to a file, in which case this function may clog up the log.
Yeah it's called by 4 different tasks asynchronously:
https://curl.se/libcurl/c/CURLOPT_XFERINFOFUNCTION.html
I don't think curl has a way of controlling the rate at which the progress meter callback is called. But you can kinda fake it by keeping track of the time we last displayed and just doing nothing if the callback has been called within the last 200ms (or whatever interval we decide is appropriate).
Yes, I think a delay of 500ms should work. Just a simple check current_time - last_time > 500ms should be enough.
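A minimal sketch of that throttle, assuming a 500 ms interval and a steady clock (names are illustrative, and thread synchronization is left out):

#include <chrono>

void display_progress_throttled(long long total_downloaded) {
    using clock = std::chrono::steady_clock;
    static clock::time_point last_draw;                     // default-constructed to the clock's epoch
    const clock::time_point now = clock::now();
    if (now - last_draw < std::chrono::milliseconds(500)) {
        return;                                             // called again too soon - skip the redraw
    }
    last_draw = now;
    // ... render the progress bar for total_downloaded here ...
    (void) total_downloaded;
}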
    std::string progress_bar;
    const long pos = (percentage * progress_bar_width) / 100;
    for (int i = 0; i < progress_bar_width; ++i) {
        progress_bar.append((i < pos) ? "█" : " ");
Suggest this to increase the visibility, though not very important:
progress_bar.append((i < pos) ? "█" : "_");
i.e. use "_" instead of " " for the unfilled part of the bar.
For reference, Docker uses the [ ==> ] style.
Could try the Docker style; maybe then the Windows-specific code (ConsoleOutputCP) can go away.
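A rough sketch of a Docker-style ASCII bar, which would avoid UTF-8 output entirely (width and formatting are just illustrative):

#include <string>

// Build a "[=====>         ]" style bar for a percentage in [0, 100].
static std::string ascii_progress_bar(int percentage, int width = 40) {
    const int filled = (percentage * width) / 100;
    std::string bar = "[";
    for (int i = 0; i < width; ++i) {
        if (i < filled) {
            bar += '=';
        } else if (i == filled) {
            bar += '>';
        } else {
            bar += ' ';
        }
    }
    bar += ']';
    return bar;
}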
#include <windows.h>
#endif

#define JSON_ASSERT GGML_ASSERT
For visibility: this define should always be followed by #include <nlohmann/json.hpp>, as it is specific to json.hpp.
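For illustration, the intended ordering:

// JSON_ASSERT must be defined before the header is included so that
// nlohmann::json uses GGML_ASSERT instead of the default assert().
#define JSON_ASSERT GGML_ASSERT
#include <nlohmann/json.hpp>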
    LOG_DBG("%s: downloading chunk %zu range %s\n", __func__, chunk_idx, range_str.c_str());
} else {
    // Chunk already completed
    chunk.completed = true;
When I resume a previously cancelled download with some chunks already completed, I get an error:
common_download_file_multiconn: combining 4 chunks into final file
remove: The process cannot access the file because it is being used by another process.: "C:\Users\nick1\AppData\Local\llama.cpp\ggml-org_gemma-3-4b-it-qat-GGUF_gemma-3-4b-it-qat-Q4_0.gguf.downloadInProgress.chunk0"
Looks like there needs to be a chunk.file.reset(); before the return.
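A minimal sketch of the suggested fix, following the excerpt above (the surrounding control flow and return value are assumptions):

} else {
    // Chunk already completed: close our handle to the chunk file so the
    // later combine/remove step is not blocked by an open handle on Windows.
    chunk.completed = true;
    chunk.file.reset();
    return;  // or return the appropriate value for an already-completed chunk
}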
auto format_size = [](long long bytes) -> std::string {
    const char * units[] = { "B", "KB", "MB", "GB" };
    double size = bytes;
    int unit_idx = 0;
    while (size >= 1024.0 && unit_idx < 3) {
        size /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", size, units[unit_idx]);
};

// Format speed display
auto format_speed = [&](double bytes_per_sec) -> std::string {
    const char * units[] = { "B/s", "KB/s", "MB/s", "GB/s" };
    double speed = bytes_per_sec;
    int unit_idx = 0;
    while (speed >= 1024.0 && unit_idx < 3) {
        speed /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", speed, units[unit_idx]);
};
Maybe merge these 2 functions into one?

auto format_unit = [&](double bytes_per_sec, std::array<const char *, 4> units) -> std::string {
    double speed = bytes_per_sec;
    size_t unit_idx = 0;
    while (speed >= 1024.0 && unit_idx < units.size() - 1) {  // stop at the last unit to stay in bounds
        speed /= 1024.0;
        unit_idx++;
    }
    return string_format("%.1f%s", speed, units[unit_idx]);
};
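As a self-contained sketch of that merge (using snprintf in place of llama.cpp's string_format helper; names are illustrative):

#include <array>
#include <cstdio>
#include <string>

static std::string format_unit(double value, const std::array<const char *, 4> & units) {
    size_t unit_idx = 0;
    while (value >= 1024.0 && unit_idx + 1 < units.size()) {
        value /= 1024.0;
        unit_idx++;
    }
    char buf[64];
    snprintf(buf, sizeof(buf), "%.1f%s", value, units[unit_idx]);
    return buf;
}

// usage:
//   format_unit(total_downloaded, {"B", "KB", "MB", "GB"});
//   format_unit(bytes_per_sec,    {"B/s", "KB/s", "MB/s", "GB/s"});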
Thanks @ericcurtin! One more issue that I have found on Windows: rename will fail if the file already exists, so you need to check for and delete an existing file in the write_file function before renaming, something like the check sketched below.
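A minimal illustration of such a check (an assumption of what write_file might do, not the PR's actual code):

#include <filesystem>
#include <string>
#include <system_error>

// Remove any existing destination first, since rename on Windows
// fails when the target file already exists.
static bool move_into_place(const std::string & tmp_path, const std::string & final_path) {
    std::error_code ec;
    if (std::filesystem::exists(final_path, ec)) {
        std::filesystem::remove(final_path, ec);
    }
    std::filesystem::rename(tmp_path, final_path, ec);
    return !ec;
}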
It's tempting to also create a llama-pull binary for when someone only wants to pull. I actually added exit(0)'s to the code at one point because I wanted to measure just the total pull time without starting the inference server.
@cunnie Forgive me if I am missing something, since I am new to this project (maybe similar functionality already exists somewhere), but it would be great if you could create a binary which can download a model from any source and in any format (hf, ollama, etc.) and automatically convert it to gguf with a specified quantization (using parallel and resumable downloads).
@npopov-vst: I think you meant @ericcurtin, not me.
@cunnie Sorry!
There is an Ollama puller implementation in llama.cpp, but I'm not sure about it anymore; Ollama's latest models tend not to be compatible with llama.cpp.
Can't push to this branch/PR anymore because I lost access to this branch, @ggerganov. I could reopen a brand new PR as ericcurtin/
@ericcurtin Yes, please open a separate PR from a fork.
Closing this PR; not sure I'll be able to delete the branch. Addressed some review comments here: More to do...