-
Notifications
You must be signed in to change notification settings - Fork 11.8k
main : add detailed trace to a log file #2694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Without getting too fancy how about something that just adds information for a log file and log level to
With that base functionality, it would be pretty easy to just add log lines in various places and it can be done incrementally. |
Hm, do we need this logging functionality added to |
Well, you don't have to but then there's the issue of where the info like open file for the logging stuff lives. If it's a separate structure, then everything would have to be modified to pass it around. The other alternative is to use global variables, which I'm not really a fan of and could cause unexpected issues with threading (or would need to be complicated enough to deal with multithreaded use). The advantage of having it in the |
I am thinking about something much simpler - open a log file at the start of main and log things in the main loop: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L405-L799 For example: // infinite text generation via context swapping
// if we run out of context:
// - take the n_keep first tokens from the original prompt (via n_past)
// - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
if (params.n_predict == -2) {
fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
break;
}
const int n_left = n_past - params.n_keep;
// always keep the first token - BOS
n_past = std::max(1, params.n_keep);
n_past_guidance = std::max(1, params.n_keep + guidance_offset);
// insert n_left/2 tokens at the start of embd from last_n_tokens
embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());
// stop saving session if we run out of context
path_session.clear();
//printf("\n---\n");
//printf("resetting: '");
//for (int i = 0; i < (int) embd.size(); i++) {
// printf("%s", llama_token_to_str(ctx, embd[i]));
//}
//printf("'\n");
//printf("\n---\n");
}
Becomes: // infinite text generation via context swapping
// if we run out of context:
// - take the n_keep first tokens from the original prompt (via n_past)
// - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
if (params.n_predict == -2) {
fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
break;
}
LOG("context full - doing context swap, n_past = %d, n_keep = %d\n", n_past, n_keep);
const int n_left = n_past - params.n_keep;
// always keep the first token - BOS
n_past = std::max(1, params.n_keep);
n_past_guidance = std::max(1, params.n_keep + guidance_offset);
LOG(" - new values: n_left = %d, n_past = %d, n_past_guidance = %d\n", n_left, n_past, n_past_guidance);
// insert n_left/2 tokens at the start of embd from last_n_tokens
embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());
LOG_TOKENS(" - new context:", embd);
// stop saving session if we run out of context
path_session.clear();
LOG(" - resetting session path\n");
} Basically trace everything in the main loop. No need to pass objects around. |
Something like this ? https://github.com/staviq/llama.cpp/blob/betterlogs/common/log.h |
@staviq Edit: Sorry! Not a third party library! With that approach, either everyone who downloads a release will always generate a log file, always flood stderr with way too much stuff, or won't log anything at all. The only people with control will be those who compile it themselves. I wonder if that's the intention or if we're looking for something that you can enable via command line switch. |
Nope, I showed how a log target can be a function, which is what I was planning to implement next, together with basic log levels. The idea was, logging starts exactly how ggerganov described it, but can be extended at any point. Edit: I wanted to make it configurable with #define so the header can contain several different ways of using logging, which could be turned on or off with make flags, and later, unused or unnecessary parts could be removed entirely, without the need to change code where it's used. I'm basically cleaning up and rewriting code I've been using for many years in my own projects, and I just need to make it simpler and nicer. This is basically just a proof of concept which I posted to verify if that's at least close to what ggerganov had in mind. |
Completed via #2748 |
I went through the PR that implements this but I still don't understand how to make it happen. I searched the documentation and examples for main.cpp but couldn't find anything. Is this an option you can set when calling main from terminal or something? |
It would help if you could specify what behavior are you expecting Logs are "just working" unless you compiled with logs disabled Log options (arguments passed via command line) are specified here: https://github.com/ggerganov/llama.cpp/blob/191221178f51b6e81122c5bda0fd79620e547d07/common/log.h#L555 |
Thanks! I found what I was trying to do. I wanted to log all the params shown from main to a log file from a subprocess, without logging the model's output. This is my solution:
|
I see, yes, this particular logging system mentioned here, is pretty much internal, and bindings for other languages don't expose it. |
We desperately need detailed trace of what is being passed to the model and what has been generated in the
main
example.Since we cannot print such verbose information along with the generated text, it has to be dumped to a separate log file.
It should contain info such as:
Basically everything that happens in the state machine of
main
has to be traced so we can understand better what is going onSee: #2689 (comment)
The text was updated successfully, but these errors were encountered: