main : add detailed trace to a log file #2694

Closed
ggerganov opened this issue Aug 21, 2023 · 12 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed) · high priority (Very important issue)

Comments

@ggerganov
Member

We desperately need a detailed trace of what is being passed to the model and what has been generated in the main example.
Since we cannot print such verbose information along with the generated text, it has to be dumped to a separate log file.

It should contain info such as:

  • input tokens
  • top K tokens
  • sampled token
  • decisions for adding prefix / suffix / etc.
  • decisions for ignoring EOS
  • decisions for inserting reverse prompt
  • etc.

Basically, everything that happens in the state machine of main has to be traced so we can better understand what is going on.

See: #2689 (comment)

ggerganov added the enhancement, help wanted, good first issue, and high priority labels on Aug 21, 2023
@KerfuffleV2
Collaborator

Without getting too fancy, how about something that just adds a log file and a log level to llama_context, plus:

  1. A function to open the log and set the log level.
  2. A function to close it.
  3. A function that takes a llama_context, a log level, and printf-style arguments, and writes a line to the log file if the file is actually open and the message's log level is >= the level in the context.

With that base functionality it would be pretty easy to add log lines in various places, and it can be done incrementally.
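
Something along these lines, as a rough hypothetical sketch (the names below are illustrative only, not an actual llama.cpp API):

    // hypothetical sketch - illustrative names only, not an actual llama.cpp API
    #include <cstdarg>
    #include <cstdio>

    struct example_log_state {
        FILE * file  = nullptr;
        int    level = 0;
    };

    // 1. open the log and set the log level
    static bool example_log_open(example_log_state & st, const char * path, int level) {
        st.file  = std::fopen(path, "w");
        st.level = level;
        return st.file != nullptr;
    }

    // 2. close it
    static void example_log_close(example_log_state & st) {
        if (st.file) {
            std::fclose(st.file);
            st.file = nullptr;
        }
    }

    // 3. printf-style logging: writes only if the file is open and the
    //    message level is >= the configured level
    static void example_log_printf(const example_log_state & st, int level, const char * fmt, ...) {
        if (!st.file || level < st.level) {
            return;
        }
        va_list args;
        va_start(args, fmt);
        std::vfprintf(st.file, fmt, args);
        va_end(args);
    }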

@ggerganov
Member Author

Hm, do we need this logging functionality added to llama.cpp?
I was thinking of something in common, used only in the examples like main. Maybe we start like this, and later we can consider moving it into llama.cpp if we see benefits.

@KerfuffleV2
Collaborator

do we need this logging functionality added to llama.cpp?

Well, you don't have to, but then there's the issue of where the logging state (such as the open log file) lives. If it's a separate structure, then everything would have to be modified to pass it around. The other alternative is to use global variables, which I'm not really a fan of and which could cause unexpected issues with threading (or would need to be complicated enough to deal with multithreaded use).

The advantage of having it in the llama_context is that it's generally going to be available.
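
For illustration, the global-variable alternative would look roughly like this hypothetical sketch (made-up names, not actual llama.cpp code); the mutex is the kind of extra machinery needed to keep it safe under multithreaded use:

    // hypothetical sketch of the global-variable alternative - not actual llama.cpp code
    #include <cstdarg>
    #include <cstdio>
    #include <mutex>

    static FILE *     g_log_file  = nullptr;
    static int        g_log_level = 0;
    static std::mutex g_log_mutex; // so concurrent callers do not interleave log lines

    static void global_log_printf(int level, const char * fmt, ...) {
        std::lock_guard<std::mutex> lock(g_log_mutex);
        if (!g_log_file || level < g_log_level) {
            return;
        }
        va_list args;
        va_start(args, fmt);
        std::vfprintf(g_log_file, fmt, args);
        va_end(args);
    }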

@ggerganov
Member Author

If it's a separate structure, then everything would have to be modified to pass it around.

I am thinking about something much simpler - open a log file at the start of main and log things in the main loop:

https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L405-L799

For example:

            // infinite text generation via context swapping
            // if we run out of context:
            // - take the n_keep first tokens from the original prompt (via n_past)
            // - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
            if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
                if (params.n_predict == -2) {
                    fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
                    break;
                }

                const int n_left = n_past - params.n_keep;
                // always keep the first token - BOS
                n_past = std::max(1, params.n_keep);
                n_past_guidance = std::max(1, params.n_keep + guidance_offset);

                // insert n_left/2 tokens at the start of embd from last_n_tokens
                embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());

                // stop saving session if we run out of context
                path_session.clear();

                //printf("\n---\n");
                //printf("resetting: '");
                //for (int i = 0; i < (int) embd.size(); i++) {
                //    printf("%s", llama_token_to_str(ctx, embd[i]));
                //}
                //printf("'\n");
                //printf("\n---\n");
            }

Becomes:

            // infinite text generation via context swapping
            // if we run out of context:
            // - take the n_keep first tokens from the original prompt (via n_past)
            // - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
            if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
                if (params.n_predict == -2) {
                    fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
                    break;
                }

                LOG("context full - doing context swap, n_past = %d, n_keep = %d\n", n_past, n_keep);

                const int n_left = n_past - params.n_keep;
                // always keep the first token - BOS
                n_past = std::max(1, params.n_keep);
                n_past_guidance = std::max(1, params.n_keep + guidance_offset);

                LOG("  - new values: n_left = %d, n_past = %d, n_past_guidance = %d\n", n_left, n_past, n_past_guidance);

                // insert n_left/2 tokens at the start of embd from last_n_tokens
                embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());

                LOG_TOKENS("  - new context:", embd);

                // stop saving session if we run out of context
                path_session.clear();

                LOG("  - resetting session path\n");
            }

Basically trace everything in the main loop. No need to pass objects around.
The LOG_ helper macros can either be defined in main.cpp or in common.
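
For illustration, the helpers could be as small as this hypothetical sketch (not the actual implementation; it assumes main opens a FILE * g_logfile at startup and that a llama_context * ctx is in scope, as in the loop above):

    // hypothetical sketch - not the actual common/log.h implementation
    #include <cstdio>

    static FILE * g_logfile = nullptr; // opened once at the start of main

    #define LOG(...)                             \
        do {                                     \
            if (g_logfile) {                     \
                fprintf(g_logfile, __VA_ARGS__); \
                fflush(g_logfile);               \
            }                                    \
        } while (0)

    // dumps a token vector; assumes a llama_context * ctx is in scope, as in the main loop
    #define LOG_TOKENS(prefix, tokens)                                                   \
        do {                                                                             \
            if (g_logfile) {                                                             \
                fprintf(g_logfile, "%s", (prefix));                                      \
                for (const auto & t : (tokens)) {                                        \
                    fprintf(g_logfile, " %d:'%s'", (int) t, llama_token_to_str(ctx, t)); \
                }                                                                        \
                fprintf(g_logfile, "\n");                                                \
            }                                                                            \
        } while (0)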

@staviq
Contributor

staviq commented Aug 23, 2023

@DannyDaemonic
Contributor

DannyDaemonic commented Aug 23, 2023

@staviq Edit: Sorry! Not a third-party library!

With that approach, either everyone who downloads a release will always generate a log file, always flood stderr with way too much stuff, or won't log anything at all. The only people with any control will be those who compile it themselves. I wonder if that's the intention, or if we're looking for something you can enable via a command-line switch.

@staviq
Contributor

staviq commented Aug 23, 2023

With that approach, either everyone who downloads a release will always generate a log file

Nope, I showed how a log target can be a function, which is what I was planning to implement next, together with basic log levels.

The idea was that logging starts exactly as ggerganov described it, but can be extended at any point.

Edit: I wanted to make it configurable with #define, so the header can contain several different ways of using logging that can be turned on or off with make flags; later, unused or unnecessary parts could be removed entirely without having to change the code where they're used.

I'm basically cleaning up and rewriting code I've been using for many years in my own projects; I just need to make it simpler and nicer.

This is basically just a proof of concept that I posted to verify whether it's at least close to what ggerganov had in mind.
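
For concreteness, the kind of #define gating described above could look roughly like this (made-up macro and flag names, not what eventually landed in common/log.h):

    // hypothetical sketch of compile-time configurable logging - made-up names, not common/log.h
    #include <cstdio>

    // compiling with -DEXAMPLE_LOG_DISABLE (a made-up flag) turns logging into a no-op entirely
    #ifdef EXAMPLE_LOG_DISABLE
        #define EXAMPLE_LOG(level, ...) do { } while (0)
    #else
        // the log target (a FILE *, e.g. stderr or an opened log file) can be overridden
        // at build time without touching any call sites
        #ifndef EXAMPLE_LOG_TARGET
            #define EXAMPLE_LOG_TARGET stderr
        #endif
        // basic log levels: 0 = debug, 1 = info, 2 = warning, ...
        #ifndef EXAMPLE_LOG_LEVEL
            #define EXAMPLE_LOG_LEVEL 1
        #endif
        #define EXAMPLE_LOG(level, ...)                       \
            do {                                              \
                if ((level) >= EXAMPLE_LOG_LEVEL) {           \
                    fprintf(EXAMPLE_LOG_TARGET, __VA_ARGS__); \
                }                                             \
            } while (0)
    #endif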

@ggerganov
Member Author

Completed via #2748

@jaslatendresse

I went through the PR that implements this, but I still don't understand how to make it happen. I searched the documentation and the examples for main.cpp but couldn't find anything. Is this an option you can set when calling main from the terminal, or something?

@staviq
Contributor

staviq commented Feb 2, 2024

@jaslatendresse

It would help if you could specify what behavior you are expecting.

Logs "just work" unless you compiled with logging disabled.

Log options (arguments passed via command line) are specified here: https://github.com/ggerganov/llama.cpp/blob/191221178f51b6e81122c5bda0fd79620e547d07/common/log.h#L555

@jaslatendresse

Thanks! I found what I was trying to do. I wanted to log all the params shown by main to a log file from a subprocess, without logging the model's output. This is my solution:

subprocess.run(["./llama2.sh", params], stdout=f_out, stderr=f_log, text=True)

@staviq
Contributor

staviq commented Feb 2, 2024

I see. Yes, this particular logging system mentioned here is pretty much internal, and bindings for other languages don't expose it.
