main : add detailed trace to a log file #2694

Closed
ggerganov opened this issue Aug 21, 2023 · 12 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed) · high priority (Very important issue)

Comments

@ggerganov
Member

We desperately need a detailed trace of what is being passed to the model and what has been generated in the main example.
Since we cannot print such verbose information along with the generated text, it has to be dumped to a separate log file.

It should contain info such as:

  • input tokens
  • top K tokens
  • sampled token
  • decisions for adding prefix / suffix / etc.
  • decisions for ignoring EOS
  • decisions for inserting reverse prompt
  • etc.

Basically, everything that happens in the state machine of main has to be traced so we can better understand what is going on.

See: #2689 (comment)

ggerganov added the enhancement, help wanted, good first issue, and high priority labels on Aug 21, 2023
@KerfuffleV2
Collaborator

Without getting too fancy, how about something that just adds a log file and a log level to llama_context, plus:

  1. A function to open the log and set the log level.
  2. A function to close it.
  3. A function that takes a llama_context, a log level, and printf-style arguments, and writes a line to the log file if the file is actually open and the message's log level is >= the level in the context.

With that base functionality it would be pretty easy to add log lines in various places, and it can be done incrementally.
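
Something along these lines, as a rough hypothetical sketch (the names below are illustrative only, not an actual llama.cpp API):

    // hypothetical sketch - illustrative names only, not an actual llama.cpp API
    #include <cstdarg>
    #include <cstdio>

    struct example_log_state {
        FILE * file  = nullptr;
        int    level = 0;
    };

    // 1. open the log and set the log level
    static bool example_log_open(example_log_state & st, const char * path, int level) {
        st.file  = std::fopen(path, "w");
        st.level = level;
        return st.file != nullptr;
    }

    // 2. close it
    static void example_log_close(example_log_state & st) {
        if (st.file) {
            std::fclose(st.file);
            st.file = nullptr;
        }
    }

    // 3. printf-style logging: writes only if the file is open and the
    //    message level is >= the configured level
    static void example_log_printf(const example_log_state & st, int level, const char * fmt, ...) {
        if (!st.file || level < st.level) {
            return;
        }
        va_list args;
        va_start(args, fmt);
        std::vfprintf(st.file, fmt, args);
        va_end(args);
    }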

@ggerganov
Member Author

Hm, do we need this logging functionality added to llama.cpp?
I was thinking of something in common, used only in the examples like main. Maybe we start like this, and later we can consider moving it into llama.cpp if we see benefits.

@KerfuffleV2
Collaborator

do we need this logging functionality added to llama.cpp?

Well, you don't have to, but then there's the issue of where the logging state (such as the open log file) lives. If it's a separate structure, then everything would have to be modified to pass it around. The other alternative is to use global variables, which I'm not really a fan of and which could cause unexpected issues with threading (or would need to be complicated enough to deal with multithreaded use).

The advantage of having it in the llama_context is that it's generally going to be available.
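
For illustration, the global-variable alternative would look roughly like this hypothetical sketch (made-up names, not actual llama.cpp code); the mutex is the kind of extra machinery needed to keep it safe under multithreaded use:

    // hypothetical sketch of the global-variable alternative - not actual llama.cpp code
    #include <cstdarg>
    #include <cstdio>
    #include <mutex>

    static FILE *     g_log_file  = nullptr;
    static int        g_log_level = 0;
    static std::mutex g_log_mutex; // so concurrent callers do not interleave log lines

    static void global_log_printf(int level, const char * fmt, ...) {
        std::lock_guard<std::mutex> lock(g_log_mutex);
        if (!g_log_file || level < g_log_level) {
            return;
        }
        va_list args;
        va_start(args, fmt);
        std::vfprintf(g_log_file, fmt, args);
        va_end(args);
    }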

@ggerganov
Member Author

If it's a separate structure, then everything would have to be modified to pass it around.

I am thinking about something much simpler - open a log file at the start of main and log things in the main loop:

https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L405-L799

For example:

            // infinite text generation via context swapping
            // if we run out of context:
            // - take the n_keep first tokens from the original prompt (via n_past)
            // - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
            if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
                if (params.n_predict == -2) {
                    fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
                    break;
                }

                const int n_left = n_past - params.n_keep;
                // always keep the first token - BOS
                n_past = std::max(1, params.n_keep);
                n_past_guidance = std::max(1, params.n_keep + guidance_offset);

                // insert n_left/2 tokens at the start of embd from last_n_tokens
                embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());

                // stop saving session if we run out of context
                path_session.clear();

                //printf("\n---\n");
                //printf("resetting: '");
                //for (int i = 0; i < (int) embd.size(); i++) {
                //    printf("%s", llama_token_to_str(ctx, embd[i]));
                //}
                //printf("'\n");
                //printf("\n---\n");
            }

Becomes:

            // infinite text generation via context swapping
            // if we run out of context:
            // - take the n_keep first tokens from the original prompt (via n_past)
            // - take half of the last (n_ctx - n_keep) tokens and recompute the logits in batches
            if (n_past + (int) embd.size() + std::max<int>(0, guidance_offset) > n_ctx) {
                if (params.n_predict == -2) {
                    fprintf(stderr, "\n\n%s: context full, stopping generation\n", __func__);
                    break;
                }

                LOG("context full - doing context swap, n_past = %d, n_keep = %d\n", n_past, n_keep);

                const int n_left = n_past - params.n_keep;
                // always keep the first token - BOS
                n_past = std::max(1, params.n_keep);
                n_past_guidance = std::max(1, params.n_keep + guidance_offset);

                LOG("  - new values: n_left = %d, n_past = %d, n_past_guidance = %d\n", n_left, n_past, n_past_guidance);

                // insert n_left/2 tokens at the start of embd from last_n_tokens
                embd.insert(embd.begin(), last_n_tokens.begin() + n_ctx - n_left/2 - embd.size(), last_n_tokens.end() - embd.size());

                LOG_TOKENS("  - new context:", embd);

                // stop saving session if we run out of context
                path_session.clear();

                LOG("  - resetting session path\n");
            }

Basically trace everything in the main loop. No need to pass objects around.
The LOG_ helper macros can either be defined in main.cpp or in common.
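
For illustration, the helpers could be as small as this hypothetical sketch (not the actual implementation; it assumes main opens a FILE * g_logfile at startup and that a llama_context * ctx is in scope, as in the loop above):

    // hypothetical sketch - not the actual common/log.h implementation
    #include <cstdio>

    static FILE * g_logfile = nullptr; // opened once at the start of main

    #define LOG(...)                             \
        do {                                     \
            if (g_logfile) {                     \
                fprintf(g_logfile, __VA_ARGS__); \
                fflush(g_logfile);               \
            }                                    \
        } while (0)

    // dumps a token vector; assumes a llama_context * ctx is in scope, as in the main loop
    #define LOG_TOKENS(prefix, tokens)                                                   \
        do {                                                                             \
            if (g_logfile) {                                                             \
                fprintf(g_logfile, "%s", (prefix));                                      \
                for (const auto & t : (tokens)) {                                        \
                    fprintf(g_logfile, " %d:'%s'", (int) t, llama_token_to_str(ctx, t)); \
                }                                                                        \
                fprintf(g_logfile, "\n");                                                \
            }                                                                            \
        } while (0)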

@staviq
Contributor

staviq commented Aug 23, 2023

@DannyDaemonic
Contributor

DannyDaemonic commented Aug 23, 2023

@staviq Edit: Sorry! Not a third-party library!

With that approach, either everyone who downloads a release will always generate a log file, always flood stderr with way too much stuff, or won't log anything at all. The only people with any control will be those who compile it themselves. I wonder if that's the intention, or if we're looking for something you can enable via a command-line switch.

@staviq
Contributor

staviq commented Aug 23, 2023

With that approach, either everyone who downloads a release will always generate a log file

Nope, I showed how a log target can be a function, which is what I was planning to implement next, together with basic log levels.

The idea was that logging starts exactly as ggerganov described it, but can be extended at any point.

Edit: I wanted to make it configurable with #define, so the header can contain several different ways of using logging that can be turned on or off with make flags; later, unused or unnecessary parts could be removed entirely without having to change the code where they're used.

I'm basically cleaning up and rewriting code I've been using for many years in my own projects; I just need to make it simpler and nicer.

This is basically just a proof of concept that I posted to verify whether it's at least close to what ggerganov had in mind.
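
For concreteness, the kind of #define gating described above could look roughly like this (made-up macro and flag names, not what eventually landed in common/log.h):

    // hypothetical sketch of compile-time configurable logging - made-up names, not common/log.h
    #include <cstdio>

    // compiling with -DEXAMPLE_LOG_DISABLE (a made-up flag) turns logging into a no-op entirely
    #ifdef EXAMPLE_LOG_DISABLE
        #define EXAMPLE_LOG(level, ...) do { } while (0)
    #else
        // the log target (a FILE *, e.g. stderr or an opened log file) can be overridden
        // at build time without touching any call sites
        #ifndef EXAMPLE_LOG_TARGET
            #define EXAMPLE_LOG_TARGET stderr
        #endif
        // basic log levels: 0 = debug, 1 = info, 2 = warning, ...
        #ifndef EXAMPLE_LOG_LEVEL
            #define EXAMPLE_LOG_LEVEL 1
        #endif
        #define EXAMPLE_LOG(level, ...)                       \
            do {                                              \
                if ((level) >= EXAMPLE_LOG_LEVEL) {           \
                    fprintf(EXAMPLE_LOG_TARGET, __VA_ARGS__); \
                }                                             \
            } while (0)
    #endif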

@ggerganov
Member Author

Completed via #2748

@jaslatendresse

I went through the PR that implements this, but I still don't understand how to make it happen. I searched the documentation and the examples for main.cpp but couldn't find anything. Is this an option you can set when calling main from the terminal, or something?

@staviq
Contributor

staviq commented Feb 2, 2024

@jaslatendresse

It would help if you could specify what behavior you are expecting.

Logs "just work" unless you compiled with logging disabled.

Log options (arguments passed via command line) are specified here: https://github.com/ggerganov/llama.cpp/blob/191221178f51b6e81122c5bda0fd79620e547d07/common/log.h#L555

@jaslatendresse

Thanks! I found what I was trying to do. I wanted to log all the params shown by main to a log file from a subprocess, without logging the model's output. This is my solution:

subprocess.run(["./llama2.sh", params], stdout=f_out, stderr=f_log, text=True)

@staviq
Contributor

staviq commented Feb 2, 2024

I see. Yes, this particular logging system mentioned here is pretty much internal, and bindings for other languages don't expose it.
