Prompt cache output mismatch #2479


Closed

ivanstepanovftw opened this issue Aug 1, 2023 · 3 comments
Labels
bug (Something isn't working), stale

Comments

ivanstepanovftw (Collaborator) commented Aug 1, 2023

With temperature 0, the output of a regular run does not match the output of the same run with the prompt cache option enabled.
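For context, the --prompt-cache option makes the main example persist the evaluated prompt state to a file and restore it on the next run instead of re-evaluating the prompt. Below is a minimal sketch of that round trip through the C API of this era (llama_save_session_file / llama_load_session_file); the model path, thread count, and error handling are placeholders, and this is an illustration, not the code of examples/main/main.cpp:

#include "llama.h"
#include <cstdio>
#include <vector>

int main() {
    llama_backend_init(false /* numa */);

    llama_context_params cparams = llama_context_default_params();
    llama_model   * model = llama_load_model_from_file("llama-2-7b.ggmlv3.q4_0.bin", cparams);
    llama_context * ctx   = llama_new_context_with_model(model, cparams);

    // Tokenize the prompt (add_bos = true, as the main example does).
    std::vector<llama_token> tokens(cparams.n_ctx);
    const int n_tok = llama_tokenize(ctx, "Who are you", tokens.data(), (int) tokens.size(), true);
    tokens.resize(n_tok);

    // First run: evaluate the prompt, then persist the KV state together with the token list.
    llama_eval(ctx, tokens.data(), (int) tokens.size(), 0 /* n_past */, 8 /* n_threads */);
    llama_save_session_file(ctx, "prompt-cache", tokens.data(), tokens.size());

    // Later run: restore the saved state instead of re-evaluating the prompt.
    std::vector<llama_token> cached(cparams.n_ctx);
    size_t n_cached = 0;
    llama_load_session_file(ctx, "prompt-cache", cached.data(), cached.size(), &n_cached);
    printf("restored %zu cached prompt tokens\n", n_cached);

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}

If the cached state and the recorded token list ever disagree, greedy (temp 0) decoding can diverge between the two paths, which is the symptom shown in the two runs below.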

➜  llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed  = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 3615.73 MB (+  256.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   71.84 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
main: attempting to load saved session from 'prompt-cache'
main: session file does not exist, will create
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Who are you? What is your name?
 Unterscheidung der Buchstaben.

### 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
➜  llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed  = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 3615.73 MB (+  256.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   71.84 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
main: attempting to load saved session from 'prompt-cache'
main: loaded a session with prompt size of 4 tokens
main: session file has exact match for prompt!
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Who are you hopefully?
 nobody knows.
Who are you?
nobody knows.
Who are you?
nobody knows.
Who are you?
ivanstepanovftw (Collaborator, Author) commented:

Same issue as in #1257. Again, the hotfix is the same:

- llama_save_session_file(ctx, path_session.c_str(), session_tokens.data(), session_tokens.size());
+ llama_save_session_file(ctx, path_session.c_str(), session_tokens.data(), session_tokens.size() - 1);
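
For readers without the surrounding source at hand, here is a rough sketch of where this call sits in the prompt-cache save path of examples/main/main.cpp (paraphrased from memory, not an exact quote of the tree at build 937). The presumed problem is an off-by-one: session_tokens already contains the most recently appended token when the file is written, so the saved token list is one entry ahead of the state being serialized.

// The session file is written once, around the time the first new token is sampled.
if (!path_session.empty() && need_to_save_session) {
    need_to_save_session = false;

    // Before the hotfix: every recorded token is saved, including the last one.
    // llama_save_session_file(ctx, path_session.c_str(),
    //                         session_tokens.data(), session_tokens.size());

    // With the hotfix: drop the last token so the saved token list stays
    // consistent with the serialized state (assumption based on #1257).
    llama_save_session_file(ctx, path_session.c_str(),
                            session_tokens.data(), session_tokens.size() - 1);
}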

ivanstepanovftw (Collaborator, Author) commented:

This issue is caused by #1824.

Green-Sky added the bug (Something isn't working) label on Aug 1, 2023
github-actions bot added the stale label on Mar 25, 2024
github-actions bot (Contributor) commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed on Apr 9, 2024