Description
Output mismatch at temperature 0 between a regular run and a run with the prompt cache option enabled.
➜ llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 3615.73 MB (+ 256.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.84 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: attempting to load saved session from 'prompt-cache'
main: session file does not exist, will create
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
Who are you? What is your name?
Unterscheidung der Buchstaben.
### 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
➜ llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 3615.73 MB (+ 256.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.84 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: attempting to load saved session from 'prompt-cache'
main: loaded a session with prompt size of 4 tokens
main: session file has exact match for prompt!
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
Who are you hopefully?
nobody knows.
Who are you?
nobody knows.
Who are you?
nobody knows.
Who are you?
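For reference on why this is unexpected: with --temp 0 sampling is effectively greedy, so two runs can only diverge if the logits themselves differ, which points at the restored session state rather than sampling randomness. Below is a minimal, self-contained sketch of that reduction (illustrative only, not llama.cpp's sampler code):

```cpp
// Minimal sketch (not llama.cpp's implementation): at temperature 0 sampling
// reduces to an argmax over the logits, so identical logits must yield the
// same token on every run.
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

// Pick the highest-logit token id, which is what --temp 0 effectively does.
static int greedy_token(const std::vector<float> & logits) {
    return static_cast<int>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
}

int main() {
    std::vector<float> logits = {0.1f, 2.7f, -1.3f, 0.9f};  // hypothetical logits
    std::printf("greedy token id: %d\n", greedy_token(logits));  // always 1
    return 0;
}
```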
ivanstepanovftw commented on Aug 1, 2023
Same issue as in #1257.
Again, the hotfix is the same.
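(The hotfix snippet from #1257 is not reproduced here. As context only, a rough illustrative sketch of the idea, assuming the mismatch comes from the cached path reusing the saved state for the entire prompt and therefore never recomputing the logits that feed the first sampled token; the variable names echo examples/main/main.cpp, but the snippet itself is hypothetical:)

```cpp
// Illustrative sketch only, not the actual patch: if the restored session
// covers the whole prompt, trim the last matched token so it is evaluated
// again and the logits for the first sampled token are freshly computed.
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> embd_inp       = {1, 822, 526, 366};  // hypothetical prompt token ids
    std::vector<int> session_tokens = {1, 822, 526, 366};  // tokens restored from the cache
    std::size_t n_matching = 4;                             // "exact match for prompt"

    if (!embd_inp.empty() && n_matching == embd_inp.size()) {
        session_tokens.resize(embd_inp.size() - 1);         // force re-eval of the last token
        n_matching = session_tokens.size();
    }
    std::printf("tokens reused from cache: %zu, re-evaluated: %zu\n",
                n_matching, embd_inp.size() - n_matching);
    return 0;
}
```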
ivanstepanovftw commented on Aug 1, 2023
This issue is caused by #1824.
github-actions commented on Apr 9, 2024
This issue was closed because it has been inactive for 14 days since being marked as stale.