Model: OpenCodeInterpreter-DS-6.7B (GGUFs)
This is a DeepSeek Coder instruct-based model using the llama architecture, but maybe there is something distinct about it that requires special handling?
Or maybe I did something wrong when converting these files from the original safetensors (I used the same build, b2249, for converting, quantizing, and running).
Both `-ngl 999` and `-ngl 0` produce the same exception:
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
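For what it's worth, the trace below shows the throw happens inside `llama_byte_to_token` during `llama_model_load`, i.e. while the vocab is being set up, before any prompt is tokenized. As far as I can tell, with `tokenizer.ggml.model = llama` (SentencePiece-style), llama.cpp resolves raw bytes through vocab entries named `<0xNN>` via `unordered_map::at`, which throws exactly this `std::out_of_range` if a byte piece such as `<0x0A>` (newline) is absent. A quick way to check whether the converted file is missing those byte tokens — a hedged sketch using the `gguf` package from llama.cpp's `gguf-py`; the `ReaderField` decoding details may vary between versions:

```python
# Sketch: check whether the GGUF vocab contains the <0x00>..<0xFF> byte
# tokens that llama.cpp's SPM byte-fallback path looks up at load time.
# Assumes the `gguf` package from llama.cpp/gguf-py.
from gguf import GGUFReader

reader = GGUFReader("opencodeinterpreter-ds-6.7b.Q4_K_M.gguf")
field = reader.fields["tokenizer.ggml.tokens"]

# For string-array fields, field.data holds the indices of the parts that
# contain each element's raw UTF-8 bytes.
tokens = {bytes(field.parts[i]).decode("utf-8", "replace") for i in field.data}

missing = [f"<0x{b:02X}>" for b in range(256) if f"<0x{b:02X}>" not in tokens]
print(f"missing {len(missing)}/256 byte tokens; first few: {missing[:5]}")
```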
llama.cpp build info
- b2249 (rev: 15499eb94227401bdc8875da6eb85c15d37068f7)
- compiled with `LLAMA_METAL=1`
- macOS, M1 Pro
lldb stacktrace
Process 25487 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
-> 0x188223330 <+0>: pacibsp
0x188223334 <+4>: stp x22, x21, [sp, #-0x30]!
0x188223338 <+8>: stp x20, x19, [sp, #0x10]
0x18822333c <+12>: stp x29, x30, [sp, #0x20]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
frame #1: 0x00000001000684c0 main`std::__1::__throw_out_of_range[abi:v160006](char const*) + 60
frame #2: 0x000000010006a790 main`llama_byte_to_token(llama_vocab const&, unsigned char) + 472
frame #3: 0x000000010003d270 main`llama_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model&, llama_model_params&) + 1968
frame #4: 0x000000010003ca08 main`llama_load_model_from_file + 420
frame #5: 0x00000001000a208c main`llama_init_from_gpt_params(gpt_params&) + 96
frame #6: 0x00000001000ed73c main`main + 2404
frame #7: 0x0000000187ee90e0 dyld`start + 2360
full lldb output from `./main`:
(lldb) target create "./main"
Current executable set to '/Users/tito/code/llama.cpp/main' (arm64).
(lldb) settings set -- target.run-args "-m" "/Users/tito/code/autogguf/OpenCodeInterpreter-DS-6.7B/opencodeinterpreter-ds-6.7b.Q4_K_M.gguf" "-t" "7" "--color" "--ctx_size" "4096" "--keep" "4" "--in-prefix" "<|User|>\\n" "--in-suffix" "\\n<|Assistant|>\\n" "-r" "<|User|>" "-r" "<|Assistant|>" "-r" "<|EOT|>" "-ins" "-b" "512" "-n" "-1" "--temp" "0.7" "--repeat_penalty" "1.1" "-ngl" "0"
(lldb) breakpoint set -E C++
Breakpoint 1: no locations (pending).
(lldb) run
Process 25487 launched: '/Users/tito/code/llama.cpp/main' (arm64)
2 locations added to breakpoint 1
Log start
main: build = 2249 (15499eb9)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.3.0
main: seed = 1708707124
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /Users/tito/code/autogguf/OpenCodeInterpreter-DS-6.7B/opencodeinterpreter-ds-6.7b.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 16384
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 11: llama.rope.scaling.type str = linear
llama_model_loader: - kv 12: llama.rope.scaling.factor f32 = 4.000000
llama_model_loader: - kv 13: general.file_type u32 = 15
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32256] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 32013
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 32021
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 32014
llama_model_loader: - kv 21: tokenizer.chat_template str = {%- set found_item = false -%}\n{%- fo...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
Process 25487 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
-> 0x188223330 <+0>: pacibsp
0x188223334 <+4>: stp x22, x21, [sp, #-0x30]!
0x188223338 <+8>: stp x20, x19, [sp, #0x10]
0x18822333c <+12>: stp x29, x30, [sp, #0x20]
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000188223330 libc++abi.dylib`__cxa_throw
frame #1: 0x00000001000684c0 main`std::__1::__throw_out_of_range[abi:v160006](char const*) + 60
frame #2: 0x000000010006a790 main`llama_byte_to_token(llama_vocab const&, unsigned char) + 472
frame #3: 0x000000010003d270 main`llama_model_load(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, llama_model&, llama_model_params&) + 1968
frame #4: 0x000000010003ca08 main`llama_load_model_from_file + 420
frame #5: 0x00000001000a208c main`llama_init_from_gpt_params(gpt_params&) + 96
frame #6: 0x00000001000ed73c main`main + 2404
frame #7: 0x0000000187ee90e0 dyld`start + 2360
conversion info
$ python3.11 ./convert.py OpenCodeInterpreter-DS-6.7B \
--outtype f16 \
--outfile opencodeinterpreter-ds-6.7b.fp16.gguf \
--vocab-type hfft \
--pad-vocab
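In case the conversion step is the culprit, a small check on the f16 output (before quantizing) can show what tokenizer the converter actually emitted; I would expect some tokens of type BYTE (6) if byte fallback is supported. Same hedged `gguf-py` caveats as the sketch above — in particular, how numeric-array parts are laid out may differ between versions:

```python
# Sketch: summarize the tokenizer metadata that convert.py wrote into the
# f16 GGUF, to see whether any BYTE-type (6) tokens exist at all.
from collections import Counter

import numpy as np
from gguf import GGUFReader

reader = GGUFReader("opencodeinterpreter-ds-6.7b.fp16.gguf")

model_field = reader.fields["tokenizer.ggml.model"]
print("tokenizer.ggml.model:", bytes(model_field.parts[model_field.data[0]]).decode())

# GGUF token types: 1=NORMAL 2=UNKNOWN 3=CONTROL 4=USER_DEFINED 5=UNUSED 6=BYTE
types = reader.fields["tokenizer.ggml.token_type"]
vals = np.concatenate([np.atleast_1d(types.parts[i]) for i in types.data])
print("token_type counts:", dict(Counter(int(v) for v in vals)))
```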