Can't finetune smollm2:360m #15532

@trentzan

Description

command: ./llama-finetune -p "bite me" -m ../SmolLM2-360M-Instruct-f16.gguf -o uzi

output:
main: force disabling memory mapping because it would result in-read-only pointers to the weights
main: force changing k cache type to f32 due to a lack of f16 support for OUT_PROD
main: force changing v cache type to f32 due to a lack of f16 support for OUT_PROD
build: 6259 (710dfc4) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
llama_model_loader: loaded meta data with 33 key-value pairs and 290 tensors from ../SmolLM2-360M-Instruct-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Smollm2 360M 8k Lc100K Mix1 Ep2
llama_model_loader: - kv 3: general.organization str = Loubnabnl
llama_model_loader: - kv 4: general.finetune str = 8k-lc100k-mix1-ep2
llama_model_loader: - kv 5: general.basename str = smollm2
llama_model_loader: - kv 6: general.size_label str = 360M
llama_model_loader: - kv 7: general.license str = apache-2.0
llama_model_loader: - kv 8: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 8192
llama_model_loader: - kv 11: llama.embedding_length u32 = 960
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 2560
llama_model_loader: - kv 13: llama.attention.head_count u32 = 15
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 5
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 1
llama_model_loader: - kv 18: llama.vocab_size u32 = 49152
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 20: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = smollm
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,49152] = ["<|endoftext|>", "<|im_start|>", "<|...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,49152] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,48900] = ["Ġ t", "Ġ a", "i n", "h e", "Ġ Ġ...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 29: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 31: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 225 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 690.24 MiB (16.00 BPW)
load: printing all EOG tokens:
load: - 0 ('<|endoftext|>')
load: - 2 ('<|im_end|>')
load: - 4 ('')
load: special tokens cache size = 17
load: token to piece cache size = 0.3170 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 8192
print_info: n_embd = 960
print_info: n_layer = 32
print_info: n_head = 15
print_info: n_head_kv = 5
print_info: n_rot = 64
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 320
print_info: n_embd_v_gqa = 320
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 2560
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 100000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 8192
print_info: rope_finetuned = unknown
print_info: model type = 3B
print_info: model params = 361.82 M
print_info: general.name = Smollm2 360M 8k Lc100K Mix1 Ep2
print_info: vocab type = BPE
print_info: n_vocab = 49152
print_info: n_merges = 48900
print_info: BOS token = 1 '<|im_start|>'
print_info: EOS token = 2 '<|im_end|>'
print_info: EOT token = 0 '<|endoftext|>'
print_info: UNK token = 0 '<|endoftext|>'
print_info: PAD token = 2 '<|im_end|>'
print_info: LF token = 198 'Ċ'
print_info: FIM REP token = 4 ''
print_info: EOG token = 0 '<|endoftext|>'
print_info: EOG token = 2 '<|im_end|>'
print_info: EOG token = 4 ''
print_info: max token length = 162
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: CPU model buffer size = 690.24 MiB
........................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 100000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (8192) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 0.19 MiB
llama_kv_cache: CPU KV buffer size = 320.00 MiB
llama_kv_cache: size = 320.00 MiB ( 4096 cells, 32 layers, 1/1 seqs), K (f32): 160.00 MiB, V (f32): 160.00 MiB
llama_context: CPU compute buffer size = 136.76 MiB
llama_context: graph nodes = 1126
llama_context: graph splits = 1
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|im_end|> logit bias = -inf
common_init_from_params: added logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

system_info: n_threads = 2 (n_threads_batch = 2) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | F16C = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
ggml_aligned_malloc: insufficient memory (attempted to allocate 17592186044415.94 MB)
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 18446744073709486080
alloc_tensor_range: failed to allocate CPU buffer of size 18446744073709486080
[New LWP 45805]
[New LWP 45804]

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/liblber.so.2
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlidec.so.1
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlicommon.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000071af737107e3 in __GI___wait4 (pid=45806, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x000071af737107e3 in __GI___wait4 (pid=45806, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000071af74082ef3 in ggml_print_backtrace () from /home/trent/uzi-gun/llama.cpp/build/bin/libggml-base.so
#2 0x000071af740946bf in ggml_uncaught_exception() () from /home/trent/uzi-gun/llama.cpp/build/bin/libggml-base.so
#3 0x000071af73abb0da in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x000071af73aa5a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x000071af73abb391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x000071af73aa5ac8 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x000071af740a1249 in std::vector<long, std::allocator >::_M_default_append(unsigned long) () from /home/trent/uzi-gun/llama.cpp/build/bin/libggml-base.so
#8 0x000071af7409e677 in ggml_opt_dataset_init () from /home/trent/uzi-gun/llama.cpp/build/bin/libggml-base.so
#9 0x00006295478cbc29 in common_opt_dataset_init(llama_context*, std::vector<int, std::allocator > const&, long) ()
#10 0x00006295477d8cc7 in main ()
[Inferior 1 (process 45803) detached]
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

Running on an Intel i5 (I know it's bad), no GPU.

Please help me. Thanks.
