I'm trying to load an 8-bit GGUF quant of Qwen 3 (the log below shows the 1.7B Instruct model, Q8_0), but the package panics internally:
2025-08-12T04:50:17.688676Z INFO llama.cpp: llama_model_loader: - kv 0: general.architecture str = qwen3
2025-08-12T04:50:17.688805Z INFO llama.cpp: llama_model_loader: - kv 1: general.type str = model
2025-08-12T04:50:17.688940Z INFO llama.cpp: llama_model_loader: - kv 2: general.name str = Qwen3 1.7B Instruct
2025-08-12T04:50:17.689118Z INFO llama.cpp: llama_model_loader: - kv 3: general.finetune str = Instruct
2025-08-12T04:50:17.689212Z INFO llama.cpp: llama_model_loader: - kv 4: general.basename str = Qwen3
2025-08-12T04:50:17.689365Z INFO llama.cpp: llama_model_loader: - kv 5: general.size_label str = 1.7B
2025-08-12T04:50:17.689461Z INFO llama.cpp: llama_model_loader: - kv 6: qwen3.block_count u32 = 28
2025-08-12T04:50:17.689538Z INFO llama.cpp: llama_model_loader: - kv 7: qwen3.context_length u32 = 40960
2025-08-12T04:50:17.689661Z INFO llama.cpp: llama_model_loader: - kv 8: qwen3.embedding_length u32 = 2048
2025-08-12T04:50:17.689811Z INFO llama.cpp: llama_model_loader: - kv 9: qwen3.feed_forward_length u32 = 6144
2025-08-12T04:50:17.689945Z INFO llama.cpp: llama_model_loader: - kv 10: qwen3.attention.head_count u32 = 16
2025-08-12T04:50:17.690053Z INFO llama.cpp: llama_model_loader: - kv 11: qwen3.attention.head_count_kv u32 = 8
2025-08-12T04:50:17.690192Z INFO llama.cpp: llama_model_loader: - kv 12: qwen3.rope.freq_base f32 = 1000000.000000
2025-08-12T04:50:17.690333Z INFO llama.cpp: llama_model_loader: - kv 13: qwen3.attention.layer_norm_rms_epsilon f32 = 0.000001
2025-08-12T04:50:17.690467Z INFO llama.cpp: llama_model_loader: - kv 14: qwen3.attention.key_length u32 = 128
2025-08-12T04:50:17.690596Z INFO llama.cpp: llama_model_loader: - kv 15: qwen3.attention.value_length u32 = 128
2025-08-12T04:50:17.690698Z INFO llama.cpp: llama_model_loader: - kv 16: tokenizer.ggml.model str = gpt2
2025-08-12T04:50:17.690811Z INFO llama.cpp: llama_model_loader: - kv 17: tokenizer.ggml.pre str = qwen2
2025-08-12T04:50:17.826803Z INFO llama.cpp: llama_model_loader: - kv 18: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
2025-08-12T04:50:17.858514Z INFO llama.cpp: llama_model_loader: - kv 19: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-08-12T04:50:18.002530Z INFO llama.cpp: llama_model_loader: - kv 20: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
2025-08-12T04:50:18.003067Z INFO llama.cpp: llama_model_loader: - kv 21: tokenizer.ggml.eos_token_id u32 = 151645
- type f32: 113 tensors
2025-08-12T04:50:18.005749Z INFO llama.cpp: llama_model_loader: - type q8_0: 197 tensors
2025-08-12T04:50:18.006060Z ERROR llama.cpp: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen3'
2025-08-12T04:50:18.006218Z ERROR llama.cpp: llama_load_model_from_file: failed to load model
thread 'main' panicked at C:\Users\norik\OneDrive\Desktop\Projects\qlerk\src-tauri\src\lib.rs:75:6:
Failed to create LLM: LlamaError(LlamaInternalError)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
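One way to rule out a corrupted download is to read the `general.architecture` key straight out of the GGUF header and confirm the file really declares `qwen3`. The sketch below is a minimal, dependency-free Python reader; it assumes the key is the first metadata kv pair, which is how the logs above show it (kv 0), but a robust reader would handle arbitrary key ordering.

```python
import struct

GGUF_MAGIC = 0x46554747   # the bytes "GGUF" read as a little-endian u32
GGUF_TYPE_STRING = 8      # GGUF metadata value type for strings

def read_gguf_architecture(path):
    """Return the general.architecture string from a GGUF file.

    Minimal sketch: assumes general.architecture is kv pair 0,
    as in the llama_model_loader output above."""
    with open(path, "rb") as f:
        magic, version = struct.unpack("<II", f.read(8))
        if magic != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        # First kv pair: key (u64 length + bytes), value type (u32), value
        (key_len,) = struct.unpack("<Q", f.read(8))
        key = f.read(key_len).decode("utf-8")
        (value_type,) = struct.unpack("<I", f.read(4))
        if key != "general.architecture" or value_type != GGUF_TYPE_STRING:
            raise ValueError("first kv pair is not general.architecture")
        (val_len,) = struct.unpack("<Q", f.read(8))
        return f.read(val_len).decode("utf-8")
```

If this prints `qwen3` for the file above, the GGUF itself is fine and the "unknown model architecture" error points at the llama.cpp build rather than the model file.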
Alternatively, when I try to load jina-embeddings-v4-text-retrieval-GGUF (which is based on qwen2.5-vl-3b-instruct), I get this unsupported-architecture error:
2025-08-12T05:00:46.839192Z INFO llama.cpp: llama_model_loader: - kv 0: general.architecture str = qwen2vl
2025-08-12T05:00:46.839273Z INFO llama.cpp: llama_model_loader: - kv 1: general.type str = model
2025-08-12T05:00:46.839358Z INFO llama.cpp: llama_model_loader: - kv 2: general.name str = Jev4 Text Retrieval
2025-08-12T05:00:46.839572Z INFO llama.cpp: llama_model_loader: - kv 3: general.size_label str = 3.1B
2025-08-12T05:00:46.839713Z INFO llama.cpp: llama_model_loader: - kv 4: qwen2vl.block_count u32 = 36
2025-08-12T05:00:46.839859Z INFO llama.cpp: llama_model_loader: - kv 5: qwen2vl.context_length u32 = 128000
2025-08-12T05:00:46.840071Z INFO llama.cpp: llama_model_loader: - kv 6: qwen2vl.embedding_length u32 = 2048
2025-08-12T05:00:46.840253Z INFO llama.cpp: llama_model_loader: - kv 7: qwen2vl.feed_forward_length u32 = 11008
2025-08-12T05:00:46.840437Z INFO llama.cpp: llama_model_loader: - kv 8: qwen2vl.attention.head_count u32 = 16
2025-08-12T05:00:46.840588Z INFO llama.cpp: llama_model_loader: - kv 9: qwen2vl.attention.head_count_kv u32 = 2
2025-08-12T05:00:46.840739Z INFO llama.cpp: llama_model_loader: - kv 10: qwen2vl.rope.freq_base f32 = 1000000.000000
2025-08-12T05:00:46.841019Z INFO llama.cpp: llama_model_loader: - kv 11: qwen2vl.attention.layer_norm_rms_epsilon f32 = 0.000001
2025-08-12T05:00:46.841254Z INFO llama.cpp: llama_model_loader: - kv 12: qwen2vl.rope.dimension_sections arr[i32,4] = [16, 24, 24, 0]
2025-08-12T05:00:46.841497Z INFO llama.cpp: llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
2025-08-12T05:00:46.841658Z INFO llama.cpp: llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
2025-08-12T05:00:46.976960Z INFO llama.cpp: llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
2025-08-12T05:00:47.004920Z INFO llama.cpp: llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-08-12T05:00:47.138014Z INFO llama.cpp: llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
2025-08-12T05:00:47.138306Z INFO llama.cpp: llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
2025-08-12T05:00:47.138475Z INFO llama.cpp: llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 151645
2025-08-12T05:00:47.138670Z INFO llama.cpp: llama_model_loader: - kv 20: general.quantization_version u32 = 2
2025-08-12T05:00:47.138787Z INFO llama.cpp: llama_model_loader: - kv 21: general.file_type u32 = 15
2025-08-12T05:00:47.138860Z INFO llama.cpp: llama_model_loader: - kv 22: quantize.imatrix.file str = imatrix-retrieval-512.dat
2025-08-12T05:00:47.139019Z INFO llama.cpp: llama_model_loader: - kv 23: quantize.imatrix.dataset str = calibration_data_v5_rc.txt
2025-08-12T05:00:47.139186Z INFO llama.cpp: llama_model_loader: - kv 24: quantize.imatrix.entries_count u32 = 252
2025-08-12T05:00:47.139510Z INFO llama.cpp: llama_model_loader: - kv 25: quantize.imatrix.chunks_count u32 = 225
2025-08-12T05:00:47.139743Z INFO llama.cpp: llama_model_loader: - type f32: 181 tensors
2025-08-12T05:00:47.139902Z INFO llama.cpp: llama_model_loader: - type q4_K: 216 tensors
2025-08-12T05:00:47.140094Z INFO llama.cpp: llama_model_loader: - type q6_K: 37 tensors
2025-08-12T05:00:47.140657Z ERROR llama.cpp: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen2vl'
2025-08-12T05:00:47.140781Z ERROR llama.cpp: llama_load_model_from_file: failed to load model
This is odd, because I saw here that the qwen2vl architecture has been supported in llama.cpp since December 2024, which suggests the llama.cpp build bundled with this package is out of date.