Labels: enhancement (New feature or request), good first issue (Good for newcomers)
Description
The `gguf-dump.py` script in llama.cpp release b2297 is missing support for i-quants (the `IQ*` quantization types).
Steps to reproduce
- Create or download a GGUF file in any `IQ*` format (e.g., `miqu-1-70b-Requant-b2131-iMat-c32_ch400-IQ1_S_v3.gguf`)
- Copy the file to `.\models\miqu-1-70b-sf.IQ1_S.gguf`
- Execute the following:
  ```
  python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.IQ1_S.gguf
  ```
- See the error (a sketch of the likely missing enum members follows these steps):
  ```
  ValueError: 19 is not a valid GGMLQuantizationType
  ```
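For context, quantization type 19 is `GGML_TYPE_IQ1_S` in `ggml.h`, which matches the `IQ1_S` file used above. The failure appears to come down to missing members in the `GGMLQuantizationType` enum in `gguf-py/gguf/constants.py`. A minimal sketch of the missing i-quant members, assuming the Python values must mirror the C enum (the quant-size table in the same file would presumably need matching entries as well):

```python
from enum import IntEnum

# Sketch only: the i-quant members missing from gguf-py/gguf/constants.py.
# Values are assumed to mirror the GGML_TYPE_* enum in ggml.h around b2297.
class GGMLQuantizationType(IntEnum):
    # ... existing members (F32 = 0 through Q8_K = 15) elided ...
    IQ2_XXS = 16
    IQ2_XS = 17
    IQ3_XXS = 18
    IQ1_S = 19  # the value the ValueError above complains about
    IQ4_NL = 20
    IQ3_S = 21
    IQ2_S = 22
    IQ4_XS = 23
```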
Expected behaviour
I expect the Python `gguf-py` library to support all possible GGUF formats.
Working example for k-quants:
```
python .\gguf-py\scripts\gguf-dump.py --no-tensors .\models\miqu-1-70b-sf.Q5_K_M.gguf
* Loading: .\models\miqu-1-70b-sf.Q5_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 26 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 723
3: UINT64 | 1 | GGUF.kv_count = 23
4: STRING | 1 | general.architecture = 'llama'
5: STRING | 1 | general.name = 'R:\\AI\\LLM\\source'
6: UINT32 | 1 | llama.context_length = 32764
7: UINT32 | 1 | llama.embedding_length = 8192
8: UINT32 | 1 | llama.block_count = 80
9: UINT32 | 1 | llama.feed_forward_length = 28672
10: UINT32 | 1 | llama.rope.dimension_count = 128
11: UINT32 | 1 | llama.attention.head_count = 64
12: UINT32 | 1 | llama.attention.head_count_kv = 8
13: FLOAT32 | 1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
14: FLOAT32 | 1 | llama.rope.freq_base = 1000000.0
15: UINT32 | 1 | general.file_type = 17
16: STRING | 1 | tokenizer.ggml.model = 'llama'
17: [STRING] | 32000 | tokenizer.ggml.tokens
18: [FLOAT32] | 32000 | tokenizer.ggml.scores
19: [INT32] | 32000 | tokenizer.ggml.token_type
20: UINT32 | 1 | tokenizer.ggml.bos_token_id = 1
21: UINT32 | 1 | tokenizer.ggml.eos_token_id = 2
22: UINT32 | 1 | tokenizer.ggml.padding_token_id = 0
23: BOOL | 1 | tokenizer.ggml.add_bos_token = True
24: BOOL | 1 | tokenizer.ggml.add_eos_token = False
25: STRING | 1 | tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['"
26: UINT32 | 1 | general.quantization_version = 2
```
Use-Case
I am extracting the metadata from any given GGUF model to automatically calculate the optimal runtime arguments for the server in the following PowerShell script: https://github.com/countzero/windows_llama.cpp/blob/v1.12.0/examples/server.ps1#L104
Question
@ggerganov Is there another way to dump only the metadata from a given GGUF model? Perhaps this could be an `--inspect` option of the `gguf` binary?
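In the meantime, one possible way to dump only the metadata is to script the `GGUFReader` that ships with gguf-py directly, skipping `gguf-dump.py` entirely. A rough sketch, assuming the reader's `fields` mapping of field entries, with deliberately simplified value decoding (and note it would still hit the same `ValueError` on `IQ*` files until the enum is extended):

```python
from gguf import GGUFReader
from gguf.constants import GGUFValueType

# Sketch: print only the key/value metadata of a GGUF file,
# without touching the tensor data.
reader = GGUFReader(r".\models\miqu-1-70b-sf.Q5_K_M.gguf")
for name, field in reader.fields.items():
    if len(field.types) == 1 and field.types[0] == GGUFValueType.STRING:
        # Strings are stored as raw bytes in the field's last part.
        value = bytes(field.parts[-1]).decode("utf-8")
    else:
        # Scalars and arrays come back as numpy-backed parts; printed raw here.
        value = field.parts[-1]
    print(f"{name} = {value}")
```

The decoding here is a simplification of what `gguf-dump.py` itself does; array-valued keys (e.g., `tokenizer.ggml.tokens`) would need per-type handling in a real script.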