
Conversion to EXL2 of Phi-3 Mini 128k July update produces gibberish output #537


Closed
SystemPanic opened this issue Jul 5, 2024 · 2 comments · Fixed by #540

Comments

@SystemPanic
Contributor

Seems to be caused by the following changes in the July update (see the config sketch after this list):

  • RoPE scaling type name changed to longrope
  • Scaling factor lists changed
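
The relevant part of the updated config.json looks roughly like the sketch below (a hedged illustration; the factor lists are long and their exact values are elided):

```python
# Rough sketch of the rope_scaling entries before and after the July update.
# Factor lists are elided; per the report above they also changed.
april_rope_scaling = {
    "type": "su",
    "short_factor": [...],  # per-dimension scaling factors (elided)
    "long_factor": [...],
}
july_rope_scaling = {
    "type": "longrope",     # renamed scaling method
    "short_factor": [...],  # updated factor lists (elided)
    "long_factor": [...],
}
```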

Useful references:

ggml-org/llama.cpp#8262
ggml-org/llama.cpp#6849 (comment)

Conversion log:

------------------------------------------------
| Measured: model.layers.31 (Attention)        |
| Duration: 7.80 seconds                       |
| Completed step: 63/67                        |
| Avg time / step (rolling): 9.28 seconds      |
| Estimated remaining time: 0min 37sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.layers.31 (MLP)
 -- model.layers.31.mlp.gate_proj                      0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_128g s4                                       4.04 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.gate_proj                      1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.up_proj                        0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:3b_64g/0.75:2b_64g s4                         2.32 bpw
 -- model.layers.31.mlp.up_proj                        0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_128g/0.75:3b_128g s4                       3.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.31.mlp.up_proj                        1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_128g/0.75:4b_128g s4                       4.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_128g/0.75:5b_128g s4                       5.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.31.mlp.up_proj                        1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.up_proj                        0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.up_proj                        1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.down_proj                      0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.48 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:3b_32g s4                         3.24 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:4b_32g s4                         4.19 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.41 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.49 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_128g s4                        4.25 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_32g s4                         4.34 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.36 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.44 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.31 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.39 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:6b_128g s4                        6.15 bpw
 -- model.layers.31.mlp.down_proj                      0.15:8b_128g/0.85:6b_128g s4                       6.35 bpw
 -- model.layers.31.mlp.down_proj                      1:8b_128g s4                                       8.04 bpw
 -- 2.2469 bpw  accuracy: 0.93468168
 -- 2.3233 bpw  accuracy: 0.93676452
 -- 2.5957 bpw  accuracy: 0.94465024
 -- 2.9121 bpw  accuracy: 0.94718373
 -- 3.2851 bpw  accuracy: 0.96705803
 -- 3.3679 bpw  accuracy: 0.96966901
 -- 3.6207 bpw  accuracy: 0.97334990
 -- 4.1380 bpw  accuracy: 0.98255626
 -- 4.1991 bpw  accuracy: 0.98405144
 -- 4.2682 bpw  accuracy: 0.98309226
 -- 4.3510 bpw  accuracy: 0.98517615
 -- 5.2513 bpw  accuracy: 0.99132111
 -- 5.3341 bpw  accuracy: 0.99250382
 -- 6.0729 bpw  accuracy: 0.99510243
 -- 6.3082 bpw  accuracy: 0.99555561
 -- 6.8707 bpw  accuracy: 0.99634729
 -- 8.0374 bpw  accuracy: 0.99851187
------------------------------------------------
| Measured: model.layers.31 (MLP)              |
| Duration: 10.76 seconds                      |
| Completed step: 64/67                        |
| Avg time / step (rolling): 9.29 seconds      |
| Estimated remaining time: 0min 27sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.norm (RMSNorm)
------------------------------------------------
| Measured: model.norm (RMSNorm)               |
| Duration: 0.26 seconds                       |
| Completed step: 65/67                        |
| Avg time / step (rolling): 8.52 seconds      |
| Estimated remaining time: 0min 17sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: lm_head (Linear)
------------------------------------------------
| Measured: lm_head (Linear)                   |
| Duration: 0.34 seconds                       |
| Completed step: 66/67                        |
| Avg time / step (rolling): 7.51 seconds      |
| Estimated remaining time: 0min 7sec          |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Saving checkpoint...
 -- Optimizing...
 -- Optimizing:    1/ 240
 -- Optimizing:    9/ 240
 -- Optimizing:   17/ 240
 -- Optimizing:   25/ 240
 -- Optimizing:   33/ 240
 -- Optimizing:   41/ 240
 -- Optimizing:   49/ 240
 -- Optimizing:   57/ 240
 -- Optimizing:   65/ 240
 -- Optimizing:   73/ 240
 -- Optimizing:   80/ 240
 -- Optimizing:   88/ 240
 -- Optimizing:   96/ 240
 -- Optimizing:  104/ 240
 -- Optimizing:  112/ 240
 -- Optimizing:  120/ 240
 -- Optimizing:  128/ 240
 -- Optimizing:  136/ 240
 -- Optimizing:  144/ 240
 -- Optimizing:  152/ 240
 -- Optimizing:  160/ 240
 -- Optimizing:  168/ 240
 -- Optimizing:  176/ 240
 -- Optimizing:  184/ 240
 -- Optimizing:  192/ 240
 -- Optimizing:  200/ 240
 -- Optimizing:  208/ 240
 -- Optimizing:  216/ 240
 -- Optimizing:  224/ 240
 -- Optimizing:  232/ 240
 -- Optimizing:  240/ 240
 -- max(err): 0.005406
 -- error_norm: 1.485759
 -- Quantization strategy:
 --   model.layers.0.self_attn                           6.6359 bpw - exp. error: 0.00218182
 --   model.layers.0.mlp                                 8.0374 bpw - exp. error: 0.00114895
 --   model.layers.1.self_attn                           8.0418 bpw - exp. error: 0.00184583
 --   model.layers.1.mlp                                 8.0374 bpw - exp. error: 0.00199654
 --   model.layers.2.self_attn                           8.0418 bpw - exp. error: 0.00177566
 --   model.layers.2.mlp                                 6.0729 bpw - exp. error: 0.00249584
 --   model.layers.3.self_attn                           4.1930 bpw - exp. error: 0.00383048
 --   model.layers.3.mlp                                 6.0729 bpw - exp. error: 0.00203851
 --   model.layers.4.self_attn                           6.6359 bpw - exp. error: 0.00102152
 --   model.layers.4.mlp                                 6.3082 bpw - exp. error: 0.00182404
 --   model.layers.5.self_attn                           4.4013 bpw - exp. error: 0.00264310
 --   model.layers.5.mlp                                 5.2513 bpw - exp. error: 0.00287902
 --   model.layers.6.self_attn                           4.4013 bpw - exp. error: 0.00337663
 --   model.layers.6.mlp                                 6.8707 bpw - exp. error: 0.00146585
 --   model.layers.7.self_attn                           6.6359 bpw - exp. error: 0.00094822
 --   model.layers.7.mlp                                 6.8707 bpw - exp. error: 0.00184917
 --   model.layers.8.self_attn                           6.6359 bpw - exp. error: 0.00114748
 --   model.layers.8.mlp                                 6.0729 bpw - exp. error: 0.00230076
 --   model.layers.9.self_attn                           6.6359 bpw - exp. error: 0.00127157
 --   model.layers.9.mlp                                 5.3341 bpw - exp. error: 0.00378097
 --   model.layers.10.self_attn                          6.6359 bpw - exp. error: 0.00155776
 --   model.layers.10.mlp                                6.3082 bpw - exp. error: 0.00244060
 --   model.layers.11.self_attn                          8.0418 bpw - exp. error: 0.00068859
 --   model.layers.11.mlp                                6.0729 bpw - exp. error: 0.00267253
 --   model.layers.12.self_attn                          6.6359 bpw - exp. error: 0.00177117
 --   model.layers.12.mlp                                6.8707 bpw - exp. error: 0.00214834
 --   model.layers.13.self_attn                          5.4640 bpw - exp. error: 0.00361148
 --   model.layers.13.mlp                                6.8707 bpw - exp. error: 0.00213348
 --   model.layers.14.self_attn                          6.0418 bpw - exp. error: 0.00148709
 --   model.layers.14.mlp                                6.0729 bpw - exp. error: 0.00155184
 --   model.layers.15.self_attn                          8.0418 bpw - exp. error: 0.00039677
 --   model.layers.15.mlp                                6.8707 bpw - exp. error: 0.00120598
 --   model.layers.16.self_attn                          6.6359 bpw - exp. error: 0.00103175
 --   model.layers.16.mlp                                6.3082 bpw - exp. error: 0.00161467
 --   model.layers.17.self_attn                          8.0418 bpw - exp. error: 0.00047822
 --   model.layers.17.mlp                                6.0729 bpw - exp. error: 0.00194863
 --   model.layers.18.self_attn                          6.0418 bpw - exp. error: 0.00202788
 --   model.layers.18.mlp                                5.2513 bpw - exp. error: 0.00404148
 --   model.layers.19.self_attn                          6.0418 bpw - exp. error: 0.00191705
 --   model.layers.19.mlp                                5.3341 bpw - exp. error: 0.00383573
 --   model.layers.20.self_attn                          6.6359 bpw - exp. error: 0.00128817
 --   model.layers.20.mlp                                5.3341 bpw - exp. error: 0.00428636
 --   model.layers.21.self_attn                          6.0418 bpw - exp. error: 0.00207416
 --   model.layers.21.mlp                                5.3341 bpw - exp. error: 0.00474077
 --   model.layers.22.self_attn                          6.0418 bpw - exp. error: 0.00207343
 --   model.layers.22.mlp                                6.3082 bpw - exp. error: 0.00300660
 --   model.layers.23.self_attn                          8.0418 bpw - exp. error: 0.00056060
 --   model.layers.23.mlp                                5.3341 bpw - exp. error: 0.00540571
 --   model.layers.24.self_attn                          6.6359 bpw - exp. error: 0.00141783
 --   model.layers.24.mlp                                6.0729 bpw - exp. error: 0.00354173
 --   model.layers.25.self_attn                          5.4640 bpw - exp. error: 0.00263537
 --   model.layers.25.mlp                                6.3082 bpw - exp. error: 0.00349990
 --   model.layers.26.self_attn                          6.6359 bpw - exp. error: 0.00133379
 --   model.layers.26.mlp                                8.0374 bpw - exp. error: 0.00102325
 --   model.layers.27.self_attn                          5.4640 bpw - exp. error: 0.00248246
 --   model.layers.27.mlp                                6.3082 bpw - exp. error: 0.00371280
 --   model.layers.28.self_attn                          6.0418 bpw - exp. error: 0.00244441
 --   model.layers.28.mlp                                8.0374 bpw - exp. error: 0.00109955
 --   model.layers.29.self_attn                          5.4640 bpw - exp. error: 0.00300564
 --   model.layers.29.mlp                                8.0374 bpw - exp. error: 0.00177070
 --   model.layers.30.self_attn                          6.6359 bpw - exp. error: 0.00173835
 --   model.layers.30.mlp                                8.0374 bpw - exp. error: 0.00135131
 --   model.layers.31.self_attn                          8.0418 bpw - exp. error: 0.00071250
 --   model.layers.31.mlp                                8.0374 bpw - exp. error: 0.00148813
 -- sum(log(err)): -402.140137
 -- max(err): 0.005406
 -- Tokenizing samples...
 -- Token embeddings again...
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.0.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.002210
 -- Layer: model.layers.0 (MLP)
 -- Linear: model.layers.0.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001247
 -- Layer: model.layers.1 (Attention)
 -- Linear: model.layers.1.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001910
 -- Layer: model.layers.1 (MLP)
 -- Linear: model.layers.1.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002304
 -- Layer: model.layers.2 (Attention)
 -- Linear: model.layers.2.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001826
 -- Layer: model.layers.2 (MLP)
 -- Linear: model.layers.2.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003184
 -- Layer: model.layers.3 (Attention)
 -- Linear: model.layers.3.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.v_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.24 bpw
 -- Linear: model.layers.3.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.004051
 -- Layer: model.layers.3 (MLP)
 -- Linear: model.layers.3.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002333
 -- Layer: model.layers.4 (Attention)
 -- Linear: model.layers.4.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.4.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001081
 -- Layer: model.layers.4 (MLP)
 -- Linear: model.layers.4.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001737
 -- Layer: model.layers.5 (Attention)
 -- Linear: model.layers.5.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.5.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.002412
 -- Layer: model.layers.5 (MLP)
 -- Linear: model.layers.5.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.5.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.5.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.002792
 -- Layer: model.layers.6 (Attention)
 -- Linear: model.layers.6.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.6.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.003026
 -- Layer: model.layers.6 (MLP)
 -- Linear: model.layers.6.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001388
 -- Layer: model.layers.7 (Attention)
 -- Linear: model.layers.7.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.7.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000886
 -- Layer: model.layers.7 (MLP)
 -- Linear: model.layers.7.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001762
 -- Layer: model.layers.8 (Attention)
 -- Linear: model.layers.8.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.8.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001070
 -- Layer: model.layers.8 (MLP)
 -- Linear: model.layers.8.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002282
 -- Layer: model.layers.9 (Attention)
 -- Linear: model.layers.9.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.9.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001224
 -- Layer: model.layers.9 (MLP)
 -- Linear: model.layers.9.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.9.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.9.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003722
 -- Layer: model.layers.10 (Attention)
 -- Linear: model.layers.10.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.10.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.10 (MLP)
 -- Linear: model.layers.10.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.002382
 -- Layer: model.layers.11 (Attention)
 -- Linear: model.layers.11.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000652
 -- Layer: model.layers.11 (MLP)
 -- Linear: model.layers.11.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002618
 -- Layer: model.layers.12 (Attention)
 -- Linear: model.layers.12.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.12.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001683
 -- Layer: model.layers.12 (MLP)
 -- Linear: model.layers.12.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002034
 -- Layer: model.layers.13 (Attention)
 -- Linear: model.layers.13.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.13.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003492
 -- Layer: model.layers.13 (MLP)
 -- Linear: model.layers.13.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001966
 -- Layer: model.layers.14 (Attention)
 -- Linear: model.layers.14.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001318
 -- Layer: model.layers.14 (MLP)
 -- Linear: model.layers.14.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.15 (Attention)
 -- Linear: model.layers.15.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000365
 -- Layer: model.layers.15 (MLP)
 -- Linear: model.layers.15.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001103
 -- Layer: model.layers.16 (Attention)
 -- Linear: model.layers.16.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.16.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000938
 -- Layer: model.layers.16 (MLP)
 -- Linear: model.layers.16.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001508
 -- Layer: model.layers.17 (Attention)
 -- Linear: model.layers.17.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000418
 -- Saving checkpoint...
 -- Layer: model.layers.17 (MLP)
 -- Linear: model.layers.17.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001869
 -- Layer: model.layers.18 (Attention)
 -- Linear: model.layers.18.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001830
 -- Layer: model.layers.18 (MLP)
 -- Linear: model.layers.18.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.18.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.18.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.003908
 -- Layer: model.layers.19 (Attention)
 -- Linear: model.layers.19.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001757
 -- Layer: model.layers.19 (MLP)
 -- Linear: model.layers.19.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.19.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.19.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003729
 -- Layer: model.layers.20 (Attention)
 -- Linear: model.layers.20.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.20.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001186
 -- Layer: model.layers.20 (MLP)
 -- Linear: model.layers.20.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.20.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.20.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004217
 -- Layer: model.layers.21 (Attention)
 -- Linear: model.layers.21.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001915
 -- Layer: model.layers.21 (MLP)
 -- Linear: model.layers.21.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.21.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.21.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004769
 -- Layer: model.layers.22 (Attention)
 -- Linear: model.layers.22.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002010
 -- Layer: model.layers.22 (MLP)
 -- Linear: model.layers.22.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003114
 -- Layer: model.layers.23 (Attention)
 -- Linear: model.layers.23.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000544
 -- Layer: model.layers.23 (MLP)
 -- Linear: model.layers.23.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.23.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.23.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.005750
 -- Layer: model.layers.24 (Attention)
 -- Linear: model.layers.24.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.24.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001395
 -- Layer: model.layers.24 (MLP)
 -- Linear: model.layers.24.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003878
 -- Layer: model.layers.25 (Attention)
 -- Linear: model.layers.25.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.25.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002646
 -- Layer: model.layers.25 (MLP)
 -- Linear: model.layers.25.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003885
 -- Layer: model.layers.26 (Attention)
 -- Linear: model.layers.26.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.26.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001354
 -- Layer: model.layers.26 (MLP)
 -- Linear: model.layers.26.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001154
 -- Layer: model.layers.27 (Attention)
 -- Linear: model.layers.27.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.27.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002578
 -- Layer: model.layers.27 (MLP)
 -- Linear: model.layers.27.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.004201
 -- Layer: model.layers.28 (Attention)
 -- Linear: model.layers.28.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002510
 -- Layer: model.layers.28 (MLP)
 -- Linear: model.layers.28.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001251
 -- Layer: model.layers.29 (Attention)
 -- Linear: model.layers.29.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.29.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003163
 -- Layer: model.layers.29 (MLP)
 -- Linear: model.layers.29.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002406
 -- Layer: model.layers.30 (Attention)
 -- Linear: model.layers.30.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.30.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001843
 -- Layer: model.layers.30 (MLP)
 -- Linear: model.layers.30.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001549
 -- Layer: model.layers.31 (Attention)
 -- Linear: model.layers.31.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000743
 -- Layer: model.layers.31 (MLP)
 -- Linear: model.layers.31.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001628
 -- Layer: model.norm (RMSNorm)
 -- Module quantized, rfn_error: 0.000000
 -- Layer: lm_head (Linear)
 -- Linear: lm_head -> 0.15:8b_128g/0.85:6b_128g s4, 6.37 bpw
 -- Module quantized, calibration perplexity (quant): 9.5581
 -- Saving checkpoint...
 -- Compiling output file...
 -- Writing shard 1...
 -- Creating directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/output.safetensors (3,068 MB)
 -- Copying non-tensor files to output directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   .gitattributes
 --   added_tokens.json
 --   CODE_OF_CONDUCT.md
 --   config.json
 --   configuration_phi3.py
 --   generation_config.json
 --   LICENSE
 --   model.safetensors.index.json
 --   modeling_phi3.py
 --   NOTICE.md
 --   README.md
 --   sample_finetune.py
 --   SECURITY.md
 --   special_tokens_map.json
 --   tokenizer.json
 --   tokenizer.model
 --   tokenizer_config.json
 -- Finished
@turboderp
Member

So I compared the two versions, and the only changes I can see are:

  • they renamed the "su" scaling method to "longrope"
  • they removed the YaRN implementation from modeling_phi3.py

If you wouldn't mind, could you try just changing the name back to "su" in the config? If that works, I can just add an alias and it shouldn't need any other changes.
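
Something along these lines (a hypothetical one-off helper; adjust the path to your local snapshot) should be enough to test it:

```python
# Hypothetical helper to test the workaround: rewrite rope_scaling type
# "longrope" back to "su" in the local copy of config.json before converting.
import json

config_path = "Phi-3-mini-128k-instruct/config.json"  # adjust to your local path

with open(config_path) as f:
    config = json.load(f)

rope_scaling = config.get("rope_scaling") or {}
if rope_scaling.get("type") == "longrope":
    rope_scaling["type"] = "su"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print("rope_scaling type changed to 'su'")
else:
    print("nothing to change")
```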

@SystemPanic
Contributor Author

Thanks @turboderp, it works now.

I have submitted a new PR with the change.
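
For reference, the alias turboderp describes could look something like the sketch below at config-parsing time (hypothetical names; this is not the actual code from the PR):

```python
# Minimal sketch of the alias idea (hypothetical, not the actual patch):
# treat the renamed "longrope" scaling type exactly like the old "su" type.
def resolve_scale_type(rope_scaling):
    """Return the rope scaling type, treating "longrope" as an alias for "su"."""
    scale_type = (rope_scaling or {}).get("type")
    return "su" if scale_type == "longrope" else scale_type
```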
