
Conversion to EXL2 of Phi-3 Mini 128k July update produces gibberish output #537


Closed
SystemPanic opened this issue Jul 5, 2024 · 2 comments · Fixed by #540

Comments

@SystemPanic
Contributor

Seems to be caused by the following changes in the July update (see the config sketch after this list):

  • RoPE scaling type name changed to longrope
  • Scaling factor lists changed
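
The relevant part of the updated config.json looks roughly like the sketch below (a hedged illustration; the factor lists are long and their exact values are elided):

```python
# Rough sketch of the rope_scaling entries before and after the July update.
# Factor lists are elided; per the report above they also changed.
april_rope_scaling = {
    "type": "su",
    "short_factor": [...],  # per-dimension scaling factors (elided)
    "long_factor": [...],
}
july_rope_scaling = {
    "type": "longrope",     # renamed scaling method
    "short_factor": [...],  # updated factor lists (elided)
    "long_factor": [...],
}
```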

Useful references:

ggml-org/llama.cpp#8262
ggml-org/llama.cpp#6849 (comment)

Conversion log:

------------------------------------------------
| Measured: model.layers.31 (Attention)        |
| Duration: 7.80 seconds                       |
| Completed step: 63/67                        |
| Avg time / step (rolling): 9.28 seconds      |
| Estimated remaining time: 0min 37sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.layers.31 (MLP)
 -- model.layers.31.mlp.gate_proj                      0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:3b_64g/0.9:2b_64g s4                           2.17 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_128g/0.9:3b_128g s4                         3.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:4b_32g/0.9:3b_32g s4                           3.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_128g s4                                       4.04 bpw
 -- model.layers.31.mlp.gate_proj                      1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_128g/0.9:4b_128g s4                         4.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:5b_32g/0.9:4b_32g s4                           4.23 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_128g/0.9:5b_128g s4                         5.16 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:6b_32g/0.9:5b_32g s4                           5.23 bpw
 -- model.layers.31.mlp.gate_proj                      1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.gate_proj                      0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.gate_proj                      1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.up_proj                        0.05:3b_64g/0.95:2b_64g s4                         2.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:3b_64g/0.75:2b_64g s4                         2.32 bpw
 -- model.layers.31.mlp.up_proj                        0.3:3b_64g/0.7:2b_64g s4                           2.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_128g/0.75:3b_128g s4                       3.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:4b_32g/0.75:3b_32g s4                         3.38 bpw
 -- model.layers.31.mlp.up_proj                        1:4b_32g s4                                        4.13 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_128g/0.75:4b_128g s4                       4.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:5b_32g/0.75:4b_32g s4                         4.38 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_128g/0.75:5b_128g s4                       5.29 bpw
 -- model.layers.31.mlp.up_proj                        0.25:6b_32g/0.75:5b_32g s4                         5.38 bpw
 -- model.layers.31.mlp.up_proj                        1:6b_128g s4                                       6.04 bpw
 -- model.layers.31.mlp.up_proj                        0.1:8b_128g/0.9:6b_128g s4                         6.29 bpw
 -- model.layers.31.mlp.up_proj                        1:8b_128g s4                                       8.04 bpw
 -- model.layers.31.mlp.down_proj                      0.05:6b_32g/0.2:3b_64g/0.75:2b_64g s4              2.48 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:3b_32g s4                         3.24 bpw
 -- model.layers.31.mlp.down_proj                      0.05:5b_32g/0.95:4b_32g s4                         4.19 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_128g/0.85:3b_128g s4            3.41 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:4b_32g/0.85:3b_32g s4              3.49 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_128g s4                        4.25 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:4b_32g s4                         4.34 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_128g/0.85:4b_128g s4            4.36 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:5b_32g/0.85:4b_32g s4              4.44 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4            5.31 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4              5.39 bpw
 -- model.layers.31.mlp.down_proj                      0.05:8b_32g/0.95:6b_128g s4                        6.15 bpw
 -- model.layers.31.mlp.down_proj                      0.15:8b_128g/0.85:6b_128g s4                       6.35 bpw
 -- model.layers.31.mlp.down_proj                      1:8b_128g s4                                       8.04 bpw
 -- 2.2469 bpw  accuracy: 0.93468168
 -- 2.3233 bpw  accuracy: 0.93676452
 -- 2.5957 bpw  accuracy: 0.94465024
 -- 2.9121 bpw  accuracy: 0.94718373
 -- 3.2851 bpw  accuracy: 0.96705803
 -- 3.3679 bpw  accuracy: 0.96966901
 -- 3.6207 bpw  accuracy: 0.97334990
 -- 4.1380 bpw  accuracy: 0.98255626
 -- 4.1991 bpw  accuracy: 0.98405144
 -- 4.2682 bpw  accuracy: 0.98309226
 -- 4.3510 bpw  accuracy: 0.98517615
 -- 5.2513 bpw  accuracy: 0.99132111
 -- 5.3341 bpw  accuracy: 0.99250382
 -- 6.0729 bpw  accuracy: 0.99510243
 -- 6.3082 bpw  accuracy: 0.99555561
 -- 6.8707 bpw  accuracy: 0.99634729
 -- 8.0374 bpw  accuracy: 0.99851187
------------------------------------------------
| Measured: model.layers.31 (MLP)              |
| Duration: 10.76 seconds                      |
| Completed step: 64/67                        |
| Avg time / step (rolling): 9.29 seconds      |
| Estimated remaining time: 0min 27sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: model.norm (RMSNorm)
------------------------------------------------
| Measured: model.norm (RMSNorm)               |
| Duration: 0.26 seconds                       |
| Completed step: 65/67                        |
| Avg time / step (rolling): 8.52 seconds      |
| Estimated remaining time: 0min 17sec         |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Layer: lm_head (Linear)
------------------------------------------------
| Measured: lm_head (Linear)                   |
| Duration: 0.34 seconds                       |
| Completed step: 66/67                        |
| Avg time / step (rolling): 7.51 seconds      |
| Estimated remaining time: 0min 7sec          |
| Last checkpoint layer: model.layers.29 (MLP) |
------------------------------------------------
 -- Saving checkpoint...
 -- Optimizing...
 -- Optimizing:    1/ 240
 -- Optimizing:    9/ 240
 -- Optimizing:   17/ 240
 -- Optimizing:   25/ 240
 -- Optimizing:   33/ 240
 -- Optimizing:   41/ 240
 -- Optimizing:   49/ 240
 -- Optimizing:   57/ 240
 -- Optimizing:   65/ 240
 -- Optimizing:   73/ 240
 -- Optimizing:   80/ 240
 -- Optimizing:   88/ 240
 -- Optimizing:   96/ 240
 -- Optimizing:  104/ 240
 -- Optimizing:  112/ 240
 -- Optimizing:  120/ 240
 -- Optimizing:  128/ 240
 -- Optimizing:  136/ 240
 -- Optimizing:  144/ 240
 -- Optimizing:  152/ 240
 -- Optimizing:  160/ 240
 -- Optimizing:  168/ 240
 -- Optimizing:  176/ 240
 -- Optimizing:  184/ 240
 -- Optimizing:  192/ 240
 -- Optimizing:  200/ 240
 -- Optimizing:  208/ 240
 -- Optimizing:  216/ 240
 -- Optimizing:  224/ 240
 -- Optimizing:  232/ 240
 -- Optimizing:  240/ 240
 -- max(err): 0.005406
 -- error_norm: 1.485759
 -- Quantization strategy:
 --   model.layers.0.self_attn                           6.6359 bpw - exp. error: 0.00218182
 --   model.layers.0.mlp                                 8.0374 bpw - exp. error: 0.00114895
 --   model.layers.1.self_attn                           8.0418 bpw - exp. error: 0.00184583
 --   model.layers.1.mlp                                 8.0374 bpw - exp. error: 0.00199654
 --   model.layers.2.self_attn                           8.0418 bpw - exp. error: 0.00177566
 --   model.layers.2.mlp                                 6.0729 bpw - exp. error: 0.00249584
 --   model.layers.3.self_attn                           4.1930 bpw - exp. error: 0.00383048
 --   model.layers.3.mlp                                 6.0729 bpw - exp. error: 0.00203851
 --   model.layers.4.self_attn                           6.6359 bpw - exp. error: 0.00102152
 --   model.layers.4.mlp                                 6.3082 bpw - exp. error: 0.00182404
 --   model.layers.5.self_attn                           4.4013 bpw - exp. error: 0.00264310
 --   model.layers.5.mlp                                 5.2513 bpw - exp. error: 0.00287902
 --   model.layers.6.self_attn                           4.4013 bpw - exp. error: 0.00337663
 --   model.layers.6.mlp                                 6.8707 bpw - exp. error: 0.00146585
 --   model.layers.7.self_attn                           6.6359 bpw - exp. error: 0.00094822
 --   model.layers.7.mlp                                 6.8707 bpw - exp. error: 0.00184917
 --   model.layers.8.self_attn                           6.6359 bpw - exp. error: 0.00114748
 --   model.layers.8.mlp                                 6.0729 bpw - exp. error: 0.00230076
 --   model.layers.9.self_attn                           6.6359 bpw - exp. error: 0.00127157
 --   model.layers.9.mlp                                 5.3341 bpw - exp. error: 0.00378097
 --   model.layers.10.self_attn                          6.6359 bpw - exp. error: 0.00155776
 --   model.layers.10.mlp                                6.3082 bpw - exp. error: 0.00244060
 --   model.layers.11.self_attn                          8.0418 bpw - exp. error: 0.00068859
 --   model.layers.11.mlp                                6.0729 bpw - exp. error: 0.00267253
 --   model.layers.12.self_attn                          6.6359 bpw - exp. error: 0.00177117
 --   model.layers.12.mlp                                6.8707 bpw - exp. error: 0.00214834
 --   model.layers.13.self_attn                          5.4640 bpw - exp. error: 0.00361148
 --   model.layers.13.mlp                                6.8707 bpw - exp. error: 0.00213348
 --   model.layers.14.self_attn                          6.0418 bpw - exp. error: 0.00148709
 --   model.layers.14.mlp                                6.0729 bpw - exp. error: 0.00155184
 --   model.layers.15.self_attn                          8.0418 bpw - exp. error: 0.00039677
 --   model.layers.15.mlp                                6.8707 bpw - exp. error: 0.00120598
 --   model.layers.16.self_attn                          6.6359 bpw - exp. error: 0.00103175
 --   model.layers.16.mlp                                6.3082 bpw - exp. error: 0.00161467
 --   model.layers.17.self_attn                          8.0418 bpw - exp. error: 0.00047822
 --   model.layers.17.mlp                                6.0729 bpw - exp. error: 0.00194863
 --   model.layers.18.self_attn                          6.0418 bpw - exp. error: 0.00202788
 --   model.layers.18.mlp                                5.2513 bpw - exp. error: 0.00404148
 --   model.layers.19.self_attn                          6.0418 bpw - exp. error: 0.00191705
 --   model.layers.19.mlp                                5.3341 bpw - exp. error: 0.00383573
 --   model.layers.20.self_attn                          6.6359 bpw - exp. error: 0.00128817
 --   model.layers.20.mlp                                5.3341 bpw - exp. error: 0.00428636
 --   model.layers.21.self_attn                          6.0418 bpw - exp. error: 0.00207416
 --   model.layers.21.mlp                                5.3341 bpw - exp. error: 0.00474077
 --   model.layers.22.self_attn                          6.0418 bpw - exp. error: 0.00207343
 --   model.layers.22.mlp                                6.3082 bpw - exp. error: 0.00300660
 --   model.layers.23.self_attn                          8.0418 bpw - exp. error: 0.00056060
 --   model.layers.23.mlp                                5.3341 bpw - exp. error: 0.00540571
 --   model.layers.24.self_attn                          6.6359 bpw - exp. error: 0.00141783
 --   model.layers.24.mlp                                6.0729 bpw - exp. error: 0.00354173
 --   model.layers.25.self_attn                          5.4640 bpw - exp. error: 0.00263537
 --   model.layers.25.mlp                                6.3082 bpw - exp. error: 0.00349990
 --   model.layers.26.self_attn                          6.6359 bpw - exp. error: 0.00133379
 --   model.layers.26.mlp                                8.0374 bpw - exp. error: 0.00102325
 --   model.layers.27.self_attn                          5.4640 bpw - exp. error: 0.00248246
 --   model.layers.27.mlp                                6.3082 bpw - exp. error: 0.00371280
 --   model.layers.28.self_attn                          6.0418 bpw - exp. error: 0.00244441
 --   model.layers.28.mlp                                8.0374 bpw - exp. error: 0.00109955
 --   model.layers.29.self_attn                          5.4640 bpw - exp. error: 0.00300564
 --   model.layers.29.mlp                                8.0374 bpw - exp. error: 0.00177070
 --   model.layers.30.self_attn                          6.6359 bpw - exp. error: 0.00173835
 --   model.layers.30.mlp                                8.0374 bpw - exp. error: 0.00135131
 --   model.layers.31.self_attn                          8.0418 bpw - exp. error: 0.00071250
 --   model.layers.31.mlp                                8.0374 bpw - exp. error: 0.00148813
 -- sum(log(err)): -402.140137
 -- max(err): 0.005406
 -- Tokenizing samples...
 -- Token embeddings again...
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.0.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.0.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.002210
 -- Layer: model.layers.0 (MLP)
 -- Linear: model.layers.0.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.0.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001247
 -- Layer: model.layers.1 (Attention)
 -- Linear: model.layers.1.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001910
 -- Layer: model.layers.1 (MLP)
 -- Linear: model.layers.1.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.1.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002304
 -- Layer: model.layers.2 (Attention)
 -- Linear: model.layers.2.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.2.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001826
 -- Layer: model.layers.2 (MLP)
 -- Linear: model.layers.2.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.2.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003184
 -- Layer: model.layers.3 (Attention)
 -- Linear: model.layers.3.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.3.self_attn.v_proj -> 0.1:5b_32g/0.9:4b_32g s4, 4.24 bpw
 -- Linear: model.layers.3.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.004051
 -- Layer: model.layers.3 (MLP)
 -- Linear: model.layers.3.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.3.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002333
 -- Layer: model.layers.4 (Attention)
 -- Linear: model.layers.4.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.4.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.4.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001081
 -- Layer: model.layers.4 (MLP)
 -- Linear: model.layers.4.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.4.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001737
 -- Layer: model.layers.5 (Attention)
 -- Linear: model.layers.5.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.5.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.5.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.002412
 -- Layer: model.layers.5 (MLP)
 -- Linear: model.layers.5.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.5.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.5.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.002792
 -- Layer: model.layers.6 (Attention)
 -- Linear: model.layers.6.self_attn.q_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.k_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Linear: model.layers.6.self_attn.v_proj -> 1:5b_64g s4, 5.07 bpw
 -- Linear: model.layers.6.self_attn.o_proj -> 0.1:5b_64g/0.9:4b_64g s4, 4.18 bpw
 -- Module quantized, rfn_error: 0.003026
 -- Layer: model.layers.6 (MLP)
 -- Linear: model.layers.6.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.6.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001388
 -- Layer: model.layers.7 (Attention)
 -- Linear: model.layers.7.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.7.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.7.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000886
 -- Layer: model.layers.7 (MLP)
 -- Linear: model.layers.7.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.7.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001762
 -- Layer: model.layers.8 (Attention)
 -- Linear: model.layers.8.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.8.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.8.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001070
 -- Layer: model.layers.8 (MLP)
 -- Linear: model.layers.8.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.8.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002282
 -- Layer: model.layers.9 (Attention)
 -- Linear: model.layers.9.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.9.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.9.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001224
 -- Layer: model.layers.9 (MLP)
 -- Linear: model.layers.9.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.9.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.9.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003722
 -- Layer: model.layers.10 (Attention)
 -- Linear: model.layers.10.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.10.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.10.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.10 (MLP)
 -- Linear: model.layers.10.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.10.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.002382
 -- Layer: model.layers.11 (Attention)
 -- Linear: model.layers.11.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.11.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000652
 -- Layer: model.layers.11 (MLP)
 -- Linear: model.layers.11.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.11.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.002618
 -- Layer: model.layers.12 (Attention)
 -- Linear: model.layers.12.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.12.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.12.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001683
 -- Layer: model.layers.12 (MLP)
 -- Linear: model.layers.12.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.12.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002034
 -- Layer: model.layers.13 (Attention)
 -- Linear: model.layers.13.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.13.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.13.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003492
 -- Layer: model.layers.13 (MLP)
 -- Linear: model.layers.13.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.13.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001966
 -- Layer: model.layers.14 (Attention)
 -- Linear: model.layers.14.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001318
 -- Layer: model.layers.14 (MLP)
 -- Linear: model.layers.14.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.14.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001441
 -- Layer: model.layers.15 (Attention)
 -- Linear: model.layers.15.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.15.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000365
 -- Layer: model.layers.15 (MLP)
 -- Linear: model.layers.15.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.15.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001103
 -- Layer: model.layers.16 (Attention)
 -- Linear: model.layers.16.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.16.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.16.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.000938
 -- Layer: model.layers.16 (MLP)
 -- Linear: model.layers.16.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.16.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.001508
 -- Layer: model.layers.17 (Attention)
 -- Linear: model.layers.17.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.17.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000418
 -- Saving checkpoint...
 -- Layer: model.layers.17 (MLP)
 -- Linear: model.layers.17.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.17.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.001869
 -- Layer: model.layers.18 (Attention)
 -- Linear: model.layers.18.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.18.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001830
 -- Layer: model.layers.18 (MLP)
 -- Linear: model.layers.18.mlp.gate_proj -> 0.1:6b_128g/0.9:5b_128g s4, 5.16 bpw
 -- Linear: model.layers.18.mlp.up_proj -> 0.25:6b_128g/0.75:5b_128g s4, 5.29 bpw
 -- Linear: model.layers.18.mlp.down_proj -> 0.05:8b_32g/0.1:6b_128g/0.85:5b_128g s4, 5.31 bpw
 -- Module quantized, rfn_error: 0.003908
 -- Layer: model.layers.19 (Attention)
 -- Linear: model.layers.19.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.19.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001757
 -- Layer: model.layers.19 (MLP)
 -- Linear: model.layers.19.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.19.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.19.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.003729
 -- Layer: model.layers.20 (Attention)
 -- Linear: model.layers.20.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.20.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.20.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001186
 -- Layer: model.layers.20 (MLP)
 -- Linear: model.layers.20.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.20.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.20.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004217
 -- Layer: model.layers.21 (Attention)
 -- Linear: model.layers.21.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.21.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.001915
 -- Layer: model.layers.21 (MLP)
 -- Linear: model.layers.21.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.21.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.21.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.004769
 -- Layer: model.layers.22 (Attention)
 -- Linear: model.layers.22.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.22.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002010
 -- Layer: model.layers.22 (MLP)
 -- Linear: model.layers.22.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.22.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003114
 -- Layer: model.layers.23 (Attention)
 -- Linear: model.layers.23.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.23.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000544
 -- Layer: model.layers.23 (MLP)
 -- Linear: model.layers.23.mlp.gate_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.23 bpw
 -- Linear: model.layers.23.mlp.up_proj -> 0.25:6b_32g/0.75:5b_32g s4, 5.38 bpw
 -- Linear: model.layers.23.mlp.down_proj -> 0.05:8b_32g/0.1:6b_32g/0.85:5b_32g s4, 5.39 bpw
 -- Module quantized, rfn_error: 0.005750
 -- Layer: model.layers.24 (Attention)
 -- Linear: model.layers.24.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.24.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.24.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001395
 -- Layer: model.layers.24 (MLP)
 -- Linear: model.layers.24.mlp.gate_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.up_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.24.mlp.down_proj -> 0.05:8b_32g/0.95:6b_128g s4, 6.15 bpw
 -- Module quantized, rfn_error: 0.003878
 -- Layer: model.layers.25 (Attention)
 -- Linear: model.layers.25.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.25.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.25.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002646
 -- Layer: model.layers.25 (MLP)
 -- Linear: model.layers.25.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.25.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.003885
 -- Layer: model.layers.26 (Attention)
 -- Linear: model.layers.26.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.26.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.26.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001354
 -- Layer: model.layers.26 (MLP)
 -- Linear: model.layers.26.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.26.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001154
 -- Layer: model.layers.27 (Attention)
 -- Linear: model.layers.27.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.27.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.27.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.002578
 -- Layer: model.layers.27 (MLP)
 -- Linear: model.layers.27.mlp.gate_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.up_proj -> 0.1:8b_128g/0.9:6b_128g s4, 6.29 bpw
 -- Linear: model.layers.27.mlp.down_proj -> 0.15:8b_128g/0.85:6b_128g s4, 6.35 bpw
 -- Module quantized, rfn_error: 0.004201
 -- Layer: model.layers.28 (Attention)
 -- Linear: model.layers.28.self_attn.q_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.k_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.v_proj -> 1:6b_128g s4, 6.04 bpw
 -- Linear: model.layers.28.self_attn.o_proj -> 1:6b_128g s4, 6.04 bpw
 -- Module quantized, rfn_error: 0.002510
 -- Layer: model.layers.28 (MLP)
 -- Linear: model.layers.28.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.28.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001251
 -- Layer: model.layers.29 (Attention)
 -- Linear: model.layers.29.self_attn.q_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.k_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Linear: model.layers.29.self_attn.v_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.29.self_attn.o_proj -> 0.1:6b_32g/0.9:5b_32g s4, 5.24 bpw
 -- Module quantized, rfn_error: 0.003163
 -- Layer: model.layers.29 (MLP)
 -- Linear: model.layers.29.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.29.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.002406
 -- Layer: model.layers.30 (Attention)
 -- Linear: model.layers.30.self_attn.q_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.k_proj -> 1:6b_32g s4, 6.14 bpw
 -- Linear: model.layers.30.self_attn.v_proj -> 1:8b_32g s4, 8.14 bpw
 -- Linear: model.layers.30.self_attn.o_proj -> 1:6b_32g s4, 6.14 bpw
 -- Module quantized, rfn_error: 0.001843
 -- Layer: model.layers.30 (MLP)
 -- Linear: model.layers.30.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.30.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001549
 -- Layer: model.layers.31 (Attention)
 -- Linear: model.layers.31.self_attn.q_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.k_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.v_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.self_attn.o_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.000743
 -- Layer: model.layers.31 (MLP)
 -- Linear: model.layers.31.mlp.gate_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.up_proj -> 1:8b_128g s4, 8.04 bpw
 -- Linear: model.layers.31.mlp.down_proj -> 1:8b_128g s4, 8.04 bpw
 -- Module quantized, rfn_error: 0.001628
 -- Layer: model.norm (RMSNorm)
 -- Module quantized, rfn_error: 0.000000
 -- Layer: lm_head (Linear)
 -- Linear: lm_head -> 0.15:8b_128g/0.85:6b_128g s4, 6.37 bpw
 -- Module quantized, calibration perplexity (quant): 9.5581
 -- Saving checkpoint...
 -- Compiling output file...
 -- Writing shard 1...
 -- Creating directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/output.safetensors (3,068 MB)
 -- Copying non-tensor files to output directory models--microsoft--Phi-3-mini-128k-instruct-exl2/6.5bpw/
 --   .gitattributes
 --   added_tokens.json
 --   CODE_OF_CONDUCT.md
 --   config.json
 --   configuration_phi3.py
 --   generation_config.json
 --   LICENSE
 --   model.safetensors.index.json
 --   modeling_phi3.py
 --   NOTICE.md
 --   README.md
 --   sample_finetune.py
 --   SECURITY.md
 --   special_tokens_map.json
 --   tokenizer.json
 --   tokenizer.model
 --   tokenizer_config.json
 -- Finished
@turboderp
Member

So I compared the two versions, and the only changes I can see are:

  • they renamed the "su" scaling method to "longrope"
  • they removed the YaRN implementation from modeling_phi3.py

If you wouldn't mind, could you try just changing the name back to "su" in the config? If that works, I can just add an alias and it shouldn't need any other changes.
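
Something along these lines (a hypothetical one-off helper; adjust the path to your local snapshot) should be enough to test it:

```python
# Hypothetical helper to test the workaround: rewrite rope_scaling type
# "longrope" back to "su" in the local copy of config.json before converting.
import json

config_path = "Phi-3-mini-128k-instruct/config.json"  # adjust to your local path

with open(config_path) as f:
    config = json.load(f)

rope_scaling = config.get("rope_scaling") or {}
if rope_scaling.get("type") == "longrope":
    rope_scaling["type"] = "su"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print("rope_scaling type changed to 'su'")
else:
    print("nothing to change")
```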

@SystemPanic
Contributor Author

Thanks @turboderp, it works now.

I have submitted a new PR with the change.
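
For reference, the alias turboderp describes could look something like the sketch below at config-parsing time (hypothetical names; this is not the actual code from the PR):

```python
# Minimal sketch of the alias idea (hypothetical, not the actual patch):
# treat the renamed "longrope" scaling type exactly like the old "su" type.
def resolve_scale_type(rope_scaling):
    """Return the rope scaling type, treating "longrope" as an alias for "su"."""
    scale_type = (rope_scaling or {}).get("type")
    return "su" if scale_type == "longrope" else scale_type
```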
