persimmon : simplify rope logic #4302

Galunid · 2023-12-03T03:39:31Z

Make the persimmon rope logic similar to stablelm's, now that gpt-neox is supported for all layers.
I tried to allow for full offloading, but the tmpqkv tensors still end up on the CPU rather than the GPU.

Galunid · 2023-12-03T03:40:02Z

This is possible thanks to #4156

ggerganov · 2023-12-03T08:40:16Z

You can offload tmpqkv by adding an entry in the k_offload_map:

https://github.com/ggerganov/llama.cpp/blob/3cb1c348b3481f39d244b6b0cd5afb04aa6bd460/llama.cpp#L5262-L5271
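As a rough illustration of the suggestion above, here is a minimal, self-contained sketch of a name-to-offload lookup table. It is not the code at the linked lines: `OffloadKind`, `offload_map`, and `offload_kind_for` are hypothetical stand-ins, and the only detail taken from the discussion is that a tensor named `tmpqkv` gets its own entry. The fallback for unlisted names is also an assumption made for this sketch.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the offload kinds a real table might use.
enum class OffloadKind { Nop, Layer };

// Hypothetical stand-in for k_offload_map: tensor name -> offload kind.
static const std::unordered_map<std::string, OffloadKind> offload_map = {
    { "Qcur",   OffloadKind::Layer },
    { "Kcur",   OffloadKind::Layer },
    { "tmpqkv", OffloadKind::Layer }, // the kind of entry the comment suggests adding
};

// Look up how a tensor should be offloaded.
// In this sketch, names without an entry fall back to Nop (stay on the CPU).
OffloadKind offload_kind_for(const std::string & name) {
    const auto it = offload_map.find(name);
    return it == offload_map.end() ? OffloadKind::Nop : it->second;
}
```

With a table like this, a tensor that is missing from the map is simply never moved to the GPU, which would match the symptom described for tmpqkv.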

Galunid added 3 commits November 28, 2023 22:08

persimmon : use rope over whole Qcur/Kcur

3e28686

Merge branch 'master' into speedup-persimmon

5615953

Use ggml_reshape_3d

28a64da

Galunid marked this pull request as draft December 4, 2023 03:17

Galunid closed this May 20, 2024
