Replies: 1 comment 1 reply
From memory, there's a note in the newer […]. I suspect that […]

EDIT: Just to confirm, you are aware that you have […] and […], so you are guaranteed different results with your two calls to […].
Edit:
After some investigation I've identified the problem.

When sampling, the `top_k` value is not being evaluated before being passed into the function:
https://github.com/abetlen/llama-cpp-python/blob/1a13d76c487df1c8560132d10bda62d6e2f4fa93/llama_cpp/llama.py#LL367C1-L367C1

The value is passed as-is and is not changed to `n_vocab` if `top_k=0`.

Why is that a problem? In the source code of `llama.cpp` we can see that when `k=0` and `min_keep=1`, the top-k sampler always falls back to keeping just a single candidate, so we only ever receive the candidate with the highest logit. This is not the expected behaviour, because a value of `k=0` is meant to mark that `top_k` sampling is disabled, according to the `llama.cpp` source code.
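A minimal sketch of the missing normalization (the helper name here is illustrative, not the actual llama-cpp-python code):

```python
def resolve_top_k(top_k: int, n_vocab: int) -> int:
    """Illustrative helper: treat top_k <= 0 as "top-k sampling disabled".

    Without this, a top_k of 0 reaches llama.cpp's top-k sampler unchanged,
    where it is clamped up to min_keep (1), so only the single highest-logit
    candidate survives and temperature/top_p no longer have any effect.
    """
    return n_vocab if top_k <= 0 else top_k


# Hypothetical call site (argument order follows llama.cpp's llama_sample_top_k):
# top_k = resolve_top_k(requested_top_k, n_vocab)
# llama_cpp.llama_sample_top_k(ctx, candidates, top_k, 1)
```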
Hello.
I've noticed a strange occurrence when trying to generate output: for a given context, the bindings API will always return the same output. Additionally, it seems that the `top_p` and `temp` values are being completely ignored. This is not the case when running `llama.cpp` itself.
I am using the latest version (v0.1.50) of llama-cpp-python. I've installed it with cuBLAS support over pip and also tried compiling it myself; both produce the same results.
My example script:
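A minimal sketch of this kind of script, assuming the high-level `Llama` completion API; the model path, prompt, and sampling values are illustrative, and `top_k=0` reflects the assumption that 0 is meant to disable top-k as it does in `llama.cpp`:

```python
from llama_cpp import Llama

# Illustrative model path; any local GGML model file would do.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")

prompt = "Q: Name the planets in the solar system. A: "

# With a non-zero temperature and top_p < 1.0, the two completions would
# normally be expected to differ, yet the bindings return identical text.
for _ in range(2):
    output = llm(
        prompt,
        max_tokens=64,
        temperature=0.8,
        top_p=0.95,
        top_k=0,  # assumption: 0 intended to mean "disable top-k", as in llama.cpp
    )
    print(output["choices"][0]["text"])
```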
Output example (always the same, regardless of `top_p` and `temp`):

Now, using `llama.cpp` I always get a different result:
Sorry if this is an incorrect place to post something like this; it's my first time posting.