Working with long stories #307
I'm trying to make long stories using a llama.cpp model (`guanaco-33B.ggmlv3.q4_0.bin` in my case) with `oobabooga/text-generation-webui`. It works for short inputs, but it stops working once the number of input tokens comes close to the context size (2048).

By playing with the webui (you can count input tokens and modify `max_new_tokens` on the main page), I found that the behavior is as follows:

- If `nb_input_tokens + max_new_tokens < context_size`, it works correctly.
- If `nb_input_tokens < context_size` but `nb_input_tokens + max_new_tokens > context_size`, it fails silently, generating 0 tokens (reproduced in the sketch below).
- If `nb_input_tokens > context_size`, it fails with an error.
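For anyone who wants to reproduce this outside the webui, here is a minimal sketch against the llama-cpp-python high-level API. The model path and the repeated prompt are placeholders, and the exact failure mode may differ between versions:

```python
from llama_cpp import Llama

# Placeholder path; substitute your own GGML model file.
llm = Llama(model_path="./guanaco-33B.ggmlv3.q4_0.bin", n_ctx=2048)

# Build a prompt long enough to approach the 2048-token context.
prompt = "Once upon a time, in a land far away. " * 180
n_input = len(llm.tokenize(prompt.encode("utf-8")))
print(f"input tokens: {n_input} / context size: 2048")

# Case 2 above: n_input < 2048, but n_input + max_tokens > 2048.
# Observed: a silent empty completion instead of generated text.
out = llm(prompt, max_tokens=512)
print(repr(out["choices"][0]["text"]))
```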
I've seen issue #92 of llama-cpp-python, but it is closed and I'm on a recent version of `llama-cpp-python` (release 0.1.57).

`llama-cpp-python` should probably discard some input tokens at the beginning so that the prompt fits inside the context and long stories can continue; a rough sketch of that idea follows.
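Until the library handles this itself, the truncation can be done on the caller's side. The sketch below is my own workaround, not library behavior: it keeps the most recent tokens of the story, and the `detokenize` round-trip may lose a little formatting at the cut point:

```python
from llama_cpp import Llama

N_CTX = 2048  # must match the n_ctx passed to Llama below

llm = Llama(model_path="./guanaco-33B.ggmlv3.q4_0.bin", n_ctx=N_CTX)

def complete_long_story(prompt: str, max_tokens: int = 256) -> str:
    """Drop tokens from the front of the prompt so that
    nb_input_tokens + max_new_tokens always fits in the context."""
    tokens = llm.tokenize(prompt.encode("utf-8"))
    budget = N_CTX - max_tokens
    if len(tokens) > budget:
        # Keep the tail of the story; the beginning is discarded.
        tokens = tokens[-budget:]
    text = llm.detokenize(tokens).decode("utf-8", errors="ignore")
    out = llm(text, max_tokens=max_tokens)
    return out["choices"][0]["text"]
```

This mirrors what llama.cpp's own interactive mode appears to do (context swapping, with `--keep` controlling how much of the original prompt survives), which is presumably why long conversations work there, as the first comment below notes.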
Comments

Just to add, this is not a problem with llama.cpp itself. I can do very long conversations with llama.cpp in interactive mode. Also, I ran into this in a situation where the context size wasn't anywhere near 2048; it just plainly refused to generate more tokens.

So it seems other people are reporting the issue via Ooba in #331. I attempted to reproduce directly in

Having the same issue.

Describe exactly how this happened to you.

Using a Matrix bot that's hooked up to the oobabooga text-generation-webui using llama-cpp-python. It seems to start throwing the error after only a few messages.