Working with long stories #307

Open
leszekhanusz opened this issue Jun 1, 2023 · 5 comments
Labels
bug (Something isn't working) · duplicate (This issue or pull request already exists) · oobabooga (https://github.com/oobabooga/text-generation-webui)

Comments

@leszekhanusz

leszekhanusz commented Jun 1, 2023

I'm trying to make long stories using a llama.cpp model (guanaco-33B.ggmlv3.q4_0.bin in my case) with oobabooga/text-generation-webui.

It works for short inputs, but it stops working once the number of input tokens approaches the context size (2048).

After playing with the webui a bit (you can count input tokens and modify max_new_tokens on the main page), I found that the behavior is as follows:

- If nb_input_tokens + max_new_tokens < context_size, then it works correctly.
- If nb_input_tokens < context_size but nb_input_tokens + max_new_tokens > context_size, then it fails silently, generating 0 tokens:

Output generated in 0.25 seconds (0.00 tokens/s, 0 tokens, ...

- If nb_input_tokens > context_size, then it fails with:

llama_tokenize: too many tokens
llama_tokenize: too many tokens
llama_tokenize: too many tokens
Output generated in 0.28 seconds (0.00 tokens/s, 0 tokens, ...
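
For reference, a minimal sketch of the check I mean, using the llama-cpp-python `Llama` API (the model path, prompt, and token budget below are just placeholders, not an exact repro):

```python
# Minimal sketch of the three cases above (placeholder values).
from llama_cpp import Llama

CONTEXT_SIZE = 2048
llm = Llama(model_path="guanaco-33B.ggmlv3.q4_0.bin", n_ctx=CONTEXT_SIZE)

prompt = "Once upon a time, " * 400  # long enough to approach the context size
max_new_tokens = 512

nb_input_tokens = len(llm.tokenize(prompt.encode("utf-8")))

if nb_input_tokens + max_new_tokens < CONTEXT_SIZE:
    print("fits: generation works correctly")
elif nb_input_tokens < CONTEXT_SIZE:
    print("prompt fits but prompt + max_new_tokens does not: fails silently, 0 tokens")
else:
    print("prompt alone exceeds the context: 'llama_tokenize: too many tokens'")
```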

I've seen llama-cpp-python issue #92, but it is closed and I'm on a recent version of llama-cpp-python (release 0.1.57).

llama-cpp-python should probably discard some input tokens at the beginning so that the prompt fits inside the context window and we can continue long stories.
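
In the meantime, something along these lines could work as a caller-side workaround; it only relies on `Llama.tokenize()` / `Llama.detokenize()`, everything else is illustrative and not what the library itself does:

```python
# Caller-side workaround sketch: drop the oldest tokens so that
# nb_input_tokens + max_new_tokens stays below the context size.
def truncate_prompt(llm, prompt: str, context_size: int, max_new_tokens: int) -> str:
    tokens = llm.tokenize(prompt.encode("utf-8"))
    budget = context_size - max_new_tokens
    if len(tokens) > budget:
        tokens = tokens[-budget:]  # discard tokens at the beginning of the story
    return llm.detokenize(tokens).decode("utf-8", errors="ignore")

# Usage (llm and CONTEXT_SIZE as in the sketch above):
# safe_prompt = truncate_prompt(llm, long_story, CONTEXT_SIZE, max_new_tokens=256)
# output = llm(safe_prompt, max_tokens=256)
```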

@gjmulder gjmulder added the quality (Quality of model output) label Jun 2, 2023
@agronholm

Just to add, this is not a problem with llama.cpp itself; I can have very long conversations with llama.cpp in interactive mode. Also, I ran into this in a situation where the context size wasn't anywhere near 2048; it simply refused to generate more tokens.

@gjmulder gjmulder added the duplicate (This issue or pull request already exists) label Jun 8, 2023
@gjmulder
Contributor

gjmulder commented Jun 8, 2023

So it seems other people are reporting the issue via Ooba in #331. I attempted to reproduce directly in llama-cpp-python, but couldn't.

@gjmulder gjmulder added the oobabooga (https://github.com/oobabooga/text-generation-webui) and bug (Something isn't working) labels and removed the quality (Quality of model output) label Jun 9, 2023
@dillfrescott

Having the same issue

@agronholm

> Having the same issue

Describe exactly how this happened to you.

@dillfrescott

I'm using a Matrix bot that's hooked up to the oobabooga text-generation-webui via llama-cpp-python. It seems to start throwing the error after only a few messages.
