
llama_tokenize: too many tokens #92


Closed
ylchin opened this issue Apr 18, 2023 · 2 comments
Labels
model (Model specific issue), quality (Quality of model output)

Comments

@ylchin

ylchin commented Apr 18, 2023

I am trying to run an Alpaca model in a framework with a relatively large context window, and the following message keeps popping up:
llama_tokenize: too many tokens
How can I bypass this, and what is the maximum number of tokens in this case?

@gjmulder
Contributor

I believe this may be a limit in llama.cpp. There was a discussion about having to increase some memory buffers to support larger context sizes.
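
For reference, the message itself is printed by llama.cpp when the token buffer handed to llama_tokenize is smaller than the number of tokens the text produces; at the time, llama-cpp-python sized that buffer to the context window, so the effective maximum was n_ctx. A minimal sketch of how the limit shows up (model path and n_ctx value are illustrative):

```python
from llama_cpp import Llama

# Illustrative values; the effective limit is the n_ctx the context was created with.
n_ctx = 2048
llm = Llama(model_path="./models/ggml-alpaca-7b-q4_0.bin", n_ctx=n_ctx)

prompt = open("prompt.txt", "rb").read()

# On affected versions this call printed "llama_tokenize: too many tokens"
# and failed once the prompt produced more than n_ctx tokens.
tokens = llm.tokenize(prompt)
print(f"prompt tokens: {len(tokens)} / context window: {n_ctx}")
```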

gjmulder added the model (Model specific issue) and quality (Quality of model output) labels on May 12, 2023
@abetlen
Owner

abetlen commented May 12, 2023

@gjmulder @ylchin Fixed: you can now tokenize prompts longer than the context length. While this doesn't help with processing longer contexts, it should help with chunking conversations and documents for embeddings.
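
For anyone landing here for the chunking use case: with the fix in, tokenize() accepts text longer than the context length, so a long document can be tokenized once and split into context-sized pieces before embedding. A rough sketch assuming the standard llama-cpp-python Llama API (model path, n_ctx, and chunk size are illustrative):

```python
from llama_cpp import Llama

# Illustrative values; embedding=True is needed to call create_embedding().
llm = Llama(model_path="./models/ggml-alpaca-7b-q4_0.bin", n_ctx=512, embedding=True)

text = open("long_document.txt", "rb").read()

# With the fix, tokenize() no longer fails on text longer than n_ctx,
# so the whole document can be tokenized up front...
tokens = llm.tokenize(text)

# ...and then split into chunks that each fit comfortably inside the context window.
chunk_size = 256
chunks = [tokens[i : i + chunk_size] for i in range(0, len(tokens), chunk_size)]

embeddings = []
for chunk in chunks:
    piece = llm.detokenize(chunk).decode("utf-8", errors="ignore")
    embeddings.append(llm.create_embedding(piece)["data"][0]["embedding"])

print(f"{len(chunks)} chunks embedded")
```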

carmonajca added a commit to carmonajca/llama-cpp-python that referenced this issue May 17, 2023
* Bugfix: Ensure logs are printed when streaming

* Update llama.cpp

* Update llama.cpp

* Add missing tfs_z parameter

* Bump version

* Fix docker command

* Revert "llama_cpp server: prompt is a string". Closes abetlen#187

This reverts commit b9098b0.

* Only support generating one prompt at a time.

* Allow model to tokenize strings longer than context length and set add_bos. Closes abetlen#92

* Update llama.cpp

* Bump version

* Update llama.cpp

* Fix obscure Windows DLL issue. Closes abetlen#208

* chore: add note for Mac m1 installation

* Add winmode arg only on windows if python version supports it

* Bump mkdocs-material from 9.1.11 to 9.1.12

Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.1.11 to 9.1.12.
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.1.11...9.1.12)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update README.md

Fix typo.

* Fix CMakeLists.txt

* Add sampling defaults for generate

* Update llama.cpp

* Add model_alias option to override model_path in completions. Closes abetlen#39

* Update variable name

* Update llama.cpp

* Fix top_k value. Closes abetlen#220

* Fix last_n_tokens_size

* Implement penalize_nl

* Format

* Update token checks

* Move docs link up

* Fixed CUBLAS DLL load issue in Windows

* Check for CUDA_PATH before adding

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Andrei Betlen <[email protected]>
Co-authored-by: Anchen <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xiyou Zhou <[email protected]>
Co-authored-by: Aneesh Joy <[email protected]>
xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue Jun 13, 2023
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Labels: model (Model specific issue), quality (Quality of model output)
Projects: None yet
Development: No branches or pull requests
3 participants