Error - not enough space in the context's memory pool #2404
Comments
@omarelanis I cannot reproduce this with
@slaren firstly thank you so much for the quick response! I've followed your advice and tested with a build from the main repo, and running from the command line it does load everything correctly, including the GPU support:

`PS H:\AI_Projects\llamaCppCudaBuild\llama.cpp\build\bin\Release> .\main.exe -m H:\AI_Projects\Indexer_Plus_GPT\models\llama7b\llama-deus-7b-v3.ggmlv3.q4_0.bin -n -1 --color -r "User:" --in-prefix " " -e --prompt "User: Hi\nAI: Hello. I am an AI chatbot. Would you like to talk?\nUser: Sure!\nAI: What would you like to talk about?\nUser: how far is the sun?"`

system_info: n_threads = 10 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
User: Hi
llama_print_timings: load time = 680.95 ms

However, when llama-cpp-python is installed via pip, I get the error. For the most part the langchain code is just running this:

from langchain.llms import LlamaCpp

Could you point me in the right direction as to what is likely causing this error? Is it langchain related?
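For reference, here is a minimal sketch of roughly what that setup looks like. The parameter names come from the langchain LlamaCpp wrapper and llama-cpp-python; the values shown are illustrative, not my exact configuration:

```python
from langchain.llms import LlamaCpp

# Illustrative values only; the real script reads these from its own config.
llm = LlamaCpp(
    model_path="models/llama7b/llama-deus-7b-v3.ggmlv3.q4_0.bin",
    n_ctx=2048,        # context window, matches n_ctx in the load log below
    n_gpu_layers=10,   # layers offloaded to the GPU
    n_batch=512,       # prompt processing batch size
    verbose=True,      # echo llama.cpp load/system info
)

print(llm("User: how far is the sun?\nAI:"))
```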
You could look into what parameters are being passed to
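One way to compare the two runs is a minimal sketch like the following, assuming llama-cpp-python's behaviour of echoing the llama.cpp load and system_info lines when verbose is enabled:

```python
from llama_cpp import Llama

# Constructing Llama directly with verbose=True makes llama.cpp print the same
# llama_model_load_internal / system_info lines as main.exe, so the parameters
# used by the pip-installed build can be diffed against the command-line run.
llm = Llama(
    model_path="models/llama7b/llama-deus-7b-v3.ggmlv3.q4_0.bin",
    n_ctx=2048,
    n_gpu_layers=10,
    verbose=True,
)
```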
I think I've found the issue: the workaround in the link I provided before (abetlen/llama-cpp-python#182) uses the latest version 0.1.77 of llama_cpp_python, which is causing the issue. Reverting back to 0.1.68 fixes the issue but stops the BLAS CUDA support for the GPU. Thank you for your help so far, much appreciated.
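For anyone hitting the same thing, a quick sketch (standard library only) to confirm which llama-cpp-python build is actually being imported; "llama-cpp-python" is the distribution name on PyPI:

```python
from importlib import metadata

import llama_cpp

# Verify the installed wheel version, e.g. after downgrading from 0.1.77 to 0.1.68.
print(metadata.version("llama-cpp-python"))

# Path of the module actually loaded, useful when more than one install is present.
print(llama_cpp.__file__)
```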
Expected Behavior
Type in a question and an answer is retrieved from the LLM model
Current Behavior
Instantly receive the following error:
ggml_new_object: not enough space in the context's memory pool (needed 10882896, available 10650320)
Environment and Context
I tried a combination of settings but just keep getting the memory error, even though both system RAM and GPU VRAM are at less than 50% utilization.
I had to follow the guide below to build llama-cpp-python with GPU support as it wasn't working previously, but even before that it was giving the same error (side note: GPU support does work natively in oobabooga on Windows!?):
abetlen/llama-cpp-python#182
HW:
Windows 11
Intel i9-10900K OC @5.3GHz
64GB DDR4-2400 / PC4-19200
12GB Nvidia GeForce RTX 3060
Python 3.10.0
Using embedded DuckDB with persistence: data will be stored in: db
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6
llama.cpp: loading model from models/llama7b/llama-deus-7b-v3.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 2927.79 MB (+ 1024.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 384 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 10 repeating layers to GPU
llama_model_load_internal: offloaded 10/35 layers to GPU
llama_model_load_internal: total VRAM used: 1470 MB
llama_new_context_with_model: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
What would you like to know about the policies?
test
ggml_new_object: not enough space in the context's memory pool (needed 10882896, available 10650320)
Traceback (most recent call last):
File "H:\AI_Projects\Indexer_Plus_GPT\chat.py", line 84, in
main()
File "H:\AI_Projects\Indexer_Plus_GPT\chat.py", line 55, in main
res = qa(query)
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in call
raise e
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in call
self._call(inputs, run_manager=run_manager)
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\retrieval_qa\base.py", line 133, in _call
answer = self.combine_documents_chain.run(
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 441, in run
return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in call
raise e
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in call
self._call(inputs, run_manager=run_manager)
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\combine_documents\base.py", line 106, in _call
output, extra_return_dict = self.combine_docs(
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 165, in combine_docs
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 252, in predict
return self(kwargs, callbacks=callbacks)[self.output_key]
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 243, in call
raise e
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\base.py", line 237, in call
self._call(inputs, run_manager=run_manager)
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 92, in _call
response = self.generate([inputs], run_manager=run_manager)
File "C:\Program Files\Python310\lib\site-packages\langchain\chains\llm.py", line 102, in generate
return self.llm.generate_prompt(
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 188, in generate_prompt
return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 281, in generate
output = self._generate_helper(
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 225, in _generate_helper
raise e
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 212, in _generate_helper
self._generate(
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\base.py", line 604, in _generate
self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\llamacpp.py", line 229, in _call
for token in self.stream(prompt=prompt, stop=stop, run_manager=run_manager):
File "C:\Program Files\Python310\lib\site-packages\langchain\llms\llamacpp.py", line 279, in stream
for chunk in result:
File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 899, in _create_completion
for token in self.generate(
File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 721, in generate
self.eval(tokens)
File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama.py", line 461, in eval
return_code = llama_cpp.llama_eval(
File "C:\Program Files\Python310\lib\site-packages\llama_cpp\llama_cpp.py", line 678, in llama_eval
return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
OSError: exception: access violation reading 0x0000000000000000