ggml_new_tensor_impl: not enough space in the context's memory pool #29
Heya! Friend showed this to me and I'm trying to get it to work myself on Windows 10. I've applied the changes as seen in #22 to get it to build (more specifically, I pulled in the new commits from etra0's fork), but the actual executable fails to run, printing this before segfaulting:

I'm trying to use 7B on an i9-13900K (and I have about 30 gigs of memory free right now), and I've verified my hashes with a friend. Any ideas? Thanks!
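For context on the error in the title: ggml carves every tensor out of one fixed-size arena handed over at ggml_init time, so "not enough space in the context's memory pool" means that arena was sized too small for the graph being built. A minimal sketch of that allocation model, assuming the ggml.h shipped in this tree (the 512 MiB figure is an arbitrary example, not the project's default):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml reserves the whole pool up front; nothing grows on demand.
    struct ggml_init_params params = {
        .mem_size   = 512u*1024*1024, // example size; raise it if the pool runs out
        .mem_buffer = NULL,           // NULL lets ggml allocate the arena itself
    };

    struct ggml_context * ctx = ggml_init(params);
    if (!ctx) {
        fprintf(stderr, "ggml_init failed\n");
        return 1;
    }

    // Every ggml_new_tensor_* call bumps an offset inside that arena;
    // once a request would pass mem_size, ggml prints the error from
    // this issue's title instead of returning a usable tensor.
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    (void) t;

    ggml_free(ctx);
    return 0;
}
```

So the various "same issue" reports below generally come down to a prompt or model configuration whose tensors need more bytes than whatever pool size the build reserved.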
Tried out #31 - it, uh, got farther:
ok, I made an oopsie in that PR - initializing it that way apparently didn't zero out the rest of the fields. I updated the branch, please test it again now!
It started to expand the prompt, but with seemingly garbage data:
Should be good on latest master - reopen if the issue persists.
Hey, I was trying to run this on a RHEL 8 server with 32 CPU cores, and I am getting the same error on my second query. I am using GPT4All-J v1.3-groovy.
Hi @ggerganov @gjmulder, I would appreciate some direction on this, please.
Getting the same issue on an Apple M1 Pro with 16 GB RAM when trying the example from: using a relatively large PDF with ~200 pages.
Stack trace: gpt_tokenize: unknown token '?'
Same issue when running on Win11 with 64 GB RAM (25 GB utilized): ggml_new_tensor_impl: not enough space in the scratch memory pool (needed 450887680, available 446693376)
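For what it's worth, the gap in that message is exactly 4 MiB (450887680 − 446693376 = 4194304 bytes), so the fixed scratch size is only marginally too small for this workload. The message itself has the shape of a simple bump-allocator check; here is a toy illustration of that pattern (all names hypothetical, not the actual ggml source):

```c
#include <stdio.h>
#include <stdint.h>

// Toy bump allocator mirroring the style of ggml's pool check.
struct pool {
    uint8_t * base; // start of the fixed arena
    size_t    size; // total bytes reserved up front ("available")
    size_t    offs; // bytes already handed out
};

static void * pool_alloc(struct pool * p, size_t nbytes) {
    size_t needed = p->offs + nbytes; // cumulative end offset of this request
    if (needed > p->size) {
        // same shape as the error quoted in the comment above
        fprintf(stderr, "not enough space in the pool (needed %zu, available %zu)\n",
                needed, p->size);
        return NULL;
    }
    void * ptr = p->base + p->offs;
    p->offs = needed;
    return ptr;
}

int main(void) {
    static uint8_t arena[64];
    struct pool p = { arena, sizeof(arena), 0 };
    pool_alloc(&p, 48); // fits
    pool_alloc(&p, 48); // needed 96 > available 64: prints the error
    return 0;
}
```

Which is why the usual fixes are either shrinking the request (a shorter prompt or context) or enlarging the reserved pool.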
Oh hey, exact same error:
Same issue here - tried a combination of settings but I just keep getting the memory error, even though both RAM and GPU RAM are under 50% utilization. I had to follow the guide here to build llama-cpp with GPU support as it wasn't working previously, but even before that it was giving the same error (side note: GPU support natively does work in oobabooga on Windows!?). Anyone have any ideas?
HW:
Using embedded DuckDB with persistence: data will be stored in: db
What would you like to know about the policies? test
ggml_new_object: not enough space in the context's memory pool (needed 10882896, available 10650320)
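One way to avoid outrunning the pool, if you are sizing a ggml context yourself, is to budget the per-tensor metadata plus the payload bytes explicitly. A sketch, assuming a reasonably recent ggml.h that exposes ggml_tensor_overhead() and ggml_type_size(); sized_ctx is a hypothetical helper, not part of the library:

```c
#include "ggml.h"

// Hypothetical helper: reserve enough pool for n_tensors f32 tensors
// of n_elems elements each, so ggml_new_tensor_* never runs dry.
static struct ggml_context * sized_ctx(size_t n_tensors, int64_t n_elems) {
    size_t mem = n_tensors * ggml_tensor_overhead()                            // per-tensor metadata
               + n_tensors * (size_t) n_elems * ggml_type_size(GGML_TYPE_F32); // tensor payloads

    struct ggml_init_params params = {
        .mem_size   = mem,
        .mem_buffer = NULL, // let ggml allocate the arena
    };
    return ggml_init(params);
}
```

End users of the Python bindings can't easily pass such a size through, though, which is why the practical advice in threads like this tends to be shorter prompts, a smaller context, or a build with bigger hard-coded buffers.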
Same here... any solutions already???
Solved this by going back to llama-cpp-python version 0.1.74 (e.g. `pip install llama-cpp-python==0.1.74`).
Well, this has nothing to do with Python.
> Same here... any solutions already???

@dereklll This issue was closed 6 months ago; I'd suggest creating a new one.
Same issue on a RunPod GPU machine; tried 2 different GPUs.