Describe the bug
The portable nightly build of Llama.cpp fails to initialise when the context size is set above 22528. This is with -sm layer across 3 Arc GPUs, and there is more than enough VRAM available.
How to reproduce
Steps to reproduce the error:
- Download a copy of Qwen3-30B-A3B-Q4_K_L.gguf.
- Download the latest nightly build of Llama.cpp from the releases section.
- Start the server with:
  ```
  ONEAPI_DEVICE_SELECTOR=level_zero:0,1,2 ZES_ENABLE_SYSMAN=1 SYCL_CACHE_PERSISTENT=1 SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 ./llama-server -c 22528 -ngl 999 -m /home/llm/models/Qwen_Qwen3-30B-A3B-Q4_K_L.gguf --host 0.0.0.0 --port 8001 -sm layer --jinja
  ```
- If the context is set above 22528 (23552 in the log below), the engine crashes with the following error:
```
llama_kv_cache_init: kv_size = 23552, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 48, can_shift = 1
llama_kv_cache_init: SYCL0 KV buffer size = 782.00 MiB
llama_kv_cache_init: SYCL1 KV buffer size = 736.00 MiB
llama_kv_cache_init: SYCL2 KV buffer size = 690.00 MiB
llama_init_from_model: KV self size = 2208.00 MiB, K (f16): 1104.00 MiB, V (f16): 1104.00 MiB
llama_init_from_model: SYCL_Host output buffer size = 0.58 MiB
llama_init_from_model: pipeline parallelism enabled (n_copies=4)
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 4334944256 Bytes of memory on device
ggml_gallocr_reserve_n: failed to allocate SYCL2 buffer of size 4334944256
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 6065967104 Bytes of memory on device
ggml_gallocr_reserve_n: failed to allocate SYCL0 buffer of size 6065967104
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 5626382848 Bytes of memory on device
ggml_gallocr_reserve_n: failed to allocate SYCL1 buffer of size 5626382848
llama_init_from_model: failed to allocate compute buffers
common_init_from_params: failed to create context with model '/home/llm/models/Qwen_Qwen3-30B-A3B-Q4_K_L.gguf'
terminate called without an active exception
./llama-server: line 2: 142366 Aborted (core dumped) LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(cd "$(dirname "$0")";pwd) $(cd "$(dirname "$0")";pwd)/llama-server-bin "$@"
```
By comparison, with the context at or below 22528, the following KV cache log is generated:
```
llama_kv_cache_init: kv_size = 22528, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 48, can_shift = 1
llama_kv_cache_init: SYCL0 KV buffer size = 748.00 MiB
llama_kv_cache_init: SYCL1 KV buffer size = 704.00 MiB
llama_kv_cache_init: SYCL2 KV buffer size = 660.00 MiB
llama_init_from_model: KV self size = 2112.00 MiB, K (f16): 1056.00 MiB, V (f16): 1056.00 MiB
llama_init_from_model: SYCL_Host output buffer size = 0.58 MiB
llama_init_from_model: pipeline parallelism enabled (n_copies=4)
llama_init_from_model: SYCL0 compute buffer size = 2016.06 MiB
llama_init_from_model: SYCL1 compute buffer size = 2016.06 MiB
llama_init_from_model: SYCL2 compute buffer size = 4070.12 MiB
llama_init_from_model: SYCL_Host compute buffer size = 1440.19 MiB
llama_init_from_model: graph nodes = 3270 (with bs=4096), 2646 (with bs=1)
llama_init_from_model: graph splits = 4
```
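As a sanity check (not from the original report): the logged KV sizes are internally consistent if each token uses 2048 bytes of f16 K+V per layer, a figure derived from the logs themselves rather than the model card, and the failed allocation byte counts convert to roughly 4.1-5.8 GiB:

```
# Reproduce the logged "KV self size" values.
# Assumes 2048 bytes of f16 K+V per token per layer (derived from the logs above).
for kv in 22528 23552; do
  bytes=$(( kv * 48 * 2048 ))
  echo "kv_size=$kv -> KV self size = $(( bytes / 1048576 )) MiB"
done
# kv_size=22528 -> KV self size = 2112 MiB   (matches the working run)
# kv_size=23552 -> KV self size = 2208 MiB   (matches the failing run)

# Convert the failed compute-buffer allocations to MiB for comparison.
for b in 6065967104 5626382848 4334944256; do
  echo "$b bytes = $(( b / 1048576 )) MiB"
done
# 6065967104 bytes = 5784 MiB   (SYCL0)
# 5626382848 bytes = 5365 MiB   (SYCL1)
# 4334944256 bytes = 4134 MiB   (SYCL2)
```

So the KV cache itself grows by only 96 MiB between the two runs; it is the compute buffer allocation that fails.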
As the numbers above show, the increase in compute buffer size between a 22528 context and 24576 should be well within the available VRAM, yet the engine still fails to initialise.
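If it helps triage, here is a rough, untested bisection sketch to find the exact context cutoff, reusing the environment and paths from the reproduction command above. The timeout value, the 256-step granularity, and the grep pattern on the error text are my assumptions:

```
# Bisect -c between the last known-good (22528) and first known-bad (23552) values.
MODEL=/home/llm/models/Qwen_Qwen3-30B-A3B-Q4_K_L.gguf
lo=22528; hi=23552
while [ $((hi - lo)) -gt 256 ]; do
  mid=$(( (lo + hi) / 2 / 256 * 256 ))   # keep -c a multiple of 256
  if ONEAPI_DEVICE_SELECTOR=level_zero:0,1,2 ZES_ENABLE_SYSMAN=1 \
     SYCL_CACHE_PERSISTENT=1 SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 \
     timeout 120 ./llama-server -c "$mid" -ngl 999 -m "$MODEL" -sm layer --jinja \
       --host 127.0.0.1 --port 8001 2>&1 | grep -q "failed to allocate"; then
    hi=$mid   # allocation failed: cutoff is at or below mid
  else
    lo=$mid   # started (and ran until the timeout): cutoff is above mid
  fi
done
echo "largest working context is around $lo"
```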