Description
Setup:
- WSL 2 with Ubuntu-22.04
- llama-cpp-python built with cuBLAS
- Running the streaming example with any number of `n_gpu_layers` offloaded.
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```
```
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
CUDA error 2 at /tmp/pip-install-dcpg9e3d/llama-cpp-python_0b20594a2c9f4aa6a0b4bdca1b250223/vendor/llama.cpp/ggml-cuda.cu:781: out of memory
```
I also checked and tried the suggestions in https://github.com/ggerganov/llama.cpp/issues/1230, with no luck.
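For reference, the failing run reduces to an invocation like the sketch below (the script name is a placeholder, not taken from the report). Since the failure is in pinned host-memory allocation, one thing worth checking on WSL 2 is the `GGML_CUDA_NO_PINNED` environment variable, which upstream llama.cpp's CUDA backend consults to fall back to ordinary pageable host buffers; whether the vendored copy inside this pip build already honors it is an assumption.

```shell
# streaming_example.py is a placeholder for the llama-cpp-python streaming example.
# GGML_CUDA_NO_PINNED is read by upstream llama.cpp's ggml-cuda backend; when set,
# host buffers are allocated as plain pageable memory instead of via cudaMallocHost,
# sidestepping the "failed to allocate ... pinned memory" path on WSL 2.
# Whether the pip-vendored llama.cpp in this build supports the variable is an assumption.
GGML_CUDA_NO_PINNED=1 python streaming_example.py
```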