Description
Setup:
- WSL 2 with Ubuntu-22.04
- llama-cpp-python built with cuBLAS
- Running the streaming example with any number of `n_gpu_layers` offloaded.
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```
```
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
CUDA error 2 at /tmp/pip-install-dcpg9e3d/llama-cpp-python_0b20594a2c9f4aa6a0b4bdca1b250223/vendor/llama.cpp/ggml-cuda.cu:781: out of memory
```
I also checked and tried the suggestions in https://github.com/ggerganov/llama.cpp/issues/1230, with no luck.
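For reference, the failing run reduces to an invocation like the sketch below (the script name is a placeholder, not taken from the report). Since the failure is in pinned host-memory allocation, one thing worth checking on WSL 2 is the `GGML_CUDA_NO_PINNED` environment variable, which upstream llama.cpp's CUDA backend consults to fall back to ordinary pageable host buffers; whether the vendored copy inside this pip build already honors it is an assumption.

```shell
# streaming_example.py is a placeholder for the llama-cpp-python streaming example.
# GGML_CUDA_NO_PINNED is read by upstream llama.cpp's ggml-cuda backend; when set,
# host buffers are allocated as plain pageable memory instead of via cudaMallocHost,
# sidestepping the "failed to allocate ... pinned memory" path on WSL 2.
# Whether the pip-vendored llama.cpp in this build supports the variable is an assumption.
GGML_CUDA_NO_PINNED=1 python streaming_example.py
```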