
Out of GPU memory when running streaming example #229

Closed
@Celppu

Description


Setup:
- WSL 2, Ubuntu 22.04
- llama-cpp-python built with cuBLAS

Running the streaming example fails with any number of layers offloaded via `n_gpu_layers`.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
WARNING: failed to allocate 512.00 MB of pinned memory: out of memory
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
CUDA error 2 at /tmp/pip-install-dcpg9e3d/llama-cpp-python_0b20594a2c9f4aa6a0b4bdca1b250223/vendor/llama.cpp/ggml-cuda.cu:781: out of memory

I also checked and tried the suggestions from https://github.com/ggerganov/llama.cpp/issues/1230, with no luck.
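For anyone hitting the same warning: upstream llama.cpp has an escape hatch for pinned host memory, which is the allocation that fails here under WSL 2. A minimal sketch, assuming the vendored llama.cpp build honors the `GGML_CUDA_NO_PINNED` environment variable (when set, it falls back to ordinary pageable host buffers instead of `cudaMallocHost`):

```shell
# Possible workaround sketch: disable pinned host memory allocation.
# Assumption: the vendored llama.cpp checks GGML_CUDA_NO_PINNED at startup.
export GGML_CUDA_NO_PINNED=1
echo "GGML_CUDA_NO_PINNED=${GGML_CUDA_NO_PINNED}"
# Then rerun the streaming example in this same shell session.
```

Pageable transfers are slower than pinned ones, so this trades throughput for not crashing; it is worth trying only if the WSL 2 pinned-memory limit is the actual culprit.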

Metadata

Assignees: no one assigned
Labels: hardware (Hardware specific issue), model (Model specific issue)
Projects: none
Milestone: none
