CUDA graphs break quantized K cache #7492

Closed
@JohannesGaessler

On master it is already possible to quantize the K cache, e.g. via -ctk q8_0. However, this is currently broken for batch size 1. Disabling CUDA graphs via the environment variable GGML_CUDA_DISABLE_GRAPHS=1 works around the issue.

cc: @agray3
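
For anyone trying to reproduce this, the comparison described above could be scripted roughly as follows. This is only a sketch: the binary path ./main and the model path are placeholders, and the helper build_repro is a hypothetical name, not something from the repository. Only the -ctk q8_0 option and the GGML_CUDA_DISABLE_GRAPHS=1 variable come from the report itself; any further options (GPU offload, prompt, token count) are left out and would need to be added for a real run.

```python
import os


def build_repro(binary, model, disable_cuda_graphs):
    """Return (argv, env) for a run that uses a q8_0-quantized K cache.

    `binary` and `model` are placeholder paths supplied by the caller.
    """
    # -ctk q8_0 selects the quantized K cache, as in the report.
    argv = [binary, "-m", model, "-ctk", "q8_0"]

    # Start from the current environment so CUDA-related variables are kept.
    env = dict(os.environ)
    if disable_cuda_graphs:
        # Workaround from the report: disable CUDA graphs.
        env["GGML_CUDA_DISABLE_GRAPHS"] = "1"
    else:
        # Make sure a previously exported value does not hide the bug.
        env.pop("GGML_CUDA_DISABLE_GRAPHS", None)
    return argv, env


if __name__ == "__main__":
    # Placeholder paths; replace with a real build and model file.
    for disable in (False, True):
        argv, env = build_repro("./main", "model.gguf", disable)
        flag = env.get("GGML_CUDA_DISABLE_GRAPHS", "<unset>")
        print(f"GGML_CUDA_DISABLE_GRAPHS={flag}: {' '.join(argv)}")
        # To actually launch: subprocess.run(argv, env=env, check=True)
```

Running both variants with the same prompt and seed and comparing the generated text should show whether the output only goes wrong when CUDA graphs are enabled.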

Labels: Nvidia GPU (issues specific to Nvidia GPUs), bug (something isn't working)
