ggerganov
Member

fix #3869

@askmyteapot

Can confirm the fix works for Pascal SM6.1

@ggerganov added the "performance" (Speed related topics) and "Nvidia GPU" (Issues specific to Nvidia GPUs) labels Nov 1, 2023
@cebtenzzre
Collaborator

cebtenzzre commented Nov 2, 2023

I can confirm that this brings pp512 on my Tesla P40 back to pre-#3749 speeds.

Now the regressions from both #3749 and #3776 can be worked around on older cards by building with -DLLAMA_CUDA_FORCE_MMQ=ON.
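For anyone who wants to try the workaround mentioned above, here is a minimal sketch of a CMake configure and build with that option. Only -DLLAMA_CUDA_FORCE_MMQ=ON comes from this thread. The -DLLAMA_CUBLAS=ON switch, the `build` directory name, and the Release config are assumptions about a typical CUDA build of this era, so check them against the README of your checkout. The snippet only prints the commands; run them from the root of the llama.cpp checkout.

```shell
# Sketch only: assemble the configure/build commands for a CUDA build
# that forces the MMQ kernels on older cards.
# Assumption: LLAMA_CUBLAS=ON enables the CUDA backend in this version.
CMAKE_FLAGS="-DLLAMA_CUBLAS=ON -DLLAMA_CUDA_FORCE_MMQ=ON"

# Assumption: "build" is an arbitrary out-of-tree build directory name.
CONFIGURE_CMD="cmake -B build $CMAKE_FLAGS"
BUILD_CMD="cmake --build build --config Release"

# Print the commands so they can be reviewed before running them.
echo "$CONFIGURE_CMD"
echo "$BUILD_CMD"
```

Keeping the flags in one variable makes it easy to drop LLAMA_CUDA_FORCE_MMQ again when comparing pp512 numbers with and without the option.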

@ggerganov merged commit 4d719a6 into master Nov 2, 2023
@ggerganov deleted the try-fix-3869 branch November 2, 2023 06:35
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
Development

Successfully merging this pull request may close these issues.

CTX Processing regression for Pascal - Commit 2b4ea35
4 participants