CUDA: fix strided GEMM for [0,2,1,3] per && ne2==1 #15037

JohannesGaessler
Collaborator

Fixes the failing tests added in #15015. The problem is that for ne02 == 1 the per-matrix strides can be calculated incorrectly.
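
As a rough illustration of why a size-1 dimension is a hazard for stride calculations, here is a minimal C++ sketch. It is not the code from this PR or from ggml: `Tensor4`, `make_contiguous`, `permute_0213`, and `offset_bytes` are hypothetical helpers, and the shapes are made up. It only assumes the usual convention that `ne` holds the number of elements per dimension and `nb` the stride in bytes per dimension.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical 4D tensor descriptor (an assumption for illustration only):
// ne[i] = number of elements in dimension i, nb[i] = stride in bytes of dimension i.
struct Tensor4 {
    std::array<int64_t, 4> ne;
    std::array<size_t, 4>  nb;
};

// Build densely packed strides: nb[i] = nb[i-1] * ne[i-1].
Tensor4 make_contiguous(std::array<int64_t, 4> ne, size_t elem_size) {
    Tensor4 t{ne, {}};
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) {
        t.nb[i] = t.nb[i - 1] * static_cast<size_t>(t.ne[i - 1]);
    }
    return t;
}

// Apply a [0,2,1,3] permutation: swap dimensions 1 and 2, data stays in place.
Tensor4 permute_0213(Tensor4 t) {
    std::swap(t.ne[1], t.ne[2]);
    std::swap(t.nb[1], t.nb[2]);
    return t;
}

// Byte offset of element (i0, i1, i2, i3). When ne[2] == 1 the only valid i2 is 0,
// so nb[2] never contributes to any address.
size_t offset_bytes(const Tensor4 & t, int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return static_cast<size_t>(i0) * t.nb[0] + static_cast<size_t>(i1) * t.nb[1]
         + static_cast<size_t>(i2) * t.nb[2] + static_cast<size_t>(i3) * t.nb[3];
}
```

For example, `permute_0213(make_contiguous({64, 1, 8, 1}, 4))` gives `ne = {64, 8, 1, 1}` and `nb = {4, 256, 256, 2048}`. The stride of the size-1 dimension 2 is 256 bytes, not the 2048 bytes that `ne[1] * nb[1]` would suggest, so any code that infers a per-matrix stride from `nb[2]` in this situation can end up with the wrong value even though every element address is still valid.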

github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on Aug 2, 2025
ggerganov
Member

I was just also opening a PR: #15038

Could you review that and, if it's OK, merge it instead? It also has a SYCL fix and a fix for the src1 contiguity check.
