ggerganov
Member

fix #15015 (comment)

  • Fix the strides for batched GEMM to account for the case ne02 == 1
  • Fix the src1 contiguity condition: src1 is always contiguous after we convert it
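
A minimal sketch of the two ideas above, for illustration only; it is not the code from this PR. `TensorView`, `matrix_stride`, and `contiguous_strides` are hypothetical names. The `ne`/`nb` fields follow the ggml convention of element counts and byte strides per dimension. Falling back to dimension 3 when `ne[2] == 1` is an assumed reading of the first bullet, and recomputing strides from the shape is an assumed reading of the second.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-D tensor view:
// ne = number of elements per dimension, nb = stride in bytes per dimension.
struct TensorView {
    std::array<int64_t, 4> ne;
    std::array<size_t, 4>  nb;
};

// Stride, in elements, between consecutive matrices of a batch.
// Assumption: when dimension 2 holds a single matrix, nb[2] is never used to
// step through the data and may hold an unrelated value (for example after a
// permute), so the stride is taken from dimension 3 instead.
int64_t matrix_stride(const TensorView & t) {
    const size_t nb_batch = (t.ne[2] == 1) ? t.nb[3] : t.nb[2];
    return static_cast<int64_t>(nb_batch / t.nb[0]);
}

// Byte strides of a densely packed buffer with the given shape and element size.
// A buffer that was just written by a type conversion is packed like this, so
// its strides can be derived from the shape rather than copied from the source.
std::array<size_t, 4> contiguous_strides(const std::array<int64_t, 4> & ne, size_t type_size) {
    std::array<size_t, 4> nb{};
    nb[0] = type_size;
    for (int i = 1; i < 4; ++i) {
        nb[i] = nb[i - 1] * static_cast<size_t>(ne[i - 1]);
    }
    return nb;
}
```

With a view such as `ne = {64, 32, 1, 8}` and `nb = {4, 256, 4, 8192}`, reading `nb[2]` directly would give a matrix stride of 1 element, while the fallback gives 2048, which is the size of one 64 x 32 matrix.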

@github-actions github-actions bot added the Nvidia GPU (issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning), and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Aug 2, 2025
@JohannesGaessler
Collaborator

Good catch with src1 being potentially contiguous after a type conversion.

@ggerganov
Member Author

The SYCL tests still fail; I think the GGML_SYCL_DNNL path of this function also needs to be updated. @qnixsynapse I will leave this to your team and merge this for now.

Waiting for the CUDA CI to pass and will merge.

@ggerganov ggerganov merged commit 15e92fd into master Aug 2, 2025
45 of 47 checks passed
@ggerganov ggerganov deleted the gg/cuda-sycl-mm-batched-fix branch August 2, 2025 14:13
@qnixsynapse
Collaborator

@Rbiessy @Alcpz Since you guys were maintaining MUL_MAT kernels, tagging you both for visibility.

In my testing, the dpct path in the batched kernel also doesn't seem to properly support non_cont inputs, so I'm not doing anything at this time.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 7, 2025