-
Notifications
You must be signed in to change notification settings - Fork 13.2k
zdnn: refactor codebase + add docs #16178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
slaren
approved these changes
Sep 22, 2025
I'll cancel them and re-run again to see if its an ephemeral problem. |
Ignore the AMX failure - it's some issue in the amx backend. The Mac runner is up now - it will take some time to catch up with all the queued workflows, but should be good after that. I will take a look at the vulkan issues. But none of these are problem for this PR. |
gabe-l-hart
added a commit
to gabe-l-hart/llama.cpp
that referenced
this pull request
Sep 23, 2025
* origin/master: (39 commits) ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200) ci : enable Vulkan workflow on Mac (ggml-org#16194) ggml-cpu: Respect cpumask settings (ggml-org#16164) ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928) zdnn: refactor codebase + add docs (ggml-org#16178) codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190) devops: add s390x containers (ggml-org#15915) ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189) feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177) clang-tidy : disable warning about performance enum size (ggml-org#16127) ggml : implement set_rows with i32 index (ggml-org#16159) codeowners : update + cleanup (ggml-org#16174) common : enable `--offline` mode without curl support (ggml-org#16137) webui : fix handling incomplete chunks (ggml-org#16107) embedding : fix typos in README (ggml-org#16171) common : remove unused local variables (ggml-org#16140) ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123) ggml : add ggml_op_is_empty (ggml-org#16122) codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128) Vulkan: add conv_transpose_2d operation (ggml-org#16022) ...
struct
pushed a commit
to struct/llama.cpp
that referenced
this pull request
Sep 26, 2025
* zdnn: initial matmul refactor Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: rm static from funcs Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: update ggml-zdnn.h Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: change header files to hpp Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: switch to common.hpp Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: move mulmat forward around Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: rm inline from utils Signed-off-by: Aaron Teo <[email protected]> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <[email protected]> * docs: add zDNN docs Signed-off-by: Aaron Teo <[email protected]> --------- Signed-off-by: Aaron Teo <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
IBM zDNN
issues specific to IBM zDNN Accelerator
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR cleans up the zDNN codebase by refactoring the operations into individual files for better readability and easier collaboration. It also includes the zDNN backend documentation and lists zDNN as an available backend in
README.md
.This PR should have no performance changes as it is merely a refactor. However, I've still ran tests just in-case.
Performance
Note
Tests were conducted on an IBM z17 Mainframe with 40 IFLs (cores) and 128 GB Memory on a shared R&D LPAR.
test-backend-ops