
Conversation

taronaeo (Collaborator)

This PR cleans up the zDNN codebase by refactoring the operations into individual files for better readability and easier collaboration. It also adds the zDNN backend documentation and lists zDNN as an available backend in README.md.

This PR should have no performance impact since it is purely a refactor. However, I've still run tests just in case.
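For anyone who wants to reproduce the numbers below, here is a minimal build sketch, assuming an installed zDNN library and the `GGML_ZDNN` CMake option covered in the new backend docs:

```sh
# Sketch: build llama.cpp with the zDNN backend enabled
# (assumes libzdnn is installed and the GGML_ZDNN option from the new docs)
cmake -B build -DGGML_ZDNN=ON
cmake --build build --config Release -j
```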

Performance

| model | size | params | backend | threads | test | t/s |
| ----- | ---: | -----: | ------- | ------: | :--- | ---: |
| granite 3B all F32 | 9.44 GiB | 2.53 B | zDNN,BLAS | 8 | pp512 | 215.78 ± 0.33 |
| granite 3B all F32 | 9.44 GiB | 2.53 B | zDNN,BLAS | 8 | tg128 | 4.70 ± 0.02 |
| granite 3B F16 | 4.72 GiB | 2.53 B | zDNN,BLAS | 8 | pp512 | 217.33 ± 1.52 |
| granite 3B F16 | 4.72 GiB | 2.53 B | zDNN,BLAS | 8 | tg128 | 4.70 ± 0.05 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | zDNN,BLAS | 8 | pp512 | 216.35 ± 0.18 |
| granite 3B BF16 | 4.72 GiB | 2.53 B | zDNN,BLAS | 8 | tg128 | 4.63 ± 0.06 |

Note

Tests were conducted on an IBM z17 Mainframe with 40 IFLs (cores) and 128 GB Memory on a shared R&D LPAR.
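The table follows llama-bench's output format; a typical invocation that would produce such rows looks like the sketch below (the model path is a placeholder, and llama-bench runs the pp512 and tg128 tests by default):

```sh
# Hypothetical reproduction; the model filename is a placeholder.
# llama-bench defaults to pp512 and tg128, so only the thread count is set.
./build/bin/llama-bench -m granite-3b-f16.gguf -t 8
```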

test-backend-ops

```sh
./build/bin/test-backend-ops -b zDNN | grep -v "not supported"
ggml_zdnn_init: allocating                                                                                                                                        
ggml_zdnn_init: found 1 device                                                                                                                                    
ggml_zdnn_init: picking default device: zDNN                                                                                                                      
ggml_zdnn_init: NNPA name: zDNN                                                                                                                                   
ggml_zdnn_init: NNPA_PARMBLKFORMAT_0 = true                                                                                                                       
ggml_zdnn_init: NNPA_PARMBLKFORMAT_1 = true                                                                                                                       
Testing 3 devices                                                                                                                                                 
                                                                                                                                                                  
Backend 1/3: zDNN                                                                                                                                                 
  Device description: IBM Z Neural Network Processing Assist (NNPA)                                                                                               
  Device memory: 0 MB (0 MB free)                                                                                                                                 
                                                                                                                                                                  
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK                                                                       
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=4,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=1,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0,o=1): OK
ggml_zdnn_free: deallocating
  14491/14491 tests passed
  Backend zDNN: OK
Backend 2/3: BLAS
  Skipping
Backend 3/3: CPU
  Skipping
3/3 backends passed
OK
```
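In the test names above, `m`, `n`, and `k` are the matrix dimensions, `bs` the batch sizes, `nr` the broadcast repeat factors, and `per` the dimension permutation; the remaining flags select tensor-layout variants. As a rough illustration of what a single case such as `MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256)` maps to in the ggml API, here is a minimal sketch (not the actual test harness):

```cpp
// Minimal sketch of the tensor shapes behind one MUL_MAT test case.
// In ggml_mul_mat, A is k x m, B is k x n, and the result is m x n.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // ne[0] is the row length, so both operands share the inner dimension k
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 256, 16); // k=256, m=16
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 256, 1);  // k=256, n=1
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b); // c has shape 16 x 1 (m x n)

    (void) c; // this only builds the node; computing it needs a graph + backend
    ggml_free(ctx);
    return 0;
}
```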

@github-actions bot added the documentation, ggml, and IBM zDNN labels on Sep 22, 2025
taronaeo (Collaborator, Author)

  • CI / ggml-ci-x64-cpu-amx (pull_request) seems to have consistent failures of `cat: /home/ggml/results/llama.cpp/qwen3_0_6b-imatrix-sum.log: No such file or directory` across multiple PR CI runs.
  • CI / ggml-ci-x64-nvidia-t4-vulkan (pull_request) and CI / ggml-ci-x64-nvidia-t4-vulkan-coopmat1 (pull_request) seem to be hitting resource problems where compilation is terminated prematurely.
  • CI / ggml-ci-mac-metal seems to have no runners picking it up.

I'll cancel them and re-run to see whether it's a transient problem.

ggerganov (Member)

Ignore the AMX failure - it's an issue in the AMX backend. The Mac runner is up now; it will take some time to catch up with all the queued workflows, but should be good after that.

I will take a look at the Vulkan issues, but none of these are a problem for this PR.

taronaeo merged commit 264f1b5 into ggml-org:master on Sep 23, 2025
115 of 122 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 23, 2025
* origin/master: (39 commits)
ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
ci : enable Vulkan workflow on Mac (ggml-org#16194)
ggml-cpu: Respect cpumask settings (ggml-org#16164)
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
zdnn: refactor codebase + add docs (ggml-org#16178)
codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
devops: add s390x containers (ggml-org#15915)
ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
clang-tidy : disable warning about performance enum size (ggml-org#16127)
ggml : implement set_rows with i32 index (ggml-org#16159)
codeowners : update + cleanup (ggml-org#16174)
common : enable `--offline` mode without curl support (ggml-org#16137)
webui : fix handling incomplete chunks (ggml-org#16107)
embedding : fix typos in README (ggml-org#16171)
common : remove unused local variables (ggml-org#16140)
ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
ggml : add ggml_op_is_empty (ggml-org#16122)
codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
Vulkan: add conv_transpose_2d operation (ggml-org#16022)
...
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
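Taken together, these commits move each operation into its own translation unit with shared declarations in common.hpp. A toy sketch of that one-op-per-file pattern follows; all names and types here are hypothetical stand-ins, not the actual ggml-zdnn sources:

```cpp
// mul_mat.cpp -- toy "one op per file" layout; Tensor and zdnn_mul_mat are
// hypothetical stand-ins, not the real ggml-zdnn declarations. The shared
// declarations (here inlined for self-containment) would normally live in
// common.hpp and be included by every op file.
#include <cstddef>
#include <vector>

struct Tensor {                 // stand-in tensor type
    int rows = 0, cols = 0;
    std::vector<float> data;   // row-major
};

// the single operation this translation unit owns; the real backend would
// convert inputs to the zDNN stickified layout and call zdnn_matmul_op
void zdnn_mul_mat(const Tensor & a, const Tensor & b, Tensor & out) {
    out.rows = a.rows;
    out.cols = b.cols;
    out.data.assign((std::size_t) out.rows * out.cols, 0.0f);
    for (int i = 0; i < a.rows; ++i)
        for (int k = 0; k < a.cols; ++k)
            for (int j = 0; j < b.cols; ++j)
                out.data[(std::size_t) i * out.cols + j] +=
                    a.data[(std::size_t) i * a.cols + k] *
                    b.data[(std::size_t) k * b.cols + j];
}
```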