
groestl: add AVX-512/GFNI backend #720


Draft
wants to merge 1 commit into
base: master


robbie01

#718

I took a conservative approach here and kept the same in-memory state representation as the original (now soft) backend. This costs two extra vpermb instructions (_mm512_permutexvar_epi8) per call to compress and to p. (Note: this is not a per-block overhead, since compress now operates on a slice of blocks, per @newpavlov's recommendation.)
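
To make the cost concrete, here is a minimal scalar sketch of what a vpermb does and why converting between two layouts takes one permute on the way in and one on the way out. The index table is an assumption chosen purely for illustration (a transpose of the 8x8 byte state); the tables actually used in this PR are not shown here, and `permute_bytes` and `transpose_indices` are hypothetical names.

```rust
/// Scalar model of `vpermb` (`_mm512_permutexvar_epi8`):
/// each output byte i is the source byte selected by the low 6 bits of idx[i].
fn permute_bytes(idx: &[u8; 64], src: &[u8; 64]) -> [u8; 64] {
    std::array::from_fn(|i| src[(idx[i] & 63) as usize])
}

/// Hypothetical index table: transpose of an 8x8 byte matrix.
/// Output position (r, c) reads from input position (c, r).
fn transpose_indices() -> [u8; 64] {
    std::array::from_fn(|i| {
        let (r, c) = (i / 8, i % 8);
        (c * 8 + r) as u8
    })
}
```

With a table like this, loading the state applies one permute and storing it applies the inverse permute (a transpose happens to be its own inverse). Those are the two permutes that would disappear if memory and registers shared one representation.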

If it's acceptable, I can change the code to keep the state in memory in the same representation it uses in registers. This should be risk-free: CPU features do not change during execution, so the selected backend, and with it the representation, stays fixed for the lifetime of the process.
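
The argument above relies on the backend being chosen once and never changing. A minimal sketch of that pattern using only the standard library follows. The `Backend` enum and function names are hypothetical, and the exact feature set checked is an assumption (vpermb needs AVX512_VBMI; the PR's real requirements may differ, and the crate may use a different detection mechanism).

```rust
use std::sync::OnceLock;

/// Hypothetical backend selector, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Soft,
    Avx512Gfni,
}

/// Query the CPU once. The feature list here is an assumption.
fn detect_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f")
            && std::arch::is_x86_feature_detected!("avx512bw")
            && std::arch::is_x86_feature_detected!("avx512vbmi")
            && std::arch::is_x86_feature_detected!("gfni")
        {
            return Backend::Avx512Gfni;
        }
    }
    Backend::Soft
}

/// Cache the result so every caller sees the same backend for the
/// lifetime of the process.
fn backend() -> Backend {
    static CACHE: OnceLock<Backend> = OnceLock::new();
    *CACHE.get_or_init(detect_backend)
}
```

Because `backend()` always returns the same value within a process, a state written in the register layout by one call would always be read back by the same backend.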

robbie01 force-pushed the groestl-avx512-gfni branch 2 times, most recently from 7220730 to 6dc67f7 on August 13, 2025 15:23
@robbie01
Author

Performance (with -C target-cpu=native, Ryzen 9 7900X, x86_64-pc-windows-msvc):

soft backend:

test groestl256_10    ... bench:          62.62 ns/iter (+/- 1.15) = 161 MB/s
test groestl256_100   ... bench:         604.86 ns/iter (+/- 7.71) = 165 MB/s
test groestl256_1000  ... bench:       5,930.86 ns/iter (+/- 83.92) = 168 MB/s
test groestl256_10000 ... bench:      59,241.11 ns/iter (+/- 535.22) = 168 MB/s

avx512_gfni backend:

test groestl256_10    ... bench:          15.39 ns/iter (+/- 0.42) = 666 MB/s
test groestl256_100   ... bench:         148.98 ns/iter (+/- 5.03) = 675 MB/s
test groestl256_1000  ... bench:       1,402.30 ns/iter (+/- 27.58) = 713 MB/s
test groestl256_10000 ... bench:      13,936.83 ns/iter (+/- 608.29) = 717 MB/s
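
For reading the numbers above, a small sketch of the arithmetic behind the MB/s column and the resulting speedup. The helper names are hypothetical. The reported MB/s values appear to be rounded down, and the shortest inputs may differ by a unit or two depending on how the bench harness rounds nanoseconds.

```rust
/// Throughput in MB/s (1 MB = 10^6 bytes) from a per-iteration time.
fn throughput_mb_s(bytes_per_iter: u64, ns_per_iter: f64) -> f64 {
    // bytes / ns = GB/s, so multiply by 1000 to get MB/s.
    bytes_per_iter as f64 / ns_per_iter * 1e3
}

/// How many times faster the candidate is than the baseline.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}
```

Applied to the 10,000-byte rows, `throughput_mb_s(10_000, 59_241.11)` and `throughput_mb_s(10_000, 13_936.83)` come out just above 168 and 717 MB/s, and `speedup(59_241.11, 13_936.83)` is roughly 4.25.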

robbie01 force-pushed the groestl-avx512-gfni branch from 6dc67f7 to ff01186 on August 13, 2025 16:49
robbie01 marked this pull request as draft on August 13, 2025 19:27