ggml : reading the runtime sve config of the cpu #8709

jdomke · 2024-07-26T09:57:08Z

I have read the contributing guidelines
Self-reported review complexity:
- [ x] Low
- Medium
- High

@ggerganov here is a new version of the old #8382 since the code in question has moved from ggml.c to ggml-quants.c. i'll close the old PR.

slaren · 2024-07-26T11:30:52Z

Won't this break SVE support in non-linux systems?

jdomke · 2024-07-26T11:38:41Z

Won't this break SVE support in non-linux systems?

Are there systems with arm+sve which run windows and llama.cpp? Well, windows' WSL layer seem to have prctl support (microsoft/WSL#4035). For non-WSL installations of llama.cpp, i really don't know right now.

slaren · 2024-07-26T12:04:11Z

Are there systems with arm+sve which run windows and llama.cpp?

I guess not, MSVC doesn't seem to support SVE either (there is no arm_sve.h header). clang for VS does have the header, though.

slaren · 2024-07-26T12:17:13Z

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

jdomke · 2024-07-26T12:50:46Z

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

The svcntb check used to live in ggml.c inside ggml_cpu_has_sve function and i believe it was only called one time, but i could be wrong. you are right, now it would be called every time instead of 1 time. I tested 3x runs with the patch: 12.79, 12.77, 12.89 tokens per second. And 3x runs without and change of vector length via /proc/sys/abi/sve_default_vector_length: 43.94, 43.80, and 43.99 tokens per second. So, this seem to be a bad option moving forward. But requiring root to switch to SVE256 on SVE512 architectures is equally bad.

slaren · 2024-07-26T13:21:14Z

The value could be cached at startup. Add an init function to ggml-quants.c, save the result of the system call to global, and call it during first-call initialization in ggml_init.

jdomke · 2024-07-26T13:22:04Z

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

The svcntb check used to live in ggml.c inside ggml_cpu_has_sve function and i believe it was only called one time, but i could be wrong. you are right, now it would be called every time instead of 1 time. I tested 3x runs with the patch: 12.79, 12.77, 12.89 tokens per second. And 3x runs without and change of vector length via /proc/sys/abi/sve_default_vector_length: 43.94, 43.80, and 43.99 tokens per second. So, this seem to be a bad option moving forward. But requiring root to switch to SVE256 on SVE512 architectures is equally bad.

Ok, compared to the old checkout which basically didnt run with sve512, the new one seems to be fine. But the performance is all over the place.

The new fallback path seems to be neon in ggml-quants.c which could explain a 2x slowdown. When the node is in sve512 mode i get ~20 t/s without the wrapper which sets prctl and with wrapper i get 45 t/s.

I guess that's an "artifact" of GNU, because as i commented on the other PR, some compilers might fix svcntb at compile time and we saw crashes in the svcntb checks even with the wrappers enabled.

jdomke · 2024-07-26T17:25:41Z

The value could be cached at startup. Add an init function to ggml-quants.c, save the result of the system call to global, and call it during first-call initialization in ggml_init.

Thanks. Done that, and also unified the svcntXYZ calls. Now without wrapper and sve512 i'm getting ~20t/s (because its falling back to neon) and with wrapper and prctl i'm getting the expected 45 t/s. @slaren and @ggerganov would you like to see additional changes?

slaren · 2024-07-29T13:04:40Z

ggml/src/ggml-quants.h

@@ -127,6 +127,10 @@ void iq2xs_free_impl(enum ggml_type type);
 void iq3xs_init_impl(int grid_size);
 void iq3xs_free_impl(int grid_size);

+#if defined(__ARM_FEATURE_SVE)
+extern int sve_cnt_b;


I think it would be good to add a ggml_ prefix to this variable to avoid possible conflicts with 3rd party software.

Suggested change

extern int sve_cnt_b;

extern int ggml_sve_cnt_b;

ok, yes, that makes sense. i changed it according to your suggestion

examples/gguf-hash/deps/xxhash/xxhash.h

ggml/src/ggml.c

* ggml : reading the runtime sve config of the cpu * change to one time init to prevent performance drop * prefix variable to avoid possible conflicts * revert xxhash fix and add brackets --------- Co-authored-by: domke <[email protected]>

ggml : reading the runtime sve config of the cpu

7cc6432

ggerganov approved these changes Jul 26, 2024

View reviewed changes

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jul 26, 2024

change to one time init to prevent performance drop

ab8b748

github-actions bot added the examples label Jul 26, 2024

slaren reviewed Jul 29, 2024

View reviewed changes

prefix variable to avoid possible conflicts

4a03d0d

mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Aug 1, 2024

slaren reviewed Aug 1, 2024

View reviewed changes

examples/gguf-hash/deps/xxhash/xxhash.h Outdated Show resolved Hide resolved

slaren reviewed Aug 1, 2024

View reviewed changes

ggml/src/ggml.c Outdated Show resolved Hide resolved

revert xxhash fix and add brackets

e868aa9

slaren approved these changes Aug 3, 2024

View reviewed changes

slaren merged commit 76614f3 into ggml-org:master Aug 3, 2024
52 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ggml : reading the runtime sve config of the cpu #8709

ggml : reading the runtime sve config of the cpu #8709

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren Jul 29, 2024

Uh oh!

jdomke Aug 1, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ggml : reading the runtime sve config of the cpu #8709

ggml : reading the runtime sve config of the cpu #8709

Uh oh!

Conversation

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

jdomke commented Jul 26, 2024

Uh oh!

slaren Jul 29, 2024

Choose a reason for hiding this comment

Uh oh!

jdomke Aug 1, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!