Skip to content

ggml : reading the runtime sve config of the cpu #8709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 3, 2024

Conversation

jdomke
Copy link
Contributor

@jdomke jdomke commented Jul 26, 2024

@ggerganov here is a new version of the old #8382 since the code in question has moved from ggml.c to ggml-quants.c. i'll close the old PR.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jul 26, 2024
@slaren
Copy link
Member

slaren commented Jul 26, 2024

Won't this break SVE support in non-linux systems?

@jdomke
Copy link
Contributor Author

jdomke commented Jul 26, 2024

Won't this break SVE support in non-linux systems?

Are there systems with arm+sve which run windows and llama.cpp? Well, windows' WSL layer seem to have prctl support (microsoft/WSL#4035). For non-WSL installations of llama.cpp, i really don't know right now.

@slaren
Copy link
Member

slaren commented Jul 26, 2024

Are there systems with arm+sve which run windows and llama.cpp?

I guess not, MSVC doesn't seem to support SVE either (there is no arm_sve.h header). clang for VS does have the header, though.

@slaren
Copy link
Member

slaren commented Jul 26, 2024

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

@jdomke
Copy link
Contributor Author

jdomke commented Jul 26, 2024

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

The svcntb check used to live in ggml.c inside ggml_cpu_has_sve function and i believe it was only called one time, but i could be wrong. you are right, now it would be called every time instead of 1 time. I tested 3x runs with the patch: 12.79, 12.77, 12.89 tokens per second. And 3x runs without and change of vector length via /proc/sys/abi/sve_default_vector_length: 43.94, 43.80, and 43.99 tokens per second. So, this seem to be a bad option moving forward. But requiring root to switch to SVE256 on SVE512 architectures is equally bad.

@slaren
Copy link
Member

slaren commented Jul 26, 2024

The value could be cached at startup. Add an init function to ggml-quants.c, save the result of the system call to global, and call it during first-call initialization in ggml_init.

@jdomke
Copy link
Contributor Author

jdomke commented Jul 26, 2024

Other question is, how will this affect performance? If I am not mistaken this will add a system call before each dot product, which should have a significant overhead.

The svcntb check used to live in ggml.c inside ggml_cpu_has_sve function and i believe it was only called one time, but i could be wrong. you are right, now it would be called every time instead of 1 time. I tested 3x runs with the patch: 12.79, 12.77, 12.89 tokens per second. And 3x runs without and change of vector length via /proc/sys/abi/sve_default_vector_length: 43.94, 43.80, and 43.99 tokens per second. So, this seem to be a bad option moving forward. But requiring root to switch to SVE256 on SVE512 architectures is equally bad.

Ok, compared to the old checkout which basically didnt run with sve512, the new one seems to be fine. But the performance is all over the place.

The new fallback path seems to be neon in ggml-quants.c which could explain a 2x slowdown. When the node is in sve512 mode i get ~20 t/s without the wrapper which sets prctl and with wrapper i get 45 t/s.

I guess that's an "artifact" of GNU, because as i commented on the other PR, some compilers might fix svcntb at compile time and we saw crashes in the svcntb checks even with the wrappers enabled.

@jdomke
Copy link
Contributor Author

jdomke commented Jul 26, 2024

The value could be cached at startup. Add an init function to ggml-quants.c, save the result of the system call to global, and call it during first-call initialization in ggml_init.

Thanks. Done that, and also unified the svcntXYZ calls. Now without wrapper and sve512 i'm getting ~20t/s (because its falling back to neon) and with wrapper and prctl i'm getting the expected 45 t/s. @slaren and @ggerganov would you like to see additional changes?

@@ -127,6 +127,10 @@ void iq2xs_free_impl(enum ggml_type type);
void iq3xs_init_impl(int grid_size);
void iq3xs_free_impl(int grid_size);

#if defined(__ARM_FEATURE_SVE)
extern int sve_cnt_b;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add a ggml_ prefix to this variable to avoid possible conflicts with 3rd party software.

Suggested change
extern int sve_cnt_b;
extern int ggml_sve_cnt_b;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, yes, that makes sense. i changed it according to your suggestion

@mofosyne mofosyne added the Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix label Aug 1, 2024
@slaren slaren merged commit 76614f3 into ggml-org:master Aug 3, 2024
52 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Aug 7, 2024
* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples ggml changes relating to the ggml tensor library for machine learning Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants