
metal: bug-fix when enable ggml-alloc #2757


Merged: 2 commits merged into ggml-org:master from metal-concur-alloc-fix on Aug 24, 2023

Conversation

lshzh-ww (Contributor)

The first commit improves memory management when ggml-alloc is enabled: tensors should only be freed at memory barriers. Otherwise, memory released by one tensor might be immediately reused by another tensor that runs concurrently with it.

The second commit fixes a silent return in the allocate_node() function.

This PR also removes workarounds needed by Falcon.
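
For illustration, here is a minimal, self-contained C sketch of the first point. It is not the actual ggml-alloc/Metal code; the op structure and helper names are invented for this example. It only shows why a free has to be deferred to the next memory barrier when the ops in between may execute concurrently.

/*
 * Sketch only: with concurrent dispatch, the ops between two memory barriers
 * may run in parallel, so a tensor freed as soon as its last consumer is
 * *dispatched* could be handed to a sibling op that is still reading it.
 * Deferring the free until the next barrier avoids the race.
 */
#include <stdbool.h>
#include <stdio.h>

struct op {
    const char * name;
    bool         is_barrier;    /* all previously dispatched ops have completed here */
    int          frees_tensor;  /* id of a tensor whose last use is this op, or -1   */
};

static void run_graph(const struct op * ops, int n_ops) {
    int pending[16];
    int n_pending = 0;

    for (int i = 0; i < n_ops; i++) {
        printf("dispatch %s\n", ops[i].name);

        if (ops[i].frees_tensor >= 0) {
            /* do NOT free yet: a concurrently running op might still read this memory */
            pending[n_pending++] = ops[i].frees_tensor;
        }

        if (ops[i].is_barrier || i == n_ops - 1) {
            /* safe point: everything dispatched so far has finished */
            for (int j = 0; j < n_pending; j++) {
                printf("  free tensor %d (at barrier)\n", pending[j]);
            }
            n_pending = 0;
        }
    }
}

int main(void) {
    const struct op ops[] = {
        { "mul_mat_q", false, 0 },  /* may run concurrently with mul_mat_k */
        { "mul_mat_k", false, 1 },
        { "add",       true,  2 },  /* barrier before the next stage       */
    };
    run_graph(ops, 3);
    return 0;
}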

The ggml-alloc should only free tensors at memory barriers.
lshzh-ww requested a review from slaren on August 24, 2023 at 05:32
In certain cases, the allocate_node() function may silently return
without performing any memory allocation.
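
As an illustration of the second fix, the sketch below shows the general bug pattern of a silent early return. It is not the real allocate_node() from ggml-alloc.c; the struct and helper names are invented, and only the shape of the problem (an early return that skips the fallback allocation, leaving data == NULL) is meant to match.

/*
 * Sketch only: the buggy version effectively returned on the "cannot reuse a
 * parent in place" path instead of falling through to a normal allocation.
 */
#include <stddef.h>

struct node {
    void * data;             /* NULL until allocated                */
    int    can_reuse_parent; /* non-zero if the op may run in place */
    void * parent_data;      /* buffer of the parent it could reuse */
};

/* stand-in for the allocator's pool */
static void * pool_alloc(size_t size) {
    static char   pool[1 << 10];
    static size_t offset = 0;
    void * ptr = pool + offset;
    offset += size;
    return ptr;
}

static void allocate_node(struct node * n) {
    if (n->data != NULL) {
        return;                    /* already allocated */
    }
    if (n->can_reuse_parent && n->parent_data != NULL) {
        n->data = n->parent_data;  /* run in place on the parent's buffer */
        return;
    }
    /* the fix makes sure we always reach a real allocation here
       instead of returning with data still NULL */
    n->data = pool_alloc(64);
}

int main(void) {
    struct node n = { NULL, 0, NULL };
    allocate_node(&n);
    return n.data != NULL ? 0 : 1;  /* succeeds only if allocation happened */
}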
lshzh-ww force-pushed the metal-concur-alloc-fix branch from 8496a24 to 0c268a8 on August 24, 2023 at 05:35
lshzh-ww (Contributor, Author)

@ggerganov
Not sure if this will also fix the "hang" behavior reported in #2678 yesterday.

slaren (Member) left a comment


The changes to ggml-alloc look fine.

ggerganov merged commit 38b16df into ggml-org:master on Aug 24, 2023
colinc commented on Aug 25, 2023

This causes a segfault in ggml-alloc.c at line 508 with 70B-parameter models.

It happens right after:
exec: RESHAPE (tmpq (reshaped)) <= tmpq

lshzh-ww (Contributor, Author)

@colinc Fixed with PR #2776.

By the way, thank you for providing that information! It saved me a lot of time.

akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request on Aug 29, 2023
* metal: better memory alloc w/ concurrency dispatch

The ggml-alloc should only free tensors at memory barriers.

* ggml-alloc: avoid return silently

In certain cases, the allocate_node() function may silently return
without performing any memory allocation.