Skip to content

[SPARK-49133][CORE] Make member MemoryConsumer#used atomic to avoid user code causing deadlock #51849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Aug 5, 2025

What changes were proposed in this pull request?

Turn the field MemoryConsumer#used from long to AtomicLong.

Why are the changes needed?

MemoryConsumer doesn't provide internal thread-safety so developer should add their own lock for concurrent memory allocation in the same task.

Thinking of multiple threads are allocating memory in the same task (although it's a special case regarding Spark's memory model), to protect the thread-safety of MemoryConsumer, user has to lock the API invocations of it. In this case, if one memory consumer spills another concurrently, there's a risk of ABBA deadlock. E.g.,

  1. In thread 1, consumer A acquires memory from TMM
  2. In thread 2, consumer B acquires memory from TMM and spills consumer A.

Deadlock happens at the moment thread 1 locks consumer A and acquires TMM's lock, while consumer B locks TMM then acquires A's lock.

To fix this problem, Spark could ensure MemoryConsumer's thread-safety with an atomic MemoryConsumer#used, so user doesn't have to add a lock in most cases.

Does this PR introduce any user-facing change?

A developer change:

protected long used;

will become

protected final AtomicLong used = new AtomicLong(0L);

To address this, developers could call getUsed() for all Spark versions instead (if they need to read the value of used), without having to maintain a shim layer for this change.

How was this patch tested?

No need to test from Spark code. But fine to add a case to emulate developer's calls if preferred.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Aug 5, 2025
@zhztheplayer zhztheplayer changed the title [SPARK-49133][CORE] Make member MemoryConsumer#used atomic to avoid deadlock [SPARK-49133][CORE] Make member MemoryConsumer#used atomic to avoid user code causing deadlock Aug 5, 2025
@yaooqinn
Copy link
Member

yaooqinn commented Aug 6, 2025

cc @JoshRosen @rednaxelafx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants