low-contention refactored reference Cleaner (fixes #1616) #1684
Conversation
Force-pushed from 98caf53 to f0da2f8.
To start, this change does not survive benchmarking:

I think that the object becomes immediately eligible for GC and thus the cleaner is invoked before construction is done. This can be fixed by using:

```java
public Cleanable register(Object obj, Runnable cleanupTask) {
    // The important side effect is the PhantomReference, which is yielded
    // after the referent is GCed
    Cleanable cleanable = add(new CleanerRef(obj, referenceQueue, cleanupTask));
    // Empty synchronized block: keeps obj strongly reachable until the
    // CleanerRef has been registered, so it cannot be collected prematurely.
    synchronized (obj) {
    }
    if (cleanerRunning.compareAndSet(false, true)) {
        Logger.getLogger(Cleaner.class.getName()).log(Level.FINE, "Starting CleanerThread");
        Thread cleanerThread = new CleanerThread();
        cleanerThread.start();
    }
    return cleanable;
}
```
With that change I ran the numbers and got this (5.18.0 is the last release):
The huge variance for the new code is reproducible; with that, and considering the variance of the other results, I would not want to merge as is. The code used to produce the values: jnabench.zip
Here is the tree analysis from async-profiler: profile-results.zip. My attention is drawn to the fact that the accessors for the ConcurrentHashMap show up prominently in the tree.

Edit: The benchmark was run as follows (the shaded jars were copied after the build):
Another attempt can be found in #1617; it takes a similar approach.
I went down the GC rabbit hole. The principal fix is this:

```java
public Cleanable register(Object referent, Runnable cleanupTask) {
    // The important side effect is the PhantomReference, which is yielded
    // after the referent is GCed
    Cleanable cleanable = add(new CleanerRef(referent, referenceQueue, cleanupTask));
    if (cleanerRunning.compareAndSet(false, true)) {
        Logger.getLogger(Cleaner.class.getName()).log(Level.FINE, "Starting CleanerThread");
        Thread cleanerThread = new CleanerThread();
        cleanerThread.start();
    }
    // NOTE: This is a "pointless" check in the conventional sense; however, it guarantees that the
    // referent is not garbage collected before the CleanerRef is fully constructed, which can happen
    // due to reordering of instructions by the compiler or the CPU. In Java 9+ Reference.reachabilityFence()
    // was introduced to provide this guarantee, but we want to stay compatible with Java 8, so this is
    // the common idiom to achieve the same effect, ensuring that the referent is still strongly
    // reachable at this point.
    if (referent == null) {
        throw new IllegalArgumentException("The referent object must not be null");
    }
    return cleanable;
}
```
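For comparison, on Java 9+ the same reachability guarantee can be expressed directly with `Reference.reachabilityFence()`. A minimal sketch, assuming the same `add`, `CleanerRef`, `referenceQueue`, and `cleanerRunning` members as above (illustrative only, since the actual code targets Java 8):

```java
public Cleanable register(Object referent, Runnable cleanupTask) {
    Cleanable cleanable = add(new CleanerRef(referent, referenceQueue, cleanupTask));
    if (cleanerRunning.compareAndSet(false, true)) {
        new CleanerThread().start();
    }
    // Java 9+: keeps referent strongly reachable up to this point without
    // the "pointless" null check used above for Java 8 compatibility.
    java.lang.ref.Reference.reachabilityFence(referent);
    return cleanable;
}
```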
First, thanks for the JMH setup, it was useful! I do however have some comments about JMH execution. I've been a long-time JMH user, basically since Shipilev released it.

Running JMH with more threads than cores is basically benchmarking the OS scheduler, not the code. I tend to run with cores minus 2.

Also, warmups are super critical. First, the CPU starts to throttle due to heat soak, so iterations fall off a cliff (as they should). Second, the JIT needs a chance to see everything in the hot path -- it won't compile a method until it has seen it many thousands of times, and you don't want that happening during the actual measurement phase. Finally, garbage collection kicks in more as the run progresses, so longer runs provide more real-life measurements. You can see what the "drop off" looks like here:
Now, regarding running the benchmark. I ran with:
The interesting thing is, when running with these settings -- principally 3 measurement iterations with 5 warmups -- I could never get the 5.18.0 run to complete. It always hangs somewhere in the run: sometimes in the 5th and final fork, sometimes in the first fork, or somewhere in between. You might have better luck; maybe it is something with my Mac? In any case, the new code never hung in all of my runs.

Even running with 5 measurement iterations the variability is still extremely high. It should probably be run with 20 iterations, but I ran out of patience -- and with 5.18.0 hanging it seemed pointless because there would be no comparable.

EDIT: I was not able to run with

EDIT2: I was finally able to get a 5.18.0 run to complete. Here are the comparables (using the parameters I mentioned above). v5.18.0
And 5.19.0-SNAPSHOT
As you can see, and as I mentioned above, the variability is extremely high with so few iterations. I got runs as high as 20k on the score, but again, with the plus/minus being nearly the same as the score, it's not very accurate. One rough 10,000-foot cheat is to add the score and the error and treat that as the score. I'm pretty sure that even over a large run the new code is going to come out on top. I'll leave it running while I sleep with a high iteration count.

EDIT3: I figured out why the 5.18.0 run was hanging, because I saw it with 5.19.0-SNAPSHOT at a higher iteration count -- an OutOfMemory error. Neither Cleaner can keep up with full-blast JMH at default memory settings, but 5.19.0-SNAPSHOT does substantially better, only failing at higher iteration counts. I'm re-running with higher memory settings.
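To make the warmup, thread-count, and fork guidance above concrete, here is a minimal JMH sketch. The class name, workload, and exact values are illustrative assumptions, not the harness used in this thread:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Illustrative configuration: long warmups so the JIT sees the hot path and the
// CPU reaches a steady state, threads kept below the core count, multiple forks.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 10)
@Measurement(iterations = 3, time = 10)
@Fork(5)
@Threads(6) // e.g. cores minus 2 on an 8-core machine
public class CleanerPressureBenchmark {

    @Benchmark
    public Object allocateAndDrop() {
        // Hypothetical workload (assumption): com.sun.jna.Memory registers with
        // JNA's internal Cleaner, so allocating and immediately dropping
        // instances puts pressure on it.
        return new com.sun.jna.Memory(128);
    }
}
```

Multiple forks let JMH average out JIT and GC luck across fresh JVMs, which is what makes the per-fork variance visible in the first place.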
UPDATE: Ok, I performed some long runs on 5.18.0 and 5.19.0-SNAPSHOT. I had to increase the JVM memory to 16g to get 5.18.0 to pass. 8g was needed for 5.19.0-SNAPSHOT to pass, but to be fair I ran both with 16g. 5.18.0
5.19.0-SNAPSHOT
This was my command:
With iterations at 10 the variance (error) was reduced by ~66% compared to shorter runs. In order to get it down to 10% variance I suspect iterations might need to be 50. I may update this comment after running on a dual-CPU server at the office, but for now these stand well.
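As a purely illustrative aside, the heap increase described above can also be requested per benchmark fork through JMH annotations (the values here simply mirror the 16g and fork count mentioned, as assumptions):

```java
import org.openjdk.jmh.annotations.Fork;

// Illustrative only: gives each forked benchmark JVM a 16 GB heap so the
// Cleaner under test is not killed by OutOfMemoryError at full allocation rate.
@Fork(value = 5, jvmArgsAppend = "-Xmx16g")
public class CleanerBigHeapBenchmark {
    // @Benchmark methods as in the sketch above.
}
```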
LOL. Scrap another attempt to address #1616. After hours of extensive testing today on our dual-CPU server, the HashMap-based version definitely performs worse (far worse) when put under high GC pressure. I'm closing this, though I may make another attempt with a different approach. The existing implementation mainly suffers from classic linked-list "tail contention", because all appends occur on the tail. There are various approaches in the academic literature that are worth investigating -- including using two locks (head and tail) and alternating appends between head and tail, thereby reducing lock contention by 2x, but the edge cases (zero to three nodes) are sticky. Anyway, killing this for now.
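For reference, the best-known "two locks (head and tail)" structure is the Michael & Scott two-lock queue; a minimal sketch is below. Note that it is a strict FIFO (enqueue at the tail, dequeue at the head), whereas a Cleaner list also has to unlink arbitrary entries, which is where the sticky edge cases mentioned above come from.

```java
// Classic two-lock queue (Michael & Scott, 1996): the head lock serializes
// dequeues, the tail lock serializes enqueues, and a dummy node keeps the
// two ends from ever touching the same node. Illustrative sketch only.
final class TwoLockQueue<T> {
    private static final class Node<T> {
        final T value;
        volatile Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final Object headLock = new Object();
    private final Object tailLock = new Object();
    private Node<T> head = new Node<>(null); // dummy node, guarded by headLock
    private Node<T> tail = head;             // guarded by tailLock

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        synchronized (tailLock) {
            tail.next = node;
            tail = node;
        }
    }

    T dequeue() {
        synchronized (headLock) {
            Node<T> first = head.next;
            if (first == null) {
                return null; // queue is empty
            }
            head = first; // first becomes the new dummy
            return first.value;
        }
    }
}
```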
Refactored (re-written) `Cleaner` class, as low-contention and simple as I can get it. Principally, I removed the linked-node structure and replaced it with a concurrent set. Fixes #1616.

An `AtomicBoolean` is used to track whether the cleaner thread is running, and another (per-CleanerRef) is used to track whether the cleanup task has been run, either directly or by the cleaner thread, ensuring it is only executed once.

I don't think this class could be simplified any further and still be correct.
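A minimal sketch of the shape described above, with illustrative names (`CleanerSketch`, `refs`) and details that are assumptions rather than the exact PR code: a concurrent set holds the `CleanerRef`s, a per-ref `AtomicBoolean` makes each cleanup task run at most once, and a global `AtomicBoolean` lazily starts the cleaner thread.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the described design -- not the exact code in this PR.
final class CleanerSketch {
    interface Cleanable { void clean(); }

    private final ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();
    private final Set<CleanerRef> refs = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean cleanerRunning = new AtomicBoolean(false);

    final class CleanerRef extends PhantomReference<Object> implements Cleanable {
        private final Runnable cleanupTask;
        private final AtomicBoolean cleaned = new AtomicBoolean(false);

        CleanerRef(Object referent, ReferenceQueue<Object> queue, Runnable cleanupTask) {
            super(referent, queue);
            this.cleanupTask = cleanupTask;
        }

        @Override
        public void clean() {
            // compareAndSet ensures the task runs at most once, whether invoked
            // explicitly or by the cleaner thread after the referent is GCed.
            if (cleaned.compareAndSet(false, true)) {
                refs.remove(this);
                cleanupTask.run();
            }
        }
    }

    public Cleanable register(Object referent, Runnable cleanupTask) {
        CleanerRef ref = new CleanerRef(referent, referenceQueue, cleanupTask);
        refs.add(ref); // the concurrent set keeps the PhantomReference reachable until cleaned
        if (cleanerRunning.compareAndSet(false, true)) {
            Thread cleanerThread = new Thread(() -> {
                while (true) {
                    try {
                        // Blocks until the referent of some CleanerRef has been collected.
                        ((CleanerRef) referenceQueue.remove()).clean();
                    } catch (InterruptedException e) {
                        return; // a real implementation would also reset cleanerRunning
                    }
                }
            }, "JNA Cleaner (sketch)");
            cleanerThread.setDaemon(true);
            cleanerThread.start();
        }
        // See the discussion above: this "pointless" check keeps referent strongly
        // reachable until registration completes (Java 8 stand-in for reachabilityFence).
        if (referent == null) {
            throw new IllegalArgumentException("The referent object must not be null");
        }
        return ref;
    }
}
```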
@matthiasblaesing If you still have your JMH harness, I'd be interested in how it performs.