-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add CachingCostBalancerStrategy #4731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CachingCostBalancerStrategy #4731
Conversation
…ServerRemovedCallback
} | ||
} | ||
|
||
protected void runServerCallbacks(final DruidServer server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runServerRemovedCallbacks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
if (!lifecycleLock.canStop()) { | ||
throw new ISE("CachingCostBalancerStrategyFactory can not be stopped"); | ||
} | ||
executor.shutdownNow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in a try block whose finally calls exitStop
I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exitStop() should be called only if lifecycled object is recycleable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment
* LIFE_THRESHOLD is used to avoid calculations for segments that are "far" | ||
* from each other and thus cost(X,Y) ~ 0 for these segments | ||
*/ | ||
private static final long LIFE_THRESHOLD = TimeUnit.DAYS.toMillis(30); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outside scope of this PR, but this would be nice to have configurable some day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
Interval interval = getBucketInterval(segment); | ||
buckets.computeIfPresent( | ||
interval, | ||
(i, builder) -> builder.removeSegment(segment).isEmpty() ? null : builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this remove the key or just set the value to null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the key, added a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that guaranteed anywhere, or just specific to some implementations? Here's where my java8 knowledge falls flat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's guaranteed by java.util.Map
spec: https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfPresent-K-java.util.function.BiFunction-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I missed it on my other read through. thanks!
double leftCost = 0.0; | ||
// add to cost all left-overlapping segments | ||
int leftIndex = index - 1; | ||
while (leftIndex >= 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop looks pretty odd, can it simply be
for (int leftIndex = index - 1; leftIndex >=0; --leftIndex) {
final Segment segment = sortedSegments.get(leftIndex);
if (!segment.getInterval().overlaps(dataSegment.getInterval()))
{
break;
}
final double start = convertStart(sortedSegments.get(leftIndex), interval);
final double end = convertEnd(sortedSegments.get(leftIndex), interval);
leftCost += CostBalancerStrategy.intervalCost(end - start, t0 - start, t1 - start);
}
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses leftCost
variable outside of the loop
double rightCost = 0.0; | ||
// add all right-overlapping segments | ||
int rightIndex = index; | ||
while (rightIndex < sortedSegments.size() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing about using a for
loop for readability.
|
||
public static class Builder | ||
{ | ||
private final Map<String, ServerCostCache.Builder> serversCostCache = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this guarded when https://github.com/druid-io/druid/pull/4731/files#diff-9ef443ad87a0cafc15b5318fb10590c5R122 has multiple threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exec is not used to populate the builder. Another one, stored in a field in CachingCostBalancerStrategyFactory
is always single-threaded, added a comment about that.
@drcrallen thanks for review, addressed comments |
If this is the same as the prior cost balancer strategy, can it reuse the tests from |
It doesn't produce the same numbers, but (supposedly) makes the same decisions. That's what |
* Add CachingCostBalancerStrategy; Rename ServerView.ServerCallback to ServerRemovedCallback * Fix benchmark units * Style, forbidden-api, review, bug fixes * Add docs * Address comments
CachingCostBalancerStrategy is fast implementation of CostBalancerStrategy. Computation algorithm slightly differs from the original version and allows pre-compute and cache cost function values to reduce complexity of the algorithm.
Motivation:
Implementation details:
Benchmark