-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add builders for percentile and median accumulators/window functions #1139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…for Scala. JAVA-3860
JAVA-3860
JAVA-3860
This reverts commit 15f6dc9.
JAVA-3860
...r-core/src/test/functional/com/mongodb/client/model/AggregatesFunctionalSpecification.groovy
Outdated
Show resolved
Hide resolved
JAVA-3860
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/main/com/mongodb/client/model/Accumulators.java
Outdated
Show resolved
Hide resolved
driver-core/src/main/com/mongodb/client/model/Accumulators.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/unit/com/mongodb/client/model/TestWindowOutputFields.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/unit/com/mongodb/client/model/TestWindowOutputFields.java
Outdated
Show resolved
Hide resolved
- Squash quantile tests into parametrized one. JAVA-3860
JAVA-4888
JAVA-4888
driver-core/src/main/com/mongodb/client/model/ApproximateQuantileMethod.java
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/OperationTest.java
Outdated
Show resolved
Hide resolved
JAVA-4888
JAVA-4888
JAVA-4888
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is a partial review.)
* Returns a combination of a computed field and an accumulator that generates a BSON {@link org.bson.BsonType#DOUBLE Double } | ||
* representing the median value computed from the given {@code inExpression} within a group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am finding these docs surprisingly hard to follow. There appear to be 3 levels of abstraction, and at least 6 complex relations in the same sentence:
- this combines and returns: computed field + accumulator
- which generates: BSON
- which represents: median (which, in turn: computed, from-given, within-group)
I think we should focus on what this method is used for, and break up the relevant pieces of information, starting with the most relevant:
Used to determine the median value for the group. The result is emitted under the specified field name. Each item in a group is a document, so the "input expression" is used to specify the numeric value for each document. This is typically a numeric field on each document, but any expression may be specified.
I think we should also specify what happens to non-numeric values, what the result is when there are no documents, and what MQL value represents the document in cases where a plain field is not used (usually you can just use fields, but it should be possible to get the "median" number of fields per document, and I don't see how to do this ... would we use $$CURRENT?). And, we should have tests that cover these cases.
If we do want to still focus on (or include) certain technical details rather than usage, I think we should be more clear, and still, for example, break up the information so it is easier to understand.
(The same I think applies to the other docs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed: we can address this later, in a future PR (or, decide not to).
* @param fieldName The field computed by the accumulator. | ||
* @param inExpression The input expression. | ||
* @param method The method to be used for computing the median. | ||
* @param <InExpression> The type of the input expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should link the "type" (or the word "expression" on the param itself) of any new docs to MqlValue (which is in Beta).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea. However, I wonder if we should consider including the information that the Mql API
can be utilized in situations where an expression is required, in the class level documentation instead? MqlValue
can be used allmost in all methods of Accumulator
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, this would be inconsistent with the rest of the class.
Discussed: we can address this later, in a future PR (or, decide not to).
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
driver-core/src/test/functional/com/mongodb/client/model/AggregatesTest.java
Outdated
Show resolved
Hide resolved
|
||
//when | ||
List<Document> results = getCollectionHelper().aggregate(Collections.singletonList( | ||
group(new Document("gid", "$z"), quantileAccumulator)), DOCUMENT_DECODER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you can get rid of DOCUMENT_DECODER everywhere by doing:
List<BsonDocument> results = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use new Document("gid", "$z")
instead of $z
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed: we can address this (DOCUMENT_DECODER) later, in a future PR. ($z fixed)
List<Document> results = getCollectionHelper().aggregate(Collections.singletonList( | ||
group(new Document("gid", "$z"), quantileAccumulator)), DOCUMENT_DECODER); | ||
|
||
//then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does not test the generated BSON, but it could do so as with the existing tests in this class, via:
List<Bson> pipeline = assertPipeline(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed: we can address this later, in a future PR.
Object result = results.stream() | ||
.filter(document -> document.get("_id").equals(new Document("gid", true))) | ||
.findFirst().map(document -> document.get("sat_95")).get(); | ||
|
||
assertEquals(expectedGroup1, result); | ||
|
||
result = results.stream() | ||
.filter(document -> document.get("_id").equals(new Document("gid", false))) | ||
.findFirst().map(document -> document.get("sat_95")).get(); | ||
|
||
assertEquals(expectedGroup2, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the above boilerplate could be replaced with, for example:
assertResults(pipeline, "[\n"
+ " { _id: true, sat_95: [3, 2] },\n"
+ " { _id: false, sat_95: [1, 2] }\n"
+ "]");
This seems easier to follow than using a stream to pick out the ids, and then manually matching on the particular field.
So instead of the ~10 lines of boilerplate, plus these ~3 lines per test:
Arguments.of(
percentile("sat_95", "$x", new double[]{0.95}, QuantileMethod.approximate()),
asList(3.0), asList(1.0)),
there could be ~6 lines per test:
List<Bson> pipeline = assertPipeline(
...
group("$z", percentile("sat_95", "$x", ofNumberArray(0.95))));
assertResults(pipeline, "[\n"
+ " { _id: true, sat_95: [3, 2] },\n"
+ " { _id: false, sat_95: [1, 1] }\n"
+ "]");
I count the lines because parameterization is often used to decrease boilerplate and line count, but to me it seems there can often be other ways to deal with that problem which ultimately make the code more clear.
I think it is fine to have a some duplication in the "data" part of a test (for example, trying to remove it separates percentile
from group
, and the array values from sat_95
, which makes things harder to follow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed: we can address this later, in a future PR (or, decide not to). We should have a consistent style in the tests.
JAVA-3860
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This pull request (PR) introduces the capability to calculate the median and percentile of numeric values in the MongoDB aggregation pipeline for $group and $setWindowFields stages
JAVA-3860