Skip to content

Implement document expressions #1052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 13, 2022
Merged

Implement document expressions #1052

merged 10 commits into from
Dec 13, 2022

Conversation

katcharov
Copy link
Collaborator

@katcharov katcharov force-pushed the expressions-doc branch 2 times, most recently from bd470f5 to c502b10 Compare November 21, 2022 15:56
@rozza rozza requested review from rozza, jyemin and stIncMale and removed request for rozza November 22, 2022 14:00
Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR isn't in a reviewable state (only the last commit seems relevant).

@katcharov
Copy link
Collaborator Author

@jyemin When I rebase one of the branches, all of the ensuing branches need to be rebased (with conflicts resolved along the way). Sometimes I'm caught in the middle of this process (this happened to @stIncMale a couple of times), and the diff is unreadable. This has been resolved in array, date, and conv, and soon in this branch.

@katcharov katcharov requested a review from jyemin November 22, 2022 16:03

NumberExpression getNumber(String field, NumberExpression orElse);

default NumberExpression getNumber(final String field, final Number orElse) {
Copy link
Member

@stIncMale stIncMale Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have Expression.of(Number) in the public API, but we have getNumber(..., Number orElse). It appears that we should either not have getNumber(..., Number orElse), or have it and then have Expression.of(Number) in the public API. And if we have Expression.of(Number), we don't need Expression.of(double)/Expression.of(Decimal128).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The of methods return their type. But these have a type after get, and need to return that type. This is confusing to users if we don't use Number, unlike the case with of. If you still perceive an inconsistency, let's add a note to the final doc to consider switching of to use Number.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The of methods return their type. But these have a type after get, and need to return that type. This is confusing to users if we don't use Number, unlike the case with of.

I think, this argument confuses the MQL number type (expressed in Java as NumberExpression and mentioned in the getNumber method name) with the Java Number (specified as the type of the second parameter) - they are different types. Also, if this argument were applied consistently, then neither DocumentExpression.getInteger(String, long) nor Expressions.ofNumberArray(double...)/(Decimal128...) should not have been introduced as they violate the approach implied by the argument.

If you still perceive an inconsistency, let's add a note to the final doc to consider switching of to use Number.

The API inconsistency was introduced in this PR, and deciding what to do with it does not require postponing as far as I can see (if there were specific future decisions you foresee on which this decision has to rely on, I would expect those to have been already mentioned).


The current integer-specific API is consistent between Expressions and DocumentExpression:

IntegerExpression Expressions.of(int)
IntegerExpression Expressions.of(long)

IntegerExpression DocumentExpression.getInteger(String, int)
IntegerExpression DocumentExpression.getInteger(String, long)

the API that is not integer-specific is not consistent between Expressions and DocumentExpression:

NumberExpression Expressions.of(double)
NumberExpression Expressions.of(Decimal128)

NumberExpression DocumentExpression.getNumber(String, Number)

To achieve consistency we should

  1. either replace getNumber(String, Number) with getNumber(String, double) and getNumber(String, Decimal128);
  2. or replace of(double) and of(Decimal128) with of(Number) (in this case we also need to replace Expressions.ofNumberArray(double...)/(Decimal128...) with Expressions.ofNumberArray(Number...)).

Second option reduces the number of overloads at the cost of shifting some type-checks from compile-time to run-time. I will be fine with either option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you are saying is correct. I will consider the two options.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the first one, because it is more consistent with the nearby getInteger methods.

@katcharov katcharov force-pushed the expressions-doc branch 2 times, most recently from 21b0d66 to bdac2ef Compare November 29, 2022 16:53
@katcharov katcharov requested a review from stIncMale November 29, 2022 17:26

NumberExpression getNumber(String field, NumberExpression orElse);

default NumberExpression getNumber(final String field, final Number orElse) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The of methods return their type. But these have a type after get, and need to return that type. This is confusing to users if we don't use Number, unlike the case with of.

I think, this argument confuses the MQL number type (expressed in Java as NumberExpression and mentioned in the getNumber method name) with the Java Number (specified as the type of the second parameter) - they are different types. Also, if this argument were applied consistently, then neither DocumentExpression.getInteger(String, long) nor Expressions.ofNumberArray(double...)/(Decimal128...) should not have been introduced as they violate the approach implied by the argument.

If you still perceive an inconsistency, let's add a note to the final doc to consider switching of to use Number.

The API inconsistency was introduced in this PR, and deciding what to do with it does not require postponing as far as I can see (if there were specific future decisions you foresee on which this decision has to rely on, I would expect those to have been already mentioned).


The current integer-specific API is consistent between Expressions and DocumentExpression:

IntegerExpression Expressions.of(int)
IntegerExpression Expressions.of(long)

IntegerExpression DocumentExpression.getInteger(String, int)
IntegerExpression DocumentExpression.getInteger(String, long)

the API that is not integer-specific is not consistent between Expressions and DocumentExpression:

NumberExpression Expressions.of(double)
NumberExpression Expressions.of(Decimal128)

NumberExpression DocumentExpression.getNumber(String, Number)

To achieve consistency we should

  1. either replace getNumber(String, Number) with getNumber(String, double) and getNumber(String, Decimal128);
  2. or replace of(double) and of(Decimal128) with of(Number) (in this case we also need to replace Expressions.ofNumberArray(double...)/(Decimal128...) with Expressions.ofNumberArray(Number...)).

Second option reduces the number of overloads at the cost of shifting some type-checks from compile-time to run-time. I will be fine with either option.

Copy link
Member

@stIncMale stIncMale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have to press "submit" even though I have not finished re-reviewing because GitHub UI messed up the display of pending comments, and I am afraid I may lose them.

DocumentExpression unsetField(String path);


BooleanExpression getBoolean(String field);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior of doc.getBoolean("a.b"). Does it

  1. return the value of the "a.b" field in this document
  2. return the value of the "b" field inside the "a" field (assuming the value of the "a" field is a document)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 1., the parameter should be named field. If 2. is there any way to use dotted path notation in this API, and if not, should there be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not supporting dotted path notation, I worry that because their use is so prevalent in the wild, applications will make the mistake of using them and being really confused by the results they get back

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was to return (1) the value of the "a.b" field, not the path.

The potential error you mention seems serious to me, but I am unsure of how to address it.

  • It seems that there needs to be some way to use a literal field notation, so whatever we do we will need one Expression get() or many getters that take a proper field.
  • We could try getBooleanField and getBooleanPath, but this seems like overkill, and perhaps would not address the possible user error.
  • There is a get(key) on MapExpression that does not and should not take a path. MqlExpression implements both interfaces.
  • If we can implement what I have been calling "schema objects" (perhaps "object bindings"?) then one can write: doc.a.b. I think this would disincline users from any of the getField methods, and avoid the issue, but only to the extent that object bindings are possible and used by users.

I'll keep thinking. Suggestions welcome.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more common case will be people wanting to treat the field names as paths that traverse subdocuments. It's only in the last couple of years that the server or drivers even allowed field names with dots in them. So that suggests that the default behavior should be to treat dotted field names as paths, and have an explicit way to treat them as field names.

At the same time, it's an attractive option to use this opportunity to "fix" the behavior, and provide explicit syntax for dotted paths. So something like:

   Expression getField(String fieldName);    // treat "a.b" as a field name
   Expression getField(List<String> path);   // treat each string in the list as a field name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, it's an attractive option to use this opportunity to "fix" the behavior, and provide explicit syntax for dotted paths. So something like:
getField(List path)

If we use any type but String to represent field names / paths, I think we should introduce a special type instead of using List<String>.

I would also prefer to just introduce a Path type with two ways of creating its instances, something like Path.name and Path.path (in the search API we have SearchPath.fieldPath and SearchPath.wildcardPath, and we never accept plain String as a path in the rest of the API). We may even introduce two subtypes if there are situations in MQL when a path is not allowed and only a field name may be used (I don't know if there are such situations). I discussed this with Maxim, and he does not like the idea due to its verbosity, others may think the same. But I think that introducing a type that covers both field names (just a degenerate case of a field path) and field paths is the right way to approach the task: a plain String can technically be used to represent a full name of a human, text of a book, a path in a file system, a field path in MongoDB, but it does not mean that using a plain String for any of that is necessarily a good idea. Having a special type allows one to both express the meaning of a value and validate it at creation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed, all prefer the getInteger(String fieldName). It would be beneficial to have a way to query by paths, but that can be considered later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests. "fieldName" is a fairly long parameter hint in IntelliJ, but let's see how that goes.

}

@Test
public void getFieldTest() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for field names with embedded . characters so that we have regression tests on the behavior

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to check if this is broken in the current implementation - on the first reading of the getField docs, it seemed it would be automatically treated as a literal, but I think it needs to be explicitly wrapped.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this done?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

import java.time.Instant;

import static com.mongodb.client.model.expressions.Expressions.of;

/**
* Expresses a document value. A document is an ordered set of fields, where the
* key is a string value, mapping to a value of any other expression type.
*/
public interface DocumentExpression extends Expression {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both setField and getField allow the field name itself to be an expression. That API should support that as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supported for Map (arbitrary number of string keys to values all of the same type), but is not supported for Documents, which represent records (fixed number of string-named fields to values of different types). Neither one extends the other, because:

  1. A record cannot have a set(T) method since this would allow fields to be set via a broader type than they actually are.
  2. A map<StringEx> cannot have a method getBoolean(field) because this would allow string values to be pulled out as booleans.

Any situation that requires this is almost certainly more correctly handled via a MapExpression<Expression>, rather than a document (record).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a MapExpression? I don't see that anywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in #1054

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show me what the code would look like? I don't understand how it works in practice, e.g. say I have a document like:

{
   s1 : "str1",
   s2: "str2",
   s3 : "str3",
   i: 1
}

How would I do something like set one of the s fields where the suffix of the field name is the value of the i field?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work:

MapExpression<Expression> map = ofMap(<above map here>);
StringExpression key = of("s").concat(map.get("i").asString());
map.set(key, of("new_value"))

This particular operation should not be done to a Document/Record, for the same reason this is not available in Java except through the use of reflection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring at least to map expression PR

@stIncMale stIncMale mentioned this pull request Nov 30, 2022
@katcharov katcharov requested a review from stIncMale December 1, 2022 22:34
@katcharov katcharov requested a review from stIncMale December 2, 2022 15:45
Comment on lines 50 to 56
default NumberExpression getNumber(final String fieldName, final double other) {
return getNumber(fieldName, Expressions.numberToExpression(other));
}

default NumberExpression getNumber(final String fieldName, final Decimal128 other) {
return getNumber(fieldName, Expressions.numberToExpression(other));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose using of(other) instead of Expressions.numberToExpression(other). When this is done, please consider the PR approved by me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this done?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@katcharov katcharov requested a review from jyemin December 5, 2022 18:20
Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated some of the comment threads that I'm on.

@katcharov katcharov requested a review from jyemin December 9, 2022 22:00
Base automatically changed from expressions-conv to expressions December 13, 2022 21:42
@katcharov katcharov merged commit 81305dd into expressions Dec 13, 2022
@katcharov katcharov deleted the expressions-doc branch December 13, 2022 21:50
katcharov added a commit that referenced this pull request Jan 30, 2023
katcharov added a commit that referenced this pull request Jan 31, 2023
katcharov added a commit that referenced this pull request Jan 31, 2023
* Implement boolean expressions (#1025)

JAVA-4779

* Implement filter, map, reduce (#1031)

JAVA-4781

* Implement eq, ne, gt, gte, lt, lte (#1033)

JAVA-4784

* Implement string expressions (#1036)

JAVA-4801

* Implement arithmetic expressions (#1037)

Implement arithmetic expressions (from top 50, and others)

JAVA-4803

* Implement array expressions (#1043)

JAVA-4805

* Implement date expressions (#1045)

JAVA-4804

* Implement conversion/type expressions (#1050)

JAVA-4802

* Implement document expressions (#1052)

JAVA-4782

* Replace reduce with individual reductions (#1053)

JAVA-4814

* Implement map expressions (#1054)

JAVA-4817

* Implement switch expression (#1055)

JAVA-4813

* Test expressions in context (#1057)

JAVA-4820

* Add javadoc for boolean, date, number, integer, and expression (#1059)

 JAVA-4799

* Update and add documentation (#1059)

* Fix, tests

 JAVA-4799

* Add `@MqlUnchecked` and a few usage examples (#1059)

 JAVA-4799

* Add has to document, add tests (#1070)

 JAVA-4799

* Add javadocs for remaining classes (#1070)

 JAVA-4799

* 5.2 annotations (#1070)

 JAVA-4799

* 5.0 annotations (#1070)

 JAVA-4799

* 4.4 annotations (#1070)

 JAVA-4799

* 4.2 annotations (#1070)

 JAVA-4799

* 4.0 annotations (#1070)

 JAVA-4799

* Update and add documentation, add tests, fix minor issues (#1070)

Rename extractBsonValue

Fix access modifiers

Remove excess comments

Update docs

Fix: behaviour of get

Add notNull to API, add notNullApi test

Fix docs/annotations, tests

Fix docs, annotations, since

Fix docs

Revert external

Add missing MqlUnchecked

Fix missing null checks

Checkstyle

JAVA-4799

* Rename to Mql (automated) (#1073)

JAVA-3879

* Rename methods (automated) (#1073)

JAVA-3879

* Update naming, terms, and missing checks and annotations (#1073)

JAVA-3879

---------

Co-authored-by: Valentin Kovalenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants