Skip to content

Support any number of Document Sequence Sections in CommandMessage#getCommandDocument #1456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 26, 2024

Conversation

jyemin
Copy link
Collaborator

@jyemin jyemin commented Jul 20, 2024

@jyemin jyemin self-assigned this Jul 20, 2024

while (outputByteBuf.hasRemaining()) {
outputByteBuf.position(outputByteBuf.position() + 1 /* payload type */ + 4 /* payload size */);
String payloadName = getPayloadName(outputByteBuf);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we get the payload name directly from the OP_MSG instead of from the SplittablePayload, making it easier to support any number of document sequence independent of any future changes to SplittablePayload

Copy link
Member

@stIncMale stIncMale Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we now get the "payload name" (which is actually, the sequence identifier) effectively from what have been previously written to the bsonOutput, why do we still use SplittablePayload.getPayloadName, including using it in CommandMessage?

I don't yet have a good understanding of what's going on, but the above seems quite confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing the OP_MSG Kind 1 Section, we need to get the payload name from somewhere. It comes from SplittablePayload. This code is just getting it out of the OP_MSG after it's been written, so it no longer has to depend on SplittablePayload, which is now free to evolve independent of this method (for example, to support multiple payloads).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are suggesting that the getPayloadName method should be renamed to something like getSequenceIdentifier, I agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, I now understand what is so confusing here:

  • SplittablePayload.getPayloadName is only supposed to be used when encoding (so it's not surprising that the method is still used), and obviously should not be used when decoding. Its previous usage for decoding appears to having been an attempt to simplify the small piece of the decoding code at the cost of abusing SplittablePayload.getPayloadName and making the whole picture less clear.
  • It is also unclear if CommandMessage.getCommandDocument should be an instance method of CommandMessage (more about this below). If it were not an instance method of CommandMessage, then it would not have been so easy to access SplittablePayload.getPayloadName from CommandMessage.getCommandDocument, and SplittablePayload.getPayloadName likely would not have been used by CommandMessage.getCommandDocument to begin with.

CommandMessage.getCommandDocument requires its argument to be not just any ByteBufferBsonOutput with a command encoded in the right format, but the one that was filled by the CommandMessage.encode method called on the same instance of CommandMessage the last time CommandMessage.encode was called. I am not sure what code design principle this violates, but it definitely violates something, makes the code fragile and difficult to think about.

  1. The only hard reason for CommandMessage.getCommandDocument to be an instance method of CommandMessage, is that it needs information on where in bsonOutput (its argument) to start decoding from.
  2. CommandMessage remembers the document's position in RequestMessage.encodingMetadata when CommandMessage.encode is called (each encode overrides the previously remembered position), and CommandMessage.getCommandDocument uses that remembered position.

I wonder, why it was designed this way? An immediate "fix" that comes to mind, is to combine EncodingMetadata and the ByteBufferBsonOutput passed to CommandMessage.encode into something like DecodableByteBufferBsonOutput, and remove RequestMessage.encodingMetadata. Depending on the reasons behind the current design, there may be a reason why the proposed approach couldn't work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all goes a long way back, to when there were many more subclasses of RequestMessage (we used to have subclasses for OP_INSERT, OP_UPDATE, OP_DELETE, etc). But now, there is only CommandMessage, and except for the annoyance of having to use OP_QUERY for the very first message we send on a connection, CommandMessage only encodes OP_MSG. Which makes EncodingMetadata#getFirstDocumentPosition effectively a constant.

When we removed support for all the other op codes, we simplified a lot, but there is more that can be done. Just not sure it should be done in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I now understand why it was varying previously, but not the reasons behind the design that I described previously. I created https://jira.mongodb.org/browse/JAVA-5554 about improving this.

@jyemin jyemin marked this pull request as ready for review July 20, 2024 00:38
@jyemin jyemin requested a review from stIncMale July 20, 2024 00:38

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
…tCommandDocument

JAVA-5536
jyemin added 2 commits July 20, 2024 13:06

while (outputByteBuf.hasRemaining()) {
outputByteBuf.position(outputByteBuf.position() + 1 /* payload type */ + 4 /* payload size */);
String payloadName = getPayloadName(outputByteBuf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, I now understand what is so confusing here:

  • SplittablePayload.getPayloadName is only supposed to be used when encoding (so it's not surprising that the method is still used), and obviously should not be used when decoding. Its previous usage for decoding appears to having been an attempt to simplify the small piece of the decoding code at the cost of abusing SplittablePayload.getPayloadName and making the whole picture less clear.
  • It is also unclear if CommandMessage.getCommandDocument should be an instance method of CommandMessage (more about this below). If it were not an instance method of CommandMessage, then it would not have been so easy to access SplittablePayload.getPayloadName from CommandMessage.getCommandDocument, and SplittablePayload.getPayloadName likely would not have been used by CommandMessage.getCommandDocument to begin with.

CommandMessage.getCommandDocument requires its argument to be not just any ByteBufferBsonOutput with a command encoded in the right format, but the one that was filled by the CommandMessage.encode method called on the same instance of CommandMessage the last time CommandMessage.encode was called. I am not sure what code design principle this violates, but it definitely violates something, makes the code fragile and difficult to think about.

  1. The only hard reason for CommandMessage.getCommandDocument to be an instance method of CommandMessage, is that it needs information on where in bsonOutput (its argument) to start decoding from.
  2. CommandMessage remembers the document's position in RequestMessage.encodingMetadata when CommandMessage.encode is called (each encode overrides the previously remembered position), and CommandMessage.getCommandDocument uses that remembered position.

I wonder, why it was designed this way? An immediate "fix" that comes to mind, is to combine EncodingMetadata and the ByteBufferBsonOutput passed to CommandMessage.encode into something like DecodableByteBufferBsonOutput, and remove RequestMessage.encodingMetadata. Depending on the reasons behind the current design, there may be a reason why the proposed approach couldn't work.

jyemin added 3 commits July 25, 2024 20:09
@jyemin jyemin requested a review from stIncMale July 26, 2024 01:10
@jyemin jyemin merged commit 14fc2fa into mongodb:master Jul 26, 2024
57 of 59 checks passed
@jyemin jyemin deleted the JAVA-5536 branch July 26, 2024 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants