Remove protobufjs in favor of protons-runtime #60
Conversation
@talentlessguy you might gain more savings switching to protons and protons-runtime instead? It would dedupe with
Note: I think in general I'd prefer to switch to
thanks for the suggestion
FWIW, last time I ran the benchmark suite, protons was just over 20% faster than protobuf-es when serializing/deserializing - https://github.com/ipfs/protons/tree/main/packages/protons-benchmark#usage These kinds of performance characteristics are very important to high-traffic deployments like Lodestar, so it's unlikely to be switched away from elsewhere.
do you mean protobufjs, and not protons?
I wonder how hard it would be to replace protobuf-es with protons; I'll give it a shot right now.
I guess the only thing is that protons compiles protobuf definitions to TypeScript, and this is still a JS project.
@achingbrain is the benchmark encode or decode? We only encode in this repo.
src/codec.js
```diff
-      Data: content.byteLength > 0 ? content : undefined,
-      filesize: content.byteLength,
+      Data: content.byteLength > 0 ? content : EMPTY_BUFFER,
+      filesize: BigInt(content.byteLength),
```
Does the dag-pb encoder support bigint?
I'm not sure about dag-pb; is it relevant to this PR?
```diff
-      Data: content.byteLength > 0 ? content : undefined,
-      filesize: content.byteLength,
+      Data: content.byteLength > 0 ? content : EMPTY_BUFFER,
+      filesize: content.length === 0 ? Object.assign(0n, { __forceEncode: true }) : BigInt(content.length),
```
Why do we need `__forceEncode` here?
Adding these `undefined` fields and setting `__forceEncode` seems to be repeated often - shouldn't it be moved into `encodePB` so we get fewer changes in other places?
This is because protons-runtime treats `0n` as the field's default value and therefore doesn't include it in the encoded protobuf.
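A minimal sketch of that behaviour (illustrative only: `isDefault` and `shouldEncode` are hypothetical helpers, not the actual protons-runtime internals). A proto3-style writer skips a `uint64` field whose value equals the default `0n`; `Object.assign(0n, { __forceEncode: true })` works because `Object.assign` boxes the bigint primitive, giving a wrapper object that can carry the marker while still converting back to `0n`:

```javascript
// Illustrative sketch, not the real protons-runtime writer.
const isDefault = (value) => BigInt(value) === 0n

// Object.assign boxes the bigint primitive, so the wrapper object can
// carry an extra property while still converting back to 0n.
const forced = Object.assign(0n, { __forceEncode: true })

// A proto3-style encoder drops default values unless forced.
const shouldEncode = (value) =>
  !isDefault(value) || value.__forceEncode === true

console.log(shouldEncode(0n))     // false: default value, omitted from the wire
console.log(shouldEncode(5n))     // true: non-default, encoded
console.log(shouldEncode(forced)) // true: default value, but forced
```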
Protons is a proto3 implementation, though it tries its best with proto2. The big difference is that there's no `required` modifier any more, because the authors considered it harmful.
If you want a value to always be present in the protobuf in proto3, you should mark it `optional` and set a value. This is compatible on the wire with `required` in proto2.
E.g. these will result in the same bytes:

```proto
syntax = "proto2";

message Derp {
  required int64 Value = 1;
}
```

```proto
syntax = "proto3";

message Derp {
  optional int64 Value = 1;
}
```
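To see why the two definitions produce identical bytes, here is a hand-rolled sketch of the protobuf wire format for that single field (`encodeVarint` and `encodeValueField` are illustrative helpers, not part of any library): field number 1 with wire type 0 (varint) always yields the tag byte `0x08`, regardless of which syntax declared the field.

```javascript
// Sketch of protobuf varint encoding (illustrative helpers, not a library API).
const encodeVarint = (n) => {
  const out = []
  let v = BigInt(n)
  do {
    let byte = Number(v & 0x7fn) // low 7 bits
    v >>= 7n
    if (v > 0n) byte |= 0x80 // continuation bit
    out.push(byte)
  } while (v > 0n)
  return out
}

// Field number 1, wire type 0 (varint): tag byte is (1 << 3) | 0 = 0x08.
// Both the proto2 `required` and proto3 `optional` Value fields encode this way.
const encodeValueField = (value) => [0x08, ...encodeVarint(value)]

console.log(encodeValueField(5n))   // [ 8, 5 ]
console.log(encodeValueField(300n)) // [ 8, 172, 2 ]
```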
You can see the proto3 definition ipfs-unixfs uses here, and the bytes generated are compatible with older proto2 parsers.
There was some discussion about this on the libp2p specs repo a while back which goes over the various pros and cons and how to maintain backwards compatibility - libp2p/specs#465
So what's the best solution? Migrating js-unixfs to use proto3, or just keeping it the way it is?
In general, there is a lot of change in this PR that makes me nervous about merging it and retaining compatibility.
Would more comprehensive tests make you more comfortable?
The tests only change numbers to bigints; the rest didn't change. That's the only breaking change, actually. I made sure not to alter anything else.
@achingbrain I'm not sure why, but protons with proto3 is unable to properly encode 0-ish values:

```js
/**
 * @param {Uint8Array} content
 * @returns {UnixFS.ByteView<UnixFS.Raw>}
 */
export const encodeRaw = content =>
  encodePB(
    {
      Type: NodeType.Raw,
      Data: content,
      filesize: BigInt(content.length),
      // @ts-ignore
      blocksizes: [],
      fanout: 0n,
      mode: 0,
      hashType: 0n,
    },
    []
  )
```
Produces:

```
1) format neunaces
     raw with no content:
   AssertionError: expected Uint8Array[ 10, 0 ] to deeply equal Uint8Array[ 10, 4, 8, 0, 24, 0 ]
   + expected - actual

   -{"0":10,"1":0}
   +{"0":10,"1":4,"2":8,"3":0,"4":24,"5":0}
```
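For reference, the two byte strings in that assertion can be decoded by hand with the standard protobuf tag rule (field number in the high bits, wire type in the low three). `readTag` below is an illustrative helper, and the field-name comments assume the usual dag-pb/unixfs schemas:

```javascript
// Decode a protobuf tag byte: field number in bits 3+, wire type in bits 0-2.
// (Illustrative helper; field names in the comments assume the dag-pb/unixfs schemas.)
const readTag = (byte) => ({ fieldNumber: byte >> 3, wireType: byte & 0x07 })

// expected [10, 4, 8, 0, 24, 0]:
console.log(readTag(10)) // { fieldNumber: 1, wireType: 2 } length-delimited, length 4
console.log(readTag(8))  // { fieldNumber: 1, wireType: 0 } varint 0 (Type)
console.log(readTag(24)) // { fieldNumber: 3, wireType: 0 } varint 0 (filesize)

// actual [10, 0]: the same outer field but with an empty (length 0) payload,
// i.e. the inner zero-valued fields were dropped as defaults.
```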
This used to work with proto2:

```js
export const encodeRaw = content =>
  encodePB(
    {
      Type: NodeType.Raw,
      Data: content.byteLength > 0 ? content : EMPTY_BUFFER,
      filesize: content.length === 0 ? Object.assign(0n, { __forceEncode: true }) : BigInt(content.length),
      // @ts-ignore
      blocksizes: EMPTY,
      fanout: 0n,
    },
    []
  )
```
Protobuf has the concept of default values. For example, if a field's type is `uint64` and no value for it is present in the protobuf binary, on deserialization it will be set to the default value of `0n`, unless the field is marked `optional`, in which case it will be set to `undefined`.
If a field is not marked `optional` (i.e. it is singular, in proto3 nomenclature) and is set to the default value, it will be omitted from the protobuf binary.
Consequently, values for `optional` fields will be written into the protobuf even if they are equal to the default value; otherwise it would later be impossible to know whether the field was explicitly set to the default value or was intended to be omitted from the message.
Above you are setting values for `optional` fields, which is why there are more bytes in the serialized form of the message than you are expecting.
> Above you are setting values for optional fields which is why there are more bytes in the serialized form of the message than you are expecting.

Not sure if I follow: the "expected" diff has these bytes: `[ 10, 4, 8, 0, 24, 0 ]`, while the actual has only two, so that's the exact opposite of what you said. Should I change something in the proto declaration to work around this?
This closes #31
Switches to protons-runtime
All tests pass now