-
Notifications
You must be signed in to change notification settings - Fork 235
Proposal for dealing with fractional mtime #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(ab)use the repeated type to have an identical wire-level representation without using too many field numbers ( we are already half wy through the 15 easy-ones available )
@@ -90,7 +90,12 @@ UnixFS currently supports two optional metadata fields: | |||
- The remaining 20 bits are reserved for future use, and are subject to change. Spec implementations **MUST** handle bits they do not expect as follows: | |||
- For future-proofing the (de)serialization layer must preserve the entire uint32 value during clone/copy operations, modifying only bit values that have a well defined meaning: `clonedValue = ( modifiedBits & 07777 ) | ( originalValue & 0xFFFFF000 )` | |||
- Implementations of this spec must proactively mask off bits without a defined meaning in the implemented version of the spec: `interpretedValue = originalValue & 07777` | |||
* `mtime` -- The modification time in seconds since the epoch. This defaults to the unix epoch if unspecified | |||
* `mtime` -- The modification time in seconds since the epoch `1970-01-01T00:00:00Z`. The `repeated` protobuf type is overloaded to provide optional high resolution as follows: | |||
- If no mtime value is available, an `undefined` mtime is assumed. Implementations are free to default to the epoch itself, or if appropriate render a variant of `N/A` to signify lack of a value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the spec should have an opinion about default values. It currently says:
This defaults to the unix epoch if unspecified
Any reason not to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@achingbrain this is the same issue you had with mode
: we need a way to represent "no mtime has been supplied". For FUSE-like bridges this would mean either epoch
or now() at time of mount
, and we could very well define this in spec. But for the web-gateway it should remain a N/A
or -
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now() at time of mount
would probably need storing somewhere which might be problematic, probably saver to opt for epoch
.
But for the web-gateway it should remain a N/A or - or something like that.
Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because... there is no mtime attached
!= Jan 1st 1970
?
I guess I am not understanding the question, can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess we're saying something like:
If the existence of
mtime
is important for your application, use0
where it is not set
Seems fair enough. Seeing 1970
everywhere when doing ls -l
style mfs directory listings is bugging me a little. I'd rather see -
in it's place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that:
- Operating systems expect some mtime and we want to be consistent.
- If we pick the epoch, the mtime will always be consistent. If we pick the mount time, it'll change every time we restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that basis we should probably stick with 0
as the default mtime as it is now.
- If one mtime value is available, it represents the amount of seconds after or *before* the epoch. Implementations must be able to gracefully handle negative mtime, even if such a value is not applicable within their domain ( e.g. a POSIX filesystem ) | ||
- If two mtime values are available, the first one is interpreted as described above, and the second value represents the fractional part of the mtime as the amount of nanoseconds. The valid range for this value is the intger range `[1, 999999999]`. | ||
If a fractional part outside of this range is encountered, implementations should consider the entire metadata block invalid and abort processing it. Note that **a fractional value of `0` is NOT valid**. | ||
- If more than two repeated mtime values are available, implementations should consider the entire metadata block invalid and abort processing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of having so much interpretation in a spec - the more an implementer has to understand in order to implement something, the more chances are of them getting it wrong.
Using an unbounded list for this also opens it up to abuse.
Why not just have two optional fields - mtime
for seconds and mtime_nsec
for nanoseconds since the last second? Both default to 0
and the combination of which gives you the time.
From what I can see the serialized size of such messages is the same:
`
message TwoFields {
optional int32 mtime = 1;
optional int32 mtime_nsec = 2;
}
message RepeatedField {
repeated int32 mtime = 1;
}
`
console.info(TwoFields.encode({
mtime: 1,
mtime_nsec: 1
}).length) // 4
console.info(RepeatedField.encode({
mtime: [1, 1]
}).length) // 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vastly prefer something like:
message Data {
enum DataType {
Raw = 0;
Directory = 1;
File = 2;
Metadata = 3;
Symlink = 4;
HAMTShard = 5;
}
required DataType Type = 1;
optional bytes Data = 2;
optional uint64 filesize = 3;
repeated uint64 blocksizes = 4;
optional uint64 hashType = 5;
optional uint64 fanout = 6;
optional uint32 mode = 7;
optional TimeSpec mtime = 8;
}
message TimeSpec {
required int64 EpochSeconds = 1;
optional uint32 EpochNanoseconds = 2;
}
message Metadata {
optional string MimeType = 1;
}
My understanding was that this was rejected for... reasons
If this is in fact acceptable - I will re-PR right away ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. The capitialisation of EpochSeconds
is giving me the twitchy eye, but then so does Type
, Data
and MimeType
. Not that we've been very good at following it but the Protocol Buffers style guide recommends snake_case
for multi-word field names.
Do we actually need nanosecond precision or would milliseconds do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capitialisation of EpochSeconds is giving me the twitchy eye
I was simply following "what's already there"
Do we actually need nanosecond precision or would milliseconds do?
We need to take up 32 bits, at which point the difference between milli and nanosseconds is moot. Also Most of the filesystem supporting fractional times do in fact keep 9 decimal places ( likely for the same 32bit reason )
Ok I am going force-push this entire PR, and we can start over ;)
Will open a new PR tomorrow, day got away from me. |
Stores modification times as two values seconds and nanoseconds since or before the unix epoch. The `EpochSeconds` field represents a fraction of a second rather nanoseconds since the epoch. Developed from mine and @ribasushi's conversation on #232 (comment)
(ab)use the repeated type to have an identical wire-level representation
without using too many field numbers ( we are already half way through the
15 easy-ones available )