Skip to content

Proposal for dealing with fractional mtime #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

ribasushi
Copy link
Contributor

@ribasushi ribasushi commented Dec 18, 2019

(ab)use the repeated type to have an identical wire-level representation
without using too many field numbers ( we are already half way through the
15 easy-ones available )

(ab)use the repeated type to have an identical wire-level representation
without using too many field numbers ( we are already half wy through the
15 easy-ones available )
@@ -90,7 +90,12 @@ UnixFS currently supports two optional metadata fields:
- The remaining 20 bits are reserved for future use, and are subject to change. Spec implementations **MUST** handle bits they do not expect as follows:
- For future-proofing the (de)serialization layer must preserve the entire uint32 value during clone/copy operations, modifying only bit values that have a well defined meaning: `clonedValue = ( modifiedBits & 07777 ) | ( originalValue & 0xFFFFF000 )`
- Implementations of this spec must proactively mask off bits without a defined meaning in the implemented version of the spec: `interpretedValue = originalValue & 07777`
* `mtime` -- The modification time in seconds since the epoch. This defaults to the unix epoch if unspecified
* `mtime` -- The modification time in seconds since the epoch `1970-01-01T00:00:00Z`. The `repeated` protobuf type is overloaded to provide optional high resolution as follows:
- If no mtime value is available, an `undefined` mtime is assumed. Implementations are free to default to the epoch itself, or if appropriate render a variant of `N/A` to signify lack of a value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spec should have an opinion about default values. It currently says:

This defaults to the unix epoch if unspecified

Any reason not to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain this is the same issue you had with mode: we need a way to represent "no mtime has been supplied". For FUSE-like bridges this would mean either epoch or now() at time of mount, and we could very well define this in spec. But for the web-gateway it should remain a N/A or - or something like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now() at time of mount would probably need storing somewhere which might be problematic, probably saver to opt for epoch.

But for the web-gateway it should remain a N/A or - or something like that.

Why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because... there is no mtime attached != Jan 1st 1970?

I guess I am not understanding the question, can you clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess we're saying something like:

If the existence of mtime is important for your application, use 0 where it is not set

Seems fair enough. Seeing 1970 everywhere when doing ls -l style mfs directory listings is bugging me a little. I'd rather see - in it's place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that:

  1. Operating systems expect some mtime and we want to be consistent.
  2. If we pick the epoch, the mtime will always be consistent. If we pick the mount time, it'll change every time we restart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that basis we should probably stick with 0 as the default mtime as it is now.

- If one mtime value is available, it represents the amount of seconds after or *before* the epoch. Implementations must be able to gracefully handle negative mtime, even if such a value is not applicable within their domain ( e.g. a POSIX filesystem )
- If two mtime values are available, the first one is interpreted as described above, and the second value represents the fractional part of the mtime as the amount of nanoseconds. The valid range for this value is the intger range `[1, 999999999]`.
If a fractional part outside of this range is encountered, implementations should consider the entire metadata block invalid and abort processing it. Note that **a fractional value of `0` is NOT valid**.
- If more than two repeated mtime values are available, implementations should consider the entire metadata block invalid and abort processing it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having so much interpretation in a spec - the more an implementer has to understand in order to implement something, the more chances are of them getting it wrong.

Using an unbounded list for this also opens it up to abuse.

Why not just have two optional fields - mtime for seconds and mtime_nsec for nanoseconds since the last second? Both default to 0 and the combination of which gives you the time.

From what I can see the serialized size of such messages is the same:

`
message TwoFields {
  optional int32 mtime = 1;
  optional int32 mtime_nsec = 2;
}

message RepeatedField {
  repeated int32 mtime = 1;
}
`

console.info(TwoFields.encode({
  mtime: 1,
  mtime_nsec: 1
}).length) // 4

console.info(RepeatedField.encode({
  mtime: [1, 1]
}).length) // 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vastly prefer something like:

message Data {
	enum DataType {
		Raw = 0;
		Directory = 1;
		File = 2;
		Metadata = 3;
		Symlink = 4;
		HAMTShard = 5;
	}

	required DataType Type = 1;
	optional bytes Data = 2;
	optional uint64 filesize = 3;
	repeated uint64 blocksizes = 4;
	optional uint64 hashType = 5;
	optional uint64 fanout = 6;
	optional uint32 mode = 7;
	optional TimeSpec mtime = 8;
}

message TimeSpec {
  required int64 EpochSeconds = 1;
  optional uint32 EpochNanoseconds = 2;
}

message Metadata {
	optional string MimeType = 1;
}

My understanding was that this was rejected for... reasons
If this is in fact acceptable - I will re-PR right away ;)

Copy link
Member

@achingbrain achingbrain Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The capitialisation of EpochSeconds is giving me the twitchy eye, but then so does Type, Data and MimeType. Not that we've been very good at following it but the Protocol Buffers style guide recommends snake_case for multi-word field names.

Do we actually need nanosecond precision or would milliseconds do?

Copy link
Contributor Author

@ribasushi ribasushi Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The capitialisation of EpochSeconds is giving me the twitchy eye

I was simply following "what's already there"

Do we actually need nanosecond precision or would milliseconds do?

We need to take up 32 bits, at which point the difference between milli and nanosseconds is moot. Also Most of the filesystem supporting fractional times do in fact keep 9 decimal places ( likely for the same 32bit reason )

Ok I am going force-push this entire PR, and we can start over ;)

@ribasushi
Copy link
Contributor Author

Will open a new PR tomorrow, day got away from me.

@ribasushi ribasushi closed this Dec 19, 2019
achingbrain added a commit that referenced this pull request Dec 23, 2019
Stores modification times as two values seconds and nanoseconds since
or before the unix epoch.

The `EpochSeconds` field represents a fraction of a second rather
nanoseconds since the epoch.

Developed from mine and @ribasushi's conversation on #232 (comment)
@ribasushi ribasushi deleted the optional_hires_mtime branch January 28, 2020 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants