Default raw leaves to true #50

achingbrain · 2020-01-16T11:01:57Z

Builds on #49. A separate PR as it's slightly contentious.

Sets default DAG construction to be a dag-pb root node, dag-pb intermediate nodes and ipld-raw nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data.

Currently you may end up with ipld-raw leaves or dag-pb leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from.

E.g. to generate the same CIDs as go-IPFS, ipfs.add will result in a balanced DAG with UnixFS leaf nodes with a type 'file', ipfs.files.write will result in a trickle DAG with UnixFS leaf nodes of a type raw, and specifying CID version 1 will get you ipld-raw leaf nodes and whatever tree strategy you specifed, default balanced. I believe the reason for this is largely historical, go-IPFS has different importers for different subsystems written by different people at different times when different bits of IPLD were available.

I think this is chaos and we should go a step further than this PR - we should use ipld-raw leaf types everywhere and only offer options to change the DAG structure, not leaf types.

BREAKING CHANGE:

options.rawLeaves now defaults to true

This is a `unixfs-importer` but it's possible to output CIDs that resolve to `dag-raw` nodes if your data is small and you set `rawLeaves` and `reduceSingleLeafToSelf` to true. In that case you'll get a `dag-raw` node as the output. This makes it impossible to set metadata on small files with `rawLeaves` set to true. BREAKING CHANGE: If your data is below the chunk size, and you have `rawLeaves` and `reduceSingleLeafToSelf` set to true, you'll get a CID that resolves to a bona fide UnixFS file back with metadata and all that good stuff instead of a `dag-raw` node.

Builds on #49. A separate PR as it's slightly contentious. Sets default DAG construction to be a `dag-pb` root node, `dag-pb` intermediate nodes and `ipld-raw` nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data. Currently you may end up with `ipld-raw` leaves or `dag-pb` leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from. E.g. to generate the same CIDs as go-IPFS, `ipfs.add` will result in a balanced DAG with UnixFS leaf nodes with a type 'file', `ipfs.files.write` will result in a trickle DAG with UnixFS leaf nodes of a type `raw`, and specifying CID version 1 will get you `ipld-raw` leaf nodes and whatever tree strategy you specifed, default balanced. I think this is chaos, we should use `ipld-raw` leaf types everywhere and only offer options to change the DAG structure, not leaf types.

ribasushi · 2020-01-16T16:01:34Z

... with type 'file' or 'raw' depending on ...

Archeological question: what is the rationale for using the raw PB type? The encoding size is the same, but you can always seamlessly compose "files" into "larger files", while a raw node is not usable standalone.

achingbrain · 2020-01-16T16:10:07Z

I'm not sure, it predates my time on the project. My gut feeling is that it's from a time before ipld-raw and CIDv1. Maybe @Stebalien knows more?

Stebalien · 2020-01-16T21:13:29Z

I believe the raw dag-pb unixfs type was supposed to be for file leaves. And yes, that came before IPLD.

Stebalien · 2020-01-16T21:19:07Z

We chose not to make this the default in go-ipfs to avoid changing CIDs. That is, users expect that ipfs add -r ... will always produce the same CID when run on the same directory.

go-ipfs does default to raw leaves when --cid-version=1 is passed as that changes CIDs anyways.

The plan is to switch all the defaults (cidv1, raw leaves, etc.) all at once, ideally when we get UnixFSv2. Until then, doing this piece-wise is likely to cause quite a bit of noise and annoying issues.

One solution would be to introduce an "importer version" flag. Each time we introduce a fancy new importer feature, we'd define a new set of import options and assign an "importer version" to it. Users would be able to pass --importer=latest to get the latest and greatest features (cidv1, raw leaves, better chunker, inline CIDs, etc.).

mikeal · 2020-01-16T22:28:30Z

Just for clarity, does this default actually persist into js-ipfs or is this being set every time by js-ipfs‘s use of the importer?

If the defaults here are always being overwritten by js-ipfs I think it makes the discussion about defaults for this library a bit clearer and easier.

achingbrain · 2020-01-31T18:29:58Z

In theory js-IPFS sets the option every time, but some may lack defaults so get passed as undefined which merge-options will ignore if a default has been set.

Anyway, @Stebalien, @alanshaw, @lidel and I had a catch up about this and the way forward is this:

Do not make this the default because it'll change hashes.
Create import presets or versions that set defaults for various options - cid v1, raw nodes, graph construction etc to let users opt-in to shiny new features in a way that is repeatable and doesn't encourage massive fragmentation of CIDs for a given bit of content

achingbrain added 2 commits January 16, 2020 09:36

achingbrain requested review from alanshaw and Stebalien January 16, 2020 11:02

achingbrain closed this Feb 20, 2020

achingbrain deleted the default-raw-leaves-to-true branch February 20, 2020 23:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Default raw leaves to true #50

Default raw leaves to true #50

Uh oh!

achingbrain commented Jan 16, 2020 •

edited

Loading

Uh oh!

ribasushi commented Jan 16, 2020

Uh oh!

achingbrain commented Jan 16, 2020

Uh oh!

Stebalien commented Jan 16, 2020

Uh oh!

Stebalien commented Jan 16, 2020

Uh oh!

mikeal commented Jan 16, 2020

Uh oh!

achingbrain commented Jan 31, 2020

Uh oh!

Uh oh!

Default raw leaves to true #50

Default raw leaves to true #50

Uh oh!

Conversation

achingbrain commented Jan 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ribasushi commented Jan 16, 2020

Uh oh!

achingbrain commented Jan 16, 2020

Uh oh!

Stebalien commented Jan 16, 2020

Uh oh!

Stebalien commented Jan 16, 2020

Uh oh!

mikeal commented Jan 16, 2020

Uh oh!

achingbrain commented Jan 31, 2020

Uh oh!

Uh oh!

achingbrain commented Jan 16, 2020 •

edited

Loading