-
Notifications
You must be signed in to change notification settings - Fork 4
Conversation
This is a `unixfs-importer` but it's possible to output CIDs that resolve to `dag-raw` nodes if your data is small and you set `rawLeaves` and `reduceSingleLeafToSelf` to true. In that case you'll get a `dag-raw` node as the output. This makes it impossible to set metadata on small files with `rawLeaves` set to true. BREAKING CHANGE: If your data is below the chunk size, and you have `rawLeaves` and `reduceSingleLeafToSelf` set to true, you'll get a CID that resolves to a bona fide UnixFS file back with metadata and all that good stuff instead of a `dag-raw` node.
Builds on #49. A separate PR as it's slightly contentious. Sets default DAG construction to be a `dag-pb` root node, `dag-pb` intermediate nodes and `ipld-raw` nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data. Currently you may end up with `ipld-raw` leaves or `dag-pb` leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from. E.g. to generate the same CIDs as go-IPFS, `ipfs.add` will result in a balanced DAG with UnixFS leaf nodes with a type 'file', `ipfs.files.write` will result in a trickle DAG with UnixFS leaf nodes of a type `raw`, and specifying CID version 1 will get you `ipld-raw` leaf nodes and whatever tree strategy you specifed, default balanced. I think this is chaos, we should use `ipld-raw` leaf types everywhere and only offer options to change the DAG structure, not leaf types.
Archeological question: what is the rationale for using the raw PB type? The encoding size is the same, but you can always seamlessly compose "files" into "larger files", while a |
I'm not sure, it predates my time on the project. My gut feeling is that it's from a time before |
I believe the raw dag-pb unixfs type was supposed to be for file leaves. And yes, that came before IPLD. |
We chose not to make this the default in go-ipfs to avoid changing CIDs. That is, users expect that go-ipfs does default to raw leaves when The plan is to switch all the defaults (cidv1, raw leaves, etc.) all at once, ideally when we get UnixFSv2. Until then, doing this piece-wise is likely to cause quite a bit of noise and annoying issues. One solution would be to introduce an "importer version" flag. Each time we introduce a fancy new importer feature, we'd define a new set of import options and assign an "importer version" to it. Users would be able to pass |
Just for clarity, does this default actually persist into If the defaults here are always being overwritten by |
In theory Anyway, @Stebalien, @alanshaw, @lidel and I had a catch up about this and the way forward is this:
|
Builds on #49. A separate PR as it's slightly contentious.
Sets default DAG construction to be a
dag-pb
root node,dag-pb
intermediate nodes andipld-raw
nodes for leaves. This will make parsing ever so slightly faster and DAG sizes ever so slightly smaller as there is no protobuf wrapper for the actual file data.Currently you may end up with
ipld-raw
leaves ordag-pb
leaves that contain UnixFS entries with type 'file' or 'raw' depending on where the importer is invoked from.E.g. to generate the same CIDs as go-IPFS,
ipfs.add
will result in a balanced DAG with UnixFS leaf nodes with a type 'file',ipfs.files.write
will result in a trickle DAG with UnixFS leaf nodes of a typeraw
, and specifying CID version 1 will get youipld-raw
leaf nodes and whatever tree strategy you specifed, default balanced. I believe the reason for this is largely historical, go-IPFS has different importers for different subsystems written by different people at different times when different bits of IPLD were available.I think this is chaos and we should go a step further than this PR - we should use
ipld-raw
leaf types everywhere and only offer options to change the DAG structure, not leaf types.BREAKING CHANGE:
options.rawLeaves
now defaults totrue