Guideline for design: unixfsv2 as a "view"

Let a SerializableView be defined as a function which yields an ordered series of bytes.

Given any SerializableView, we can apply a hashing function.  Given these together, we can define both an equality predicate and a content addressable storage system.
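To make that concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not an existing IPLD or unixfs API: the `SerializableView` type alias, the `view_hash` and `views_equal` helpers, the in-memory `ContentStore`, and the choice of SHA-256.

```python
import hashlib
from typing import Callable, Dict, Iterable

# Hypothetical type: a view is a zero-argument function that yields
# byte chunks in order. How the bytes are split into chunks is not
# part of the view's identity; only the concatenated stream is.
SerializableView = Callable[[], Iterable[bytes]]


def view_hash(view: SerializableView) -> str:
    """Hash the byte stream a view yields (SHA-256 is an arbitrary choice)."""
    h = hashlib.sha256()
    for chunk in view():
        h.update(chunk)
    return h.hexdigest()


def views_equal(a: SerializableView, b: SerializableView) -> bool:
    """Equality predicate: two views are equal if their hashes match."""
    return view_hash(a) == view_hash(b)


class ContentStore:
    """Toy in-memory content-addressable store keyed by the view hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, view: SerializableView) -> str:
        data = b"".join(view())
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

Two views that yield the same bytes with different chunk boundaries compare equal under `views_equal` and land on the same key in `ContentStore`.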

IPLD already has one clearly defined SerializableView (though we don't call it such), and we hash this and use this hash... everywhere.  It's pretty useful.

Let's lift and generalize this concept.  The SerializableView of IPLD that we use for content-addressing hashes is certainly useful.  However, since we use the hash of this view as the index for our storage and lookup systems, it quickly becomes the bearer of a bunch of burdens.  For example, the choice of layout when we import large pieces of data (e.g., balanced tree vs trickle tree; variations in parameters for rabin chunking; etc) causes the hash of a tree of IPLD objects representing a large piece of data to vary.  This is fine for the storage and lookup systems, but it also makes this hash unusable for a lot of other purposes, such as a useful equality check when we don't care about the chunking and layout.

We should define a SerializableView (and thus a hash that's usable as a cheap equality predicate) for unixfsv2 which is *not* bound to the IPLD hash.
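A toy illustration of the difference, in Python. This is a sketch under loud assumptions: `layout_hash` is a stand-in for a layout-dependent hash (it hashes the list of chunk hashes) and is *not* how IPLD CIDs are actually computed; `fixed_chunks` is a hypothetical fixed-size chunker, not rabin; `content_hash` is a hypothetical chunking-independent view hash.

```python
import hashlib
from typing import List


def fixed_chunks(data: bytes, size: int) -> List[bytes]:
    """Hypothetical fixed-size chunker standing in for real import layouts."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def layout_hash(chunks: List[bytes]) -> str:
    """Toy layout-dependent hash: a hash over the hashes of the chunks.

    Changing the chunk boundaries changes the result, which mimics the
    way a tree-of-objects hash varies with import parameters.
    """
    h = hashlib.sha256()
    for c in chunks:
        h.update(hashlib.sha256(c).digest())
    return h.hexdigest()


def content_hash(chunks: List[bytes]) -> str:
    """Chunking-independent hash: a hash over the concatenated bytes."""
    h = hashlib.sha256()
    for c in chunks:
        h.update(c)
    return h.hexdigest()


data = bytes(range(256)) * 4
small = fixed_chunks(data, 64)
large = fixed_chunks(data, 100)

# Same content, different layouts.
same_content = content_hash(small) == content_hash(large)
same_layout = layout_hash(small) == layout_hash(large)
```

Here `same_content` is true while `same_layout` is false, which is the property we want from the unixfsv2 view: a cheap equality check that doesn't care how the data was chunked.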

We can define other SerializableViews as well.  One awesome applied example of this is a system by @mib-kd743naq which imports tars into IPFS and produces both a unixfsv1 tree... *and* a parallel tree of objects which can be cat'ed out to reproduce *the original tar precisely*.  This is spectacularly useful because this view of bytes can be piped into a hashing function to match `original.tar.sha256` or be verified against `original.tar.asc`.  This tar-reemitter tree also reuses the vast majority of the blob objects from the unixfsv1 tree it was produced with, which is nice (and generally, when implementing more new views, we can probably do this quite often as well -- and let's keep an eye out for making this easy as we design).

In summary:
- A SerializableView of unixfsv2 which can be used for content-equality checks (and ignores chunking details) is desirable.
- Making sure that a SerializableView which (e.g.) reproduces a tar precisely can coexist nicely with unixfsv2 and share objects with it is a good heuristic for a good design.
- This seems to shake out as parallel trees for the metadata, which both point into trees of content blobs.  (Some of these parallel trees have directories (unixfsv2); some are very different (e.g. tar which... doesn't necessarily exactly have directories per se, and does have a bunch of other stuff).)  This should also probably inform our API design.
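
To sketch what "parallel trees sharing content blobs" could look like, here is a small Python example. It is not @mib-kd743naq's implementation and not a proposed format; `import_tar`, `emit_tar`, the `files` mapping, and the `recipe` list are all hypothetical names. It only handles plain uncompressed tars and ignores sparse entries.

```python
import hashlib
import io
import tarfile
from typing import Dict, Iterator, List, Tuple, Union

Blobs = Dict[str, bytes]
# A recipe entry is either ("raw", framing_bytes) or ("blob", blob_key).
Recipe = List[Tuple[str, Union[bytes, str]]]


def put(blobs: Blobs, data: bytes) -> str:
    """Store data under its SHA-256 hex digest and return the key."""
    key = hashlib.sha256(data).hexdigest()
    blobs[key] = data
    return key


def import_tar(tar_bytes: bytes, blobs: Blobs) -> Tuple[Dict[str, str], Recipe]:
    """Build two parallel structures over one shared blob store.

    files:  path -> blob key (stand-in for a directory-style tree)
    recipe: tar framing bytes interleaved with references to the same blobs
    """
    files: Dict[str, str] = {}
    recipe: Recipe = []
    pos = 0
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for member in tf:
            if not member.isreg():
                continue  # headers of non-file entries stay in the framing
            start = member.offset_data
            end = start + member.size
            recipe.append(("raw", tar_bytes[pos:start]))
            key = put(blobs, tar_bytes[start:end])
            recipe.append(("blob", key))
            files[member.name] = key
            pos = end
    # Trailing padding and end-of-archive blocks.
    recipe.append(("raw", tar_bytes[pos:]))
    return files, recipe


def emit_tar(recipe: Recipe, blobs: Blobs) -> Iterator[bytes]:
    """SerializableView that reproduces the original tar byte-for-byte."""
    for kind, value in recipe:
        yield value if kind == "raw" else blobs[value]
```

Hashing the output of `emit_tar` gives something comparable against a published tar checksum, while `files` and `recipe` both point at the same content blobs; only the tar headers and padding are extra.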

