From b6ca8087fd7e447a83dc57a0017888786278059f Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Tue, 16 Feb 2021 21:42:00 -0800
Subject: [PATCH 1/3] Support Large IPLD/IPFS DAGs

---
 proposals/large-ipld-dags.md | 116 +++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 proposals/large-ipld-dags.md

diff --git a/proposals/large-ipld-dags.md b/proposals/large-ipld-dags.md
new file mode 100644
index 00000000..19da7bef
--- /dev/null
+++ b/proposals/large-ipld-dags.md
@@ -0,0 +1,116 @@
+# Support Large IPLD/IPFS DAGs
+
+Authors: Stebalien
+
+Initial PR: TBD
+
+## Purpose & impact
+#### Background & intent
+_Describe the desired state of the world after this project? Why does that matter?_
+
+First, it should be possible to store arbitrary and arbitrarily large IPLD DAGs on Filecoin using
+the built-in protocols. At the moment, Filecoin can only store "whole DAGs". If a DAG, doesn't fit
+into a sector when serialized as a CAR, it must be converted to raw-blocks, chunked, and then stored
+as those chunks.
+
+Unfortunately:
+
+1. This workaround erases the underlying DAG structure. This makes it difficult to transfer this
+   data for both storage and retrieval. This is especially true when interacting with IPFS.
+2. This workaround requires storing an "overlay" DAG in Filecoin (paying for that storage).
+
+Second, it should be possible to retrieve subsets of DAGs. While the underlying protocols support
+retrieving subsets of DAGs, the CLI does not. This makes it impossible to, e.g., retrieve a single
+file from a directory without modifying Lotus.
+
+#### Assumptions & hypotheses
+_What must be true for this project to matter?_
+
+There is no easy way (e.g., no out-of-band deals) to store large (> sector size) IPLD DAGs while
+preserving the DAG structure.
+
+#### User workflow example
+_How would a developer or user use this new capability?_
+
+* `lotus client deal` should accept an IPLD selector.
+* `lotus cleint deal` should automatically split large DAGs between multiple sectors.
+* `lotus client retrieve` should support retrieving IPLD selectors (dag subsets).
+
+#### Impact
+_How directly important is the outcome to web3 dev stack product-market fit?_
+
+🔥
+
+At the moment, any tool wishing to support storing IPFS files/directories larger than 32GiB will need to store these IPFS files/directories as "raw blocks", throwing away all the DAG structural information. This will make future retrieval deals for subsets of this data infeasible and will make IPFS interop extremely difficult.
+
+This is only one 🔥 because there are plenty of useful sub-32GiB datasets and non-IPFS datasets.
+
+#### Leverage
+_How much would nailing this project improve our knowledge and ability to execute future projects?_
+
+🎯🎯🎯
+
+If we don't solve this now, users will likely store large DAGs any way they can (e.g., as raw
+blocks). We could end up with a lot of unfortunately structured data in Filecoin that's difficult to
+retrieve and work with, especially from IPFS.
+
+#### Confidence
+_How sure are we that this impact would be realized? Label from [this scale](https://medium.com/@nimay/inside-product-introduction-to-feature-priority-using-ice-impact-confidence-ease-and-gist-5180434e5b15)_.
+
+??
+
+## Project definition
+#### Brief plan of attack
+
+1. Implement selector support in `lotus client deal`.
+2. Implement selector support in `lotus client retrieve`.
+3. Support automatically splitting large dags into across deals in `lotus client retrieve`.
+
+#### What does done look like?
+_What specific deliverables should completed to consider this project done?_
+
+All three of the above commands have been implemented.
+
+NOTE: stopping anywhere along the way will yield a useful result. As long as the first step is finished (selector support for `lotus client deal`) the
+
+#### What does success look like?
+_Success means impact. How will we know we did the right thing?_
+
+1. Developers can easily store large directory trees on Filecoin.
+2. Developers can easily retrieve individual files from large datasets on Filecoin.
+
+#### Counterpoints & pre-mortem
+_Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_
+
+The primary risk is that there may be a lack of demand to store large IPFS-formatted datasets in Filecoin. That is, users storing large datasets (> 32GiB) may all be using custom formats and may not care about IPFS files/directories, partial retrieval, etc.
+
+Another risk is that the IPLD selector language may be insufficient to describe useful selectors over IPFS data. It should be at least possible to _store_ DAG subsets using IPLD selectors, but we may need new selectors to, e.g., download individual IPFS files.
+
+#### Alternatives
+_How might this project’s intent be realized in other ways (other than this project proposal)? What other potential solutions can address the same need?_
+
+1. Don't support datasets > 32GiB.
+2. Store large datasets as raw objects instead of IPFS files and accept the fact that these datasets
+   will be difficult to query/retrieve from IPFS.
+
+#### Dependencies/prerequisites
+
+
+None.
+
+#### Future opportunities
+
+
+* Large IPFS datasets.
+* IPFS interop.
+
+## Required resources
+
+#### Effort estimate
+
+Small to medium.
+
+#### Roles / skills needed
+
+* Markets (ideally Hannah or Dirk).
+* IPLD/Selectors (Riba or Eric).
From d04fb9e953973cf64495a2ff3132c28844f7cf54 Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Wed, 17 Feb 2021 09:07:17 -0800
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Vasco Santos
Co-authored-by: Marcin Rataj
---
 proposals/large-ipld-dags.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/proposals/large-ipld-dags.md b/proposals/large-ipld-dags.md
index 19da7bef..bbe8c256 100644
--- a/proposals/large-ipld-dags.md
+++ b/proposals/large-ipld-dags.md
@@ -33,7 +33,7 @@ preserving the DAG structure.
 _How would a developer or user use this new capability?_
 
 * `lotus client deal` should accept an IPLD selector.
-* `lotus cleint deal` should automatically split large DAGs between multiple sectors.
+* `lotus client deal` should automatically split large DAGs between multiple sectors.
 * `lotus client retrieve` should support retrieving IPLD selectors (dag subsets).
 
 #### Impact
@@ -78,6 +78,7 @@ _Success means impact. How will we know we did the right thing?_
 
 1. Developers can easily store large directory trees on Filecoin.
 2. Developers can easily retrieve individual files from large datasets on Filecoin.
+3. Snapshots of English Wikipedia can be stored on Filecoin.
 
 #### Counterpoints & pre-mortem
 _Why might this project be lower impact than expected? How could this project fail to complete, or fail to be successful?_
From ccc481a224589ec2142a7477471c7e750eb7bff1 Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Wed, 17 Feb 2021 16:57:37 -0800
Subject: [PATCH 3/3] finish sentence

---
 proposals/large-ipld-dags.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/large-ipld-dags.md b/proposals/large-ipld-dags.md
index bbe8c256..e90767d6 100644
--- a/proposals/large-ipld-dags.md
+++ b/proposals/large-ipld-dags.md
@@ -71,7 +71,7 @@ _What specific deliverables should completed to consider this project done?_
 
 All three of the above commands have been implemented.
 
-NOTE: stopping anywhere along the way will yield a useful result. As long as the first step is finished (selector support for `lotus client deal`) the
+NOTE: stopping anywhere along the way will yield a useful result. As long as the first step is finished (selector support for `lotus client deal`), we'll be able to store large structured IPFS data on-chain.
 
 #### What does success look like?
 _Success means impact. How will we know we did the right thing?_