Description
This might be an extreme case, but consider someone wishing to make Wikipedia available on IPFS:
$ zcat dumps.wikimedia.org/enwiki/20150901/enwiki-20150901-all-titles-in-ns0.gz | wc -l
11944438
It seems like a bad idea to have a merkledag node with 12M links, but that would be the representation using unixfs Data_Directory. It seems like a similarly bad idea to have a merkledag node past even 1k links, and directories with a thousand files occur more commonly in practice.
Alternate directory representations (Data_PrefixTreeDirectory and/or Data_HashTableDirectory) might be a solution, using intermediate merkledag nodes to ultimately reference all of a large directory's children in a way that permits efficient traversal and selective retrieval. The distinction between directory representations could be transparent to users, with implementations free to choose whichever data structure they deem suitable.
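To make the hash-table idea concrete, here is a minimal sketch in Python. It is an illustration only, not a proposed wire format: nested dicts stand in for intermediate merkledag nodes, and `FANOUT`, `bucket_path`, `build`, and `lookup` are hypothetical names chosen for this example. Each child name is hashed with SHA-256, and successive digest bytes select the bucket at each level.

```python
import hashlib

FANOUT = 256  # hypothetical cap on links per intermediate node


def bucket_path(name, depth):
    """Bucket index at each level, taken from successive bytes of SHA-256(name)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return [digest[i] % FANOUT for i in range(depth)]


def build(entries, depth=2):
    """Shard a flat {name: child_hash} mapping into `depth` levels of buckets.

    Intermediate levels are keyed by integer bucket index; the final level
    maps child names to their hashes.
    """
    root = {}
    for name, child_hash in entries.items():
        node = root
        for index in bucket_path(name, depth):
            node = node.setdefault(index, {})
        node[name] = child_hash
    return root


def lookup(root, name, depth=2):
    """Follow one bucket per level; only `depth` + 1 nodes are ever touched."""
    node = root
    for index in bucket_path(name, depth):
        node = node.get(index)
        if node is None:
            return None
    return node.get(name)


if __name__ == "__main__":
    # Placeholder names and hashes, purely for demonstration.
    entries = {"Page_%d" % i: "hash-%d" % i for i in range(1000)}
    root = build(entries)
    print(lookup(root, "Page_42"))
```

With two levels of 256 buckets there are 65,536 leaf nodes, so 12M entries would average roughly 180 links per leaf, and resolving one name would fetch three small nodes instead of one enormous one. A prefix tree would follow the same shape, keyed on the name's own characters rather than on a hash of it.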
Going back to the example: the list of Wikipedia pages is 67 MB gzipped, before adding hashes. A user shouldn't have to download ~400 MB of protobufs just to find the hash of a single article page.
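A back-of-the-envelope check on that order of magnitude, as a sketch: it assumes each link carries a 34-byte multihash (a 2-byte prefix plus a 32-byte SHA-256 digest) and counts only the hashes, ignoring link names and protobuf framing. This is not necessarily how the figure above was derived.

```python
LINK_COUNT = 11_944_438  # titles counted by `wc -l` above
MULTIHASH_BYTES = 34     # assumption: 2-byte multihash prefix + 32-byte SHA-256 digest


def hash_bytes(link_count, hash_size=MULTIHASH_BYTES):
    """Bytes needed for the hashes alone in a flat directory node."""
    return link_count * hash_size


total = hash_bytes(LINK_COUNT)
print("%.0f MB of hashes" % (total / 1e6))  # prints "406 MB of hashes"
```

The titles themselves would come on top of that, so a single flat node for this directory would be at least this large.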
What's a sensible upper limit for a merkledag node's link count or encoded size in bytes? Is there precedent for reducing merkledag fan out? What other components would need to know about a new unixfs DataType?