This repository was archived by the owner on Jun 2, 2020. It is now read-only.
New IPFS explainer #170
Merged

Changes from all commits (79 commits)

358130b Initial commit (meiqimichelle)
e7dc7d4 Add FPO images for draft new content (meiqimichelle)
0a84f3e Add v1 intro and DHT explainer (meiqimichelle)
e78c5c9 Another version of the new content, still WIP (meiqimichelle)
6e45808 Finished v1 new content (meiqimichelle)
ee6c025 Merge latest edits to upstream branch 'fix/copyedit-ipfs-intro' into … (meiqimichelle)
df75cd7 A few quick edits to give more instal options in getting started. (meiqimichelle)
9d27faa Change order of intro sections. Update overview next steps to reflect… (meiqimichelle)
4d6a2f7 Remove extra 'complex' (meiqimichelle)
2947e3f Removes unecessary 'you could also...' (meiqimichelle)
d9e44fe Rework explanation of content addressed p2p network (meiqimichelle)
da44b2a Identity --> identify (meiqimichelle)
606a39f Rework hash paragraph based on Hector's suggestions (meiqimichelle)
71e281e Copyedit hash interoperability paragraph (meiqimichelle)
9e50177 Replace all dumb quotes with smart quotes (meiqimichelle)
fb19cf8 Rework IPFS linked data paragraph with some Oli style (meiqimichelle)
76d71d8 Rework and accept edits to first DAG paragraph (meiqimichelle)
2f88691 Combine Hector/Oli edits to DAG sharing (meiqimichelle)
af8ccfe Rework DHT paragraph (meiqimichelle)
288e2d3 Incorporates edits to multiplexing paragraph (meiqimichelle)
3092c45 Add and edit changes to why multiplexing is useful paragraph (meiqimichelle)
1478b6c Fix link to Aardvark (meiqimichelle)
8eeffd2 Fix IPFS link to Aardvark (meiqimichelle)
24fd26d Add parenthetical reference to cray cray hash (meiqimichelle)
edd891d Pulls two new copyedit commits into local from remote (meiqimichelle)
7d6d867 Fix Aardvark link (meiqimichelle)
829f40e Fix Aardvark link (meiqimichelle)
86cfc6e Edit to IPFS IPLD paragraph (meiqimichelle)
34433b6 Makes hash sentence cleaner (meiqimichelle)
decff14 Server-client --> client-server (meiqimichelle)
241d392 Incorpoartes Hector's edits to summary (meiqimichelle)
2a61c70 Incorporate Oli edits to DAG structure (meiqimichelle)
fb3746f Incorporate Oli's edits to chunking (meiqimichelle)
1fd3067 Incorporate Oli edit to recap re: DAG and IPLD (meiqimichelle)
2325c00 More dumb quotes --> smart quotes. (meiqimichelle)
abf146c Rework IPLD paragraph to focus more on 'traverse' rather than 'transl… (meiqimichelle)
e71cc55 Incorporate Hector's edits to chunking paragraph (meiqimichelle)
77972dc Remove references to network stack; no longer helpful (meiqimichelle)
e655dc0 Remove first heading. Improve CID section. (meiqimichelle)
efc537b Initial improvements to DAG section (meiqimichelle)
aad4a53 Add concept doc on Merkle-DAGs. Words from Merkle-CRDT paper. (meiqimichelle)
12a1b5a Reworks DAG section. (meiqimichelle)
0b1ebe2 Final copyedit before further review (meiqimichelle)
8041639 Comments out placeholder images. They are distracting. (meiqimichelle)
d716671 Fix link in content/introduction/usage.md (lidel)
2aa10f5 Remove commented out images. They're not helpful, and we won't be usi… (meiqimichelle)
0f7942a Changes Merkle-DAG example from comma to website file, as per @olizilla (meiqimichelle)
4e758be A few edits to the bit that says database (meiqimichelle)
538337e IPFS project --> libp2p as per @hsanjuan (meiqimichelle)
3756a1a Adds Merkle to last paragraph (meiqimichelle)
659fa2d Title --> often title (meiqimichelle)
b4f174e Connectivity --> connection, as per @momack2 (meiqimichelle)
9c414cd Provides better link to Merkle-DAG paper, as per @lanzafame (meiqimichelle)
0074a9f Remove some of the most informal words and structures. Too many excla… (meiqimichelle)
ce5b46a IPLD link --> translate (meiqimichelle)
1482617 Remove efficiency claim from block paragraph (meiqimichelle)
9995c72 Add info on bitswap, ht @momack2 and @hsanjuan (meiqimichelle)
788b3ca Simplify first sentence (meiqimichelle)
0d426c4 Spelling fix and much more --> more (meiqimichelle)
34c5379 Add link to DNSLink concept guide (meiqimichelle)
a4baa75 Add sentence about not being able to remove content from current web (meiqimichelle)
b217dd6 Edit to finding content sentence to make it sound less like you'll be… (meiqimichelle)
e61efaf Your --> the, and removes more exclamation points (meiqimichelle)
b11dc2c Remove TODO re expanding modularization section for now (meiqimichelle)
315b330 Comments out final paragraph for now, because we haven't written thos… (meiqimichelle)
b32647d Merge branch 'master' into feature/new-ipfs-explainer (meiqimichelle)
3415430 Quick copyedit of overview.md to soften some statements, as per @hsan… (meiqimichelle)
915a3fb Merge branch 'feature/new-ipfs-explainer' of https://github.com/ipfs/… (meiqimichelle)
5234572 Removes fun (meiqimichelle)
95a3ab5 Update overview.md (jessicaschilling)
3685309 Update how-ipfs-works.md (jessicaschilling)
544aa0f Update how-ipfs-works.md (jessicaschilling)
76add37 Update overview.md (jessicaschilling)
a29da06 Update overview.md (jessicaschilling)
672126b Update how-ipfs-works.md (jessicaschilling)
eb2d6bf Update how-ipfs-works.md (jessicaschilling)
16d3b36 Update how-ipfs-works.md (jessicaschilling)
72b8e8c Update how-ipfs-works.md (jessicaschilling)
864a892 Update how-ipfs-works.md (jessicaschilling)

@@ -0,0 +1,29 @@

---
title: "Merkle-DAGs"
menu:
  guides:
    parent: concepts
---

A _Directed Acyclic Graph_ (DAG) is a type of graph in which edges have direction and cycles are not allowed. For example, a linked list like _A→B→C_ is an instance of a DAG where _A_ references _B_ and so on. We say that _B_ is _a child_ or _a descendant_ of _A_, and that _node A has a link to B_. Conversely, _A_ is a _parent_ of _B_. We call nodes that are not children of any other node in the DAG _root nodes_.

A Merkle-DAG is a DAG where each node has an identifier, and this identifier is the result of hashing the node’s contents — any opaque payload carried by the node and the list of identifiers of its children — using a cryptographic hash function like SHA256. This brings some important considerations:

1. Merkle-DAGs can only be constructed from the leaves, that is, from nodes without children. Parents are added after children because the children’s identifiers must be computed in advance to be able to link them.
1. Every node in a Merkle-DAG is the root of a (sub)Merkle-DAG itself, and this subgraph is _contained_ in the parent DAG[9].
1. Merkle-DAG nodes are _immutable_. Any change in a node would alter its identifier and thus affect all the ancestors in the DAG, essentially creating a different DAG. Take a look at [this helpful illustration using bananas](https://media.consensys.net/ever-wonder-how-merkle-trees-work-c2f8b7100ed3) from our friends at Consensys.

Identifying a data object (like a Merkle-DAG node) by the value of its hash is referred to as _content addressing_. Thus, we call the node identifier a _Content Identifier_, or CID.

For example, the previous linked list, assuming that the payload of each node is just the CID of its descendant, would be: _A=Hash(B)→B=Hash(C)→C=Hash(∅)_. The properties of the hash function ensure that no cycles can exist when creating Merkle-DAGs[10].

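To make this construction concrete, here is a minimal Python sketch of building that list as a Merkle-DAG from the leaves up. The encoding is purely illustrative: a real IPFS CID wraps the digest in multihash and codec metadata rather than being a raw hex SHA-256 digest.

```python
import hashlib

def make_cid(payload, child_cids):
    """Hash a node's payload together with the ordered CIDs of its children."""
    h = hashlib.sha256()
    h.update(payload)
    for child in child_cids:
        h.update(child.encode())
    return h.hexdigest()

# Built from the leaves: C first, then B, then A.
cid_c = make_cid(b"C's payload", [])       # C has no children
cid_b = make_cid(b"B's payload", [cid_c])  # B links to C
cid_a = make_cid(b"A's payload", [cid_b])  # A links to B

# Immutability: changing C changes its CID, which changes B's CID and A's CID.
new_cid_c = make_cid(b"C's new payload", [])
new_cid_b = make_cid(b"B's payload", [new_cid_c])
new_cid_a = make_cid(b"A's payload", [new_cid_b])
assert new_cid_a != cid_a
```
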
Merkle-DAGs are _self-verified_ structures. The CID of a node is univocally linked to the contents of its payload and those of all its descendants. Thus, two nodes with the same CID univocally represent exactly the same DAG. This will be a key property to efficiently sync Merkle-CRDTs without having to copy the full DAG, as exploited by systems like IPFS. Merkle-DAGs are very widely used. Source control systems like Git [11] and others [6] use them to efficiently store the repository history, in a way that enables de-duplicating the objects and detecting conflicts between branches.

_Excerpted from the Merkle-CRDT draft paper by @hsanjuan, @haadcode, and @pgte. Available: https://hector.link/presentations/merkle-crdts/merkle-crdts.pdf_

### Footnotes

[6] Merkle-DAGs are similar to Merkle Trees [20], but there are no balance requirements and every node can carry a payload. In DAGs, several branches can re-converge or, in other words, a node can have several parents.

[10] Hash functions are one-way functions. Creating a cycle should then be impossibly difficult, unless some weakness is discovered and exploited.

@@ -0,0 +1,75 @@

---
title: How IPFS Works
weight: 2
---

IPFS is a peer-to-peer (p2p) storage network. Content is accessible through peers that might relay information or store it (or do both), and those peers can be located anywhere in the world. IPFS knows how to find what you ask for by its content address, rather than where it is.

## There are three important things to understand about IPFS

Let’s first look at _content addressing_ and how that content is _linked together_. This “middle” part of the IPFS stack is what connects the ecosystem together; everything is built on being able to find content via linked, unique identifiers.

### 1 \\ Content addressing and linked data

IPFS uses _content addressing_ to identify content by what’s in it, rather than by where it’s located. Looking for an item by content is something you already do all the time. For example, when you look for a book in the library, you often ask for it by the title; that’s content addressing because you’re asking for **what** it is. If you were using location addressing to find that book, you’d ask for it by **where** it is: “I want the book that’s on the second floor, first stack, third shelf from the bottom, four books from the left.” If someone moved that book, you’d be out of luck!

It’s the same on the internet and on your computer. Right now, content is found by location, such as…

- `https://en.wikipedia.org/wiki/Aardvark`
- `/Users/Alice/Documents/term_paper.doc`
- `C:\Users\Joe\My Documents\project_sprint_presentation.ppt`

By contrast, every piece of content that uses the IPFS protocol has a [*content identifier*]({{<relref "guides/concepts/cid.md">}}), or CID, that is its *hash*. The hash is unique to the content that it came from, even though it may look short compared to the original content. _If hashes are new to you, check out [the concept guide on hashes]({{<relref "guides/concepts/hashes.md">}}) for an introduction._

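As a rough sketch of that idea, the same bytes always hash to the same identifier, no matter what the file is named or where it is stored. (A real CID wraps the digest in multihash and codec metadata, so this plain SHA-256 hex digest is only an approximation.)

```python
import hashlib

content = b"Hello from IPFS"

# The digest depends only on the bytes themselves, not on any path or filename.
digest = hashlib.sha256(content).hexdigest()
print(digest)

# The same bytes stored somewhere else, under another name, hash identically.
assert hashlib.sha256(b"Hello from IPFS").hexdigest() == digest
```
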
Content addressing through hashes has become a widely used means of connecting data in distributed systems, from the commits that back your code to the blockchains that run cryptocurrencies. However, the underlying data structures in these systems are not necessarily interoperable.

This is where the [IPLD project](https://ipld.io/) comes in. **Hashes identify content, and IPLD translates between data structures**. Since different distributed systems structure their data in different ways, IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across many linked nodes (allowing you to explore data regardless of the underlying protocol). IPLD provides a way to translate between content-addressable data structures: “Oh you use Git-style, no worries, I can follow those links. Oh you use Ethereum, I got you, I can follow those links too!”

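To give a flavour of what “following links across formats” means, here is a toy resolver with made-up codecs and an in-memory block store standing in for the real IPLD libraries. Each pluggable decoder knows how to turn its own node format into named links, and one resolver walks a path across all of them:

```python
import json

# A pretend block store: CID -> (codec name, raw bytes).
store = {
    "cid-root": ("dag-json", json.dumps({"docs": "cid-docs"}).encode()),
    "cid-docs": ("toy-csv", b"readme,cid-readme"),
    "cid-readme": ("raw", b"hello world"),
}

# Pluggable decoders: each turns raw bytes into a {link name: CID} mapping.
decoders = {
    "dag-json": lambda raw: json.loads(raw),
    "toy-csv": lambda raw: dict([raw.decode().split(",")]),
    "raw": lambda raw: {},
}

def resolve(cid, path):
    """Walk a /-separated path, decoding each node with its own codec."""
    for segment in path.split("/"):
        codec, raw = store[cid]
        cid = decoders[codec](raw)[segment]
    return store[cid][1]

print(resolve("cid-root", "docs/readme"))  # b'hello world'
```
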
The IPFS protocol uses IPLD to get from raw content to an IPFS address. IPFS has its own preferences and conventions about how data should be broken up into a DAG (more on DAGs below!); IPLD links content on the IPFS network together using those conventions.

**Everything else in the IPFS ecosystem builds on top of this core concept: linked, addressable content is the fundamental connecting element that makes the rest work.**

### 2 \\ IPFS turns files into DAGs

IPFS and many other distributed systems take advantage of a data structure called a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph), or DAG. Specifically, they use _Merkle-DAGs_, which are DAGs where each node has an identifier that is a hash of the node’s contents. Sound familiar? This refers back to the _CID_ concept that we covered in the previous section. Another way to look at this CID-linked-data concept: identifying a data object (like a Merkle-DAG node) by the value of its hash is _content addressing_. _(Check out [the concept guide on Merkle-DAGs]({{<relref "guides/concepts/merkle-DAG.md">}}) for a more in-depth treatment of this topic.)_

IPFS uses a Merkle-DAG that is optimized for representing directories and files, but you can structure a Merkle-DAG in many different ways. For example, Git uses a Merkle-DAG that has many versions of your repo inside it.

To build a Merkle-DAG representation of your content, IPFS often first splits it into _blocks_. Splitting it into blocks means that different parts of the file can come from different sources and be authenticated quickly. (If you’ve ever used BitTorrent, you may have noticed that when you download a file, BitTorrent can fetch it from multiple peers at once; this is the same idea.)

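Here is a simplified sketch of that chunking step, using fixed-size blocks and plain SHA-256 in place of IPFS’s real chunkers and CID format: each block gets its own hash, a small root node lists those hashes in order, and any block fetched from any peer can be checked on arrival.

```python
import hashlib

def sha(data):
    return hashlib.sha256(data).hexdigest()

def chunk(data, size=256 * 1024):
    """Split content into fixed-size blocks (real chunkers can be smarter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

content = b"x" * 600_000                 # pretend this is a large file
blocks = chunk(content)
block_cids = [sha(block) for block in blocks]

# The root node's identifier covers the ordered list of block hashes.
root_cid = sha("".join(block_cids).encode())
print(len(blocks), root_cid)

# A block downloaded from any source can be verified independently.
received = blocks[1]                     # imagine this arrived from some peer
assert sha(received) == block_cids[1]
```
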
Merkle-DAGs are a bit of a [“turtles all the way down”](https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Turtles_all_the_way_down.html) scenario; that is, **everything** has a CID. You’ve got a file that has a CID. What if there are several files in a folder? That folder has a CID, and that CID contains the CIDs of the files underneath. In turn, those files are made up of blocks, and each of those blocks has a CID. You can see how a file system on your computer could be represented as a DAG. You can also see, hopefully, how Merkle-DAGs start to form. For a visual exploration of this concept, take a look at our [IPLD Explorer](https://explore.ipld.io/#/explore/QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D).

Another useful feature of Merkle-DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle-DAG; i.e., parts of different Merkle-DAGs can reference the same data. For example, if you update a website, only the files that changed will get new content addresses. Your old version and your new version can refer to the same blocks for everything else. This can make transferring versions of large datasets (such as genomics research or weather data) more efficient because you only need to transfer the parts that are new or have changed instead of creating entirely new files each time.

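A small sketch of why that matters for versioned data, reusing the block-hashing idea above (illustrative only, not the real IPFS deduplication logic): only blocks whose hashes are new need to be stored or transferred.

```python
import hashlib

def block_cids(data, size=5):
    """Hash each fixed-size block of the content."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

v1 = b"aaaaabbbbbcccccddddd"   # four blocks
v2 = b"aaaaabbbbbccccceeeee"   # only the last block changed

new_blocks = block_cids(v2) - block_cids(v1)
print(len(new_blocks))         # 1 -> only one block needs to be transferred
```
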
### 3 \\ The DHT

So, to recap, IPFS lets you give CIDs to content, and link that content together in a Merkle-DAG using IPLD. Now let’s move on to the last piece: how you find and move content.

To find which peers are hosting the content you’re after (_discovery_), IPFS uses a [_distributed hash table_](https://en.wikipedia.org/wiki/Distributed_hash_table), or DHT. A hash table is a database of keys to values. A _distributed_ hash table is one where the table is split across all the peers in a distributed network. To find content, you ask these peers.

The [libp2p project](https://libp2p.io/) is the part of the IPFS ecosystem that provides the DHT and handles peers connecting and talking to each other. (Note that, as with IPLD, libp2p can also be used as a tool for other distributed systems, not just IPFS.)

Once you know where your content is (i.e., which peer or peers are storing each of the blocks that make up the content you’re after), you use the DHT **again** to find the current location of those peers (_routing_). So, in order to get to content, you use libp2p to query the DHT twice.

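Here is a toy sketch of that two-step lookup, with plain Python dictionaries standing in for the distributed table. (A real DHT, such as the Kademlia-based one libp2p provides, splits these records across peers and routes each query toward the peers whose IDs are closest to the key.)

```python
# Step 1 (discovery): which peers claim to have this content?
provider_records = {
    "cid-of-my-block": {"peer-A", "peer-B"},
}

# Step 2 (routing): where can those peers currently be reached?
peer_records = {
    "peer-A": ["/ip4/203.0.113.7/tcp/4001"],
    "peer-B": ["/ip4/198.51.100.4/tcp/4001"],
}

def find_content(cid):
    """Two lookups: CID -> provider peers, then each peer -> its addresses."""
    providers = provider_records.get(cid, set())
    return {peer: peer_records.get(peer, []) for peer in providers}

print(find_content("cid-of-my-block"))
```
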
You’ve discovered your content, and you’ve found the current location(s) of that content — now you need to connect to that content and get it (_exchange_). To request blocks from and send blocks to other peers, IPFS currently uses a module called [_Bitswap_](https://github.com/ipfs/specs/tree/master/bitswap). Bitswap allows you to connect to the peer or peers that have the content you want, send them your _wantlist_ (a list of all the blocks you’re interested in), and have them send you the blocks you requested. Once those blocks arrive, you can verify them by hashing their content and checking that the resulting CIDs match the ones you asked for. (These CIDs also allow you to deduplicate blocks if needed.)

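The exchange pattern looks roughly like the following sketch (this is the general shape, not the real Bitswap wire protocol): send a wantlist of CIDs, receive blocks, and keep only the ones whose hashes check out.

```python
import hashlib

def sha(data):
    return hashlib.sha256(data).hexdigest()

# Blocks a remote peer happens to have, keyed by CID.
peer_store = {sha(b"block one"): b"block one",
              sha(b"block two"): b"block two"}

def fetch(wantlist, peer):
    """Ask a peer for every CID on the wantlist and verify what comes back."""
    received = {}
    for cid in wantlist:
        block = peer.get(cid)
        if block is not None and sha(block) == cid:   # verify by re-hashing
            received[cid] = block
    return received

wantlist = [sha(b"block one"), sha(b"block two")]
print(len(fetch(wantlist, peer_store)))  # 2
```
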
There are [other content replication protocols under discussion](https://github.com/ipfs/camp/blob/master/DEEP_DIVES/24-replication-protocol.md) as well, the most developed of which is [_Graphsync_](https://github.com/ipld/specs/blob/master/block-layer/graphsync/graphsync.md). There’s also a proposal to [extend the Bitswap protocol](https://github.com/ipfs/go-bitswap/issues/186) to add functionality around requests and responses.

#### A note on libp2p

What makes libp2p especially useful for peer-to-peer connections is _connection multiplexing_. Traditionally, every service in a system would open a different connection to remotely communicate with other services of the same kind. Using IPFS, you open just one connection, and you multiplex everything on that. For everything your peers need to talk to each other about, you send a little bit of each thing, and the other end knows how to sort those chunks back to where they belong.

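As a rough illustration (the stream names are invented here; libp2p negotiates real stream multiplexers such as yamux or mplex), every protocol’s traffic is tagged with a stream ID and interleaved on one connection, and the receiving side sorts the pieces back into their streams.

```python
# One "connection" carrying interleaved chunks from several logical streams.
connection = [
    ("dht", b"FIND_NODE ..."),
    ("bitswap", b"WANT cid-1"),
    ("dht", b"FIND_PROVIDERS cid-1"),
    ("bitswap", b"BLOCK cid-1 ..."),
]

def demultiplex(frames):
    """Sort each tagged chunk back into its own logical stream."""
    streams = {}
    for stream_id, payload in frames:
        streams.setdefault(stream_id, []).append(payload)
    return streams

for stream_id, payloads in demultiplex(connection).items():
    print(stream_id, payloads)
```
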
This is useful because connections are usually hard to set up and expensive to maintain. With multiplexing, once you have that connection, you can do whatever you need on it.

## And everything is modular

As you may have noticed from this discussion, the IPFS ecosystem is made up of many modular libraries that support specific parts of any distributed system. You can certainly use any part of the stack independently, or combine them in novel ways.

## Summary

The IPFS ecosystem gives CIDs to content, and links that content together by generating IPLD Merkle-DAGs. You can discover content using a DHT that’s provided by libp2p, and open a connection to any provider of that content and download it using a multiplexed connection. All of this is held together by the “middle” of the stack, which is linked, unique identifiers; that’s the essential part that IPFS is built on.

<!--Next, we’ll look at how IPFS is an interconnected network of equal peers, each with the same abilities (no client-server relationships), and what that means for system architectures. We’ll also touch on another useful project in the ecosystem -- IPFS Cluster -- that can help make sure your content is always available, even on a network like IPFS that supports peers dropping in and out at will.-->