
feat: Non MFS Files API tutorial #303

Merged
merged 67 commits into from
Nov 6, 2019

Conversation

dominguesgm
Contributor

Introducing a non-MFS Files API tutorial. It currently consists of 6 lessons: 1 simple lesson and 5 code lessons.

Structure:

  1. Introducing the File API
  2. Adding a file
  3. Read the contents of a file
  4. Add files in a folder
  5. Listing the files in a directory
  6. Getting all the files in a directory tree

Contents should still be proofread and validated before merging.

Closes #203

@dominguesgm
Contributor Author

The review by @ericronne is in #304. I'll be checking it out today.

@terichadbourne terichadbourne added the docs-ipfs In scope for IPFS Docs Working Group label Oct 8, 2019
dominguesgm and others added 8 commits October 11, 2019 10:57
@terichadbourne
Member

This is looking good @dominguesgm!

Some outstanding items include:
[ ] update the featured tutorials so this shows on the homepage
[ ] add the resources page contents

Before I do any text edits, I want to ask some bigger conceptual questions and share areas where I got confused on my first pass through this so you can update the content accordingly. Note that I have only run through this as an end user and have not yet looked at your validation code.

Lesson 1:

I suspect this section may need a little tweaking if my understanding is correct:

If you've gone through our Decentralized Data Structures tutorial — or even Blogging on the Decentralized Web — you already know you can store primitives, objects and arrays in the network.

Storing these types of data is an interesting, but limited, use case. What if you want to share a picture of a kitten? How would you upload it to the network and provide a way for your friends to see it? What about a larger file, such as a funny video? How should the file be placed in the Directed Acyclic Graph (DAG) — in a single block or split into chunks?

You can think of the File API as an abstraction layer above the DAG API. The File API prepares files to be placed in the network, and ensures that IPFS knows how to access them. The details of what this API actually does will be covered later in this tutorial.

My understanding has been that DAG isn't a more limited use case; it's a broader use case because it can deal with all different kinds of data (including the obscure ones you mention), but it's just really obnoxious to use for the most common use cases, like just sharing files. You could use the DAG API to share files, but you'd have to do a ton of extra work yourself to manipulate the file objects in the way you'd need to. The File API uses the DAG API under the hood but has lots of extra functionality built in to make it work with files more conveniently. We could have done this on our own, but the File API saves us time. (@mikeal can you please confirm this paragraph is true?)

Lesson 2:

  • Working with files in ProtoSchool: In shortening the information about how file upload works in ProtoSchool, you've left out the very important detail that we make the files you upload available to you in the files array in the code editor, and you're not introducing what information the browser has available to it (file.name, etc.). This gets called out in passing in lesson 4 but I think we need that referenced earlier, before anyone sees const run = async (files) => { at the top of every lesson. I feel like it might actually be worth repeating lesson 3 of MFS here (which displays to the user what details are made available), or at least pulling more information from it than you have, but please let me know what you think. If we repeated it then it should be a different lesson from the one teaching add and the lesson numbers would change.
  • The add method: In your "Inspect results" section, I'd love clarification that this is the value returned by the add command, and a reminder that the object returned by IPFS for each file contains a "hash" property that contains its CID, which will come in handy in our next lesson. (I think the hash is the CID based on your note earlier in the lesson, but I see that path and hash are identical here so it could be path as well. We should explain what makes them different.) ¯\_(ツ)_/¯
  • In both lessons 2 and 4 you mention that you can pass either a single file object or an array of file objects. Let's include examples of the two formats with values filled in (kitten.jpg, not file.name). Documentation sometimes suffers from not giving clear examples of how to use things, so including a humanized example of "if I had a photo saved as 'kitty.jpg', I could add it like this:" is very helpful. And then it could be done again with an example of an array of dog and cat photos, etc. You can find some examples of this format here: http://localhost:3000/#/mutable-file-system/04 (Something along the lines of the sketch below.)
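
For what it's worth, here's a rough sketch of the kind of humanized example I mean (catPic and dogPic are just placeholders for uploaded browser file objects; as far as I can tell, both calls return an array of objects with path and hash properties):

// adding a single file
const singleResult = await ipfs.add({ path: 'kitten.jpg', content: catPic })

// adding an array of files
const arrayResult = await ipfs.add([
  { path: 'kitten.jpg', content: catPic },
  { path: 'puppy.jpg', content: dogPic }
])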

Lesson 3:

  • As noted above we need a reminder of how we find out the CID of the file, not just how we use it, so you could reference the results from lesson 2 near the start of this one. Something like "As we saw in the previous lesson, the add method returns an object for each file that includes a hash property containing its CID, blah blah"
  • My first attempt at this lesson made the validation hang because I passed in the CID without putting quotation marks around it, so it would have tried to read it as a variable and not a string. I'd recommend both adding a hint about this and seeing if there's any way for you to catch that specific error in validation. The documentation technique described above would also help, giving me an example where I could see an actual CID in quotation marks (see the sketch after this list).
  • In general I think the concept of overwriting values is harder than the concept of creating a new variable to hold something you've messed around with, so I'd avoid something like this:
let message = await ipfs.cat('QmWCscor6qWPdx53zEQmZvQvuWQYxx1ARRCXwYVE4s9wzJ')
message = message.toString('utf8')
  • What are your thoughts on whether we should remove the file upload component here, since it's not actually being touched? Normally it's handy to have the files still in your browser even when they're not needed for this one, but I think it adds unnecessary complexity to the code challenge. I also had to redo my file uploads for lesson 4 anyway because i wanted to see what it was like to add multiple files to a directory and i'd only done one previously. Let me know what you think.
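
For what it's worth, here's a version of that snippet that sidesteps both issues (the CID shown as a quoted string, and no overwriting); decodedMessage is just a name I've made up:

const message = await ipfs.cat('QmWCscor6qWPdx53zEQmZvQvuWQYxx1ARRCXwYVE4s9wzJ')
const decodedMessage = message.toString('utf8')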

Lesson 4:

  • Feedback from previous reviewers of other tutorials was that we should consistently say directory rather than folder.
  • The first time I read the documentation for the add method I took path to mean a path on my computer where I was getting the file from, not a destination path I was inventing, so that may be worth calling out.
  • Conceptually I think we need to spend some time explaining what directories are in IPFS. I believe that a directory's hash is going to change each time we change its contents, which is important to highlight. Also, how are we able to name directories? Is this some kind of weird abstraction like MFS is? We just told everyone that the regular Files API can't act like a file system, so what's the catch? Is it that we'll never be able to find it again by its name and will always have to find it by its CID, which keeps changing? (We may need to call in an expert to help with this question, maybe you can chat quickly with Hugo?)
  • You say "Don't forget, you can get the name of the file you uploaded with file.name." but you've never told them that's possible, as noted earlier.
  • Since not everyone who uses the site is a JavaScript pro, they may be more familiar with forEach than with Array methods like map or filter. In some of the other tutorials we've provided solutions that showed both a more advanced and an easier option, with one of them commented out. As an example, this is what I tried on my own without looking at your solution:
/* global ipfs */
const run = async (files) => {

  // build an array of { path, content } objects from the uploaded browser files
  let fileObjects = []
  files.forEach(function (file) {
    let fileObject = {
      path: "/dir/" + file.name,
      content: file
    }
    fileObjects.push(fileObject)
  })

  // add them all in a single call, wrapped in a directory
  let result = await ipfs.add(fileObjects, {wrapWithDirectory: true})

  return result
}
return run
  • I think that what you're listing under "Inspect results" is the value returned by the add function, which should be phrased a bit more clearly (as opposed to this being you showing us an ls, for example). When I did this I added two files and got 4 objects in the returned array, which needs some further explanation. I think probably two were for the files I added, one was for the path "dir" (so presumably the directory), and one was for path "" (so maybe the root node?). ¯\_(ツ)_/¯

Lesson 5:

  • I don't think it's clear what you're doing to find the pathCID, in part because it's a prerequisite here that I be familiar with the find method, which isn't anything I use regularly. Are you searching through the results from add to find one where the path is the name of the directory, and then getting the hash value from that same object? Do you want to explain this or will users need to learn to do it enough that it should be its own lesson, or part of the challenge in this one? Again, we probably need some context to connect the results of the last lesson to the start of this one.
  • Your solution code doesn't pass this lesson, which would make it fail Cypress testing if Cypress knew how to test file upload lessons. :) Looks like you forgot to include pathCID and just used await ipfs.ls(). The error message from your validation is: "The CID provided to ipfs.ls is incorrect. Make sure you're using the pathCID variable we provided." I assume that doing await ipfs.ls() will either flat out fail or show the root directory contents, either of which is probably worth calling out as a separate error from using a CID that's wrong in some other way.
  • Again I'd specify how you're getting the "inspect results" display, so perhaps changing "Here are the contents of the dir directory." to "Here are the contents of the dir directory as returned by the ls method."

Lesson 6:

  • I find it pretty hard to understand what you're asking the learner to return in this coding challenge, which combines a couple of different concepts.
  • I'm also struggling to come up with the right method to complete this task without using map, which I consider more advanced. For previous lessons we've added links to JavaScript documentation when we're inadvertently testing JavaScript skills instead of IPFS skills.
  • Again, I don't think we should expect users to overwrite variable names.
  • If we're going to use this format then I think we need to cut out the file upload here because the CID they're getting files from and the files being returned have nothing to do with those they've uploaded. Another alternative would be to have them add one new file from a CID (could be the success.txt one) to the directory they already created. This would have a bonus (I think) of showing how the CID of the directory changes when its contents change. Obviously we can't include the toString('utf-8') conversion if we do that, since we wouldn't know what kind of files they had. Tell me more about why you asked them to do that.... because it's a skill they'll need (although they did it earlier in another context) or because displaying the results was unwieldy when the content field wasn't a string?
  • Again we could make the phrasing under "Inspect results" a little more specific.

In going back to the original issue and outline for this tutorial, the only suggestion I don't see covered here is to show people how to use the CID to view a file on the gateway. I believe neither the files they're adding themselves nor the ones you've used as examples are living there, so maybe that's not practical anymore? Thoughts?

I remember hearing about some MFS pitfalls in the end section of Alan's course at IPFS camp that highlighted important distinctions between MFS and all the rest. Would you mind taking a quick look to see whether there are any bits here worth including when you describe the difference between the regular File API and MFS?

Sorry for the novel here, @dominguesgm. I'll be out on Monday so just want to share as much feedback as I can so you can get started on anything that doesn't require more clarification. Happy to schedule some time to chat through any of this feedback if you disagree or if I haven't been clear.

Thank you so much for building this tutorial. It's fabulous!

@dominguesgm
Contributor Author

Thank you for the in-depth feedback @terichadbourne. I have some points I'd love to clarify/discuss, which I'll structure by lesson:

**Lesson 1:**

  • I agree with your point; I'll probably rephrase the way I'm going about comparing the DAG API and the Regular File API. I was under the impression you didn't think we should go into more complex details, and that's why I avoided discussing things the Regular File API introduces, like file chunking, etc. (if I'm not mistaken). Maybe it would be relevant to mention these advantages briefly.

**Lesson 2:**

  • I've removed the section on file upload and copied the lesson 3 from the MFS tutorial to this tutorial (right before this lesson). I'll probably rework its content because I don't recall it talking about the file properties in the lesson itself.

  • I've changed the success message to tell the user the output in the Inspect Results corresponds to the value returned by the add command.

  • I've also added an example with a friendlier variable name and a more concrete scenario as suggested, right after presenting the add method

**Lesson 3:**

**Lesson 4:**

  • I replaced all instances of folder with directory, but missed that one 😅

  • I'll try to make the meaning of path clearer, and I agree we should talk to someone with a better understanding of the inner workings of the API about what directories actually are.

  • The file.name property will be mentioned in the new Lesson 2

  • I can change the map usage and try to find a simpler way to approach this. I'll probably go for a simple for loop; I feel like forEach would be at practically the same level as map, since they're both array methods, but I might be wrong. I'll take a look at the solutions you mentioned which showcase an advanced and an easier approach.

  • As in the other lesson, I'll work on a better success message, though maybe I'll write the explanation for the extra elements in the array resulting from the add method right before the code exercise.

**Lesson 5:**

  • My approach for this code lesson was to write the code needed to set up the exercise as concisely as possible in order to avoid having a very large boilerplate for the exercise. Maybe I can put a comment explaining what that block of code is doing? Basically I'm adding all the files into a dir directory and then finding the CID of the directory so the user can then ls it afterwards.

  • Yes, the issue was the ls method was being called with no argument. It will be fixed in the next commit.

  • I agree with this last point on a clearer success message, again. I'll go through the success messages of the coding exercises.

**Lesson 6:**

  • I understand your point, and I did spend some time debating whether the exercise was becoming too complex.

  • About the overwriting of the variables, if I change to that approach, I would probably remove the return statement from the code boilerplate and replace it with a comment saying something along the lines of "don't forget to return the result". This is because if we don't expect the variables to be overwritten, we can't know for sure which variable will contain the results the user has to return. Another option would be to have something like return // don't forget to place your result here, but I really dislike the idea of leaving an empty return in; it makes it seem like that part is already done.

  • This exercise is more complex because my approach was to teach how get would return the whole directory structure a CID points to. The problem with using user files here is that we don't control which files the user is uploading. For example, I always tested the exercises with jpgs ranging from 100kb to 500kb. The get command returns, among other data, a buffer of the file's contents. This, and because I wanted the user to be able to read the contents of the files in the directories, is why I pre-built a directory structure in the node for this exercise. I think we could remove the upload file functionality from this exercise, as well as the boilerplate code which is not used, to avoid confusing the user.

I'll be honest, the CID explorer went over my head, I can try to look into it. But from what I understand, we may have some issues letting a user upload a file and then asking them to view it on the CID explorer because the page may hang due to not finding the file at all. I'll still look into it.

Sorry for the long message, but hopefully this will help us get closer to a final version of the tutorial 😃

@terichadbourne
Member

@dominguesgm Thanks for working on some edits! A few responses to some of your points below. Let me know if I've missed anything you want to be sure to discuss.

Lesson 1:

I agree with your point; I'll probably rephrase the way I'm going about comparing the DAG API and the Regular File API. I was under the impression you didn't think we should go into more complex details, and that's why I avoided discussing things the Regular File API introduces, like file chunking, etc. (if I'm not mistaken). Maybe it would be relevant to mention these advantages briefly.

Yeah, apologies if I led you astray with our earlier convo on this. I think it's okay to present some bonus advantages that the File API offers (briefly in passing) even if they won't be covered explicitly in this lesson, if they help to explain why you would choose this over the DAG API. For example, (if it's true 😂) you could mention that the Files API handles the complexity of splitting your file into chunks of appropriate size and provide a link to whatever the best documentation is that we know of about how that works. And then if, for example, there's an option you can send to the add method that tells it to use something other than the default chunk size, you could mention that in passing in the add lesson with a link to the documentation that shows you how to do it. (Note that I'm just inventing hypothetical examples here; I don't know how chunking actually works.)

Lesson 2:

I've removed the section on file upload and copied the lesson 3 from the MFS tutorial to this tutorial (right before this lesson). I'll probably rework its content because I don't recall it talking about the file properties in the lesson itself.

MFS lesson 3 talks about the files array in the content of the lesson and then talks about the properties that browser file objects contain in the part that appears after you submit a correct result, so that you see an example from the file you just uploaded. Feel free to propose another approach if you prefer.

Lesson 3:

We could do this, but I think we should discuss the pros and cons: simplifying the interface and the code submission vs. improving the continuity between lessons. #242 presents a similar suggestion for a lesson in the MFS tutorial.
Yes, it's definitely a toss-up. I think one main difference between this and the other example is that we ended up integrating the files we put in with the files the user put in over the course of the MFS lesson, and we don't really do that here? We can chat more about this one live.

Lesson 4:

I'll try to make the meaning of path clearer, and I agree we should talk to someone with a better understanding of the inner workings of the API about what directories actually are.

👍 I have a couple of folks in mind who we can ping for a review and improved explanations after your current batch of revisions.

I can change the map usage and try to find a simpler way to approach this. I'll probably go for a simple for loop; I feel like forEach would be at practically the same level as map, since they're both array methods, but I might be wrong. I'll take a look at the solutions you mentioned which showcase an advanced and an easier approach.

I personally am more comfortable with forEach than with any of the other, more specifically-purposed array methods, but I'm also happy to see a simpler solution with for. Let me know what you think works. We got another issue filed today where array methods were the blocker for someone who's not used to working in JavaScript, and where we planned out more potential variable names than everyone would want to use. We're always going to have the battle where folks who are strong in JavaScript will think we're being ridiculously verbose if we avoid those methods and beginners won't understand us if we use them, so I do think offering alternatives in the solution is reasonable for now.
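
For example, a for-loop version of the lesson 4 solution might look roughly like this (same hypothetical /dir/ path as the forEach sketch above, and I haven't run it against the validation):

/* global ipfs */
const run = async (files) => {
  let fileObjects = []

  // build one { path, content } object per uploaded file
  for (let i = 0; i < files.length; i++) {
    fileObjects.push({
      path: '/dir/' + files[i].name,
      content: files[i]
    })
  }

  let result = await ipfs.add(fileObjects, { wrapWithDirectory: true })
  return result
}
return run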

It would be interesting to offer a toggle feature in the future where you could view solutions in "advanced" or "beginner" JavaScript mode.

Lesson 5:

My approach for this code lesson was to write the code needed to set up the exercise as concisely as possible in order to avoid having a very large boilerplate for the exercise. Maybe I can put a comment explaining what that block of code is doing? Basically I'm adding all the files into a dir directory and then finding the CID of the directory so the user can then ls it afterwards.
I do think that whether or not we make them practice it themselves, we need to explicitly teach users how to figure out the CID of their directory, assuming that's something someone would do in real life without MFS (something like the sketch below). It could be narrative in the lesson and then a comment next to the more advanced JS in your code, potentially, if we're not surfacing it as a separate exercise, and I think it needs a callout when the results of the previous lesson are shown, as noted earlier.
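
Roughly what I have in mind (result and directory are just names I'm inventing; result would be the array returned by add with { wrapWithDirectory: true }, where the directory's own entry seems to have the path 'dir'):

// find the entry for the directory itself in the add results
const directory = result.find(entry => entry.path === 'dir')

// then list its contents by CID
const directoryContents = await ipfs.ls(directory.hash)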

Lesson 6:

About the overwriting of the variables, if I change to that approach, I would probably remove the return statement from the code boilerplate and replace it with a comment saying something along the lines of "don't forget to return the result". This is because if we don't expect the variables to be overwritten, we can't know for sure which variable will contain the results the user has to return. Another option would be to have something like return // don't forget to place your result here, but I really dislike the idea of leaving an empty return in; it makes it seem like that part is already done.

Honestly, we're never going to know what people will want to do with their variables. My guess is that folks who are super new to JavaScript will most appreciate them being provided and well named, in which case they'd want the multiple variable names and distinct steps (without fancy overwriting array methods). Folks who are advanced will be able to ignore us as much as they want to and remove unnecessary steps, so if we had to favor one side of this argument I'd favor the total newbie. However, it might also be okay not to provide any blank variable statements, so people can use as many or as few steps as they want, and I don't mind return // don't forget to place your result here or leaving off the return and including a "don't forget to return your result" warning. As long as we can catch "You forgot to return a result" in the validation, I'm flexible.

This exercise is more complex because my approach was to teach how get would return the whole directory structure a CID points to. The problem with using user files here is that we don't control which files the user is uploading. For example, I always tested the exercises with jpgs ranging from 100kb to 500kb. The get command returns, among other data, a buffer of the file's contents. This, and because I wanted the user to be able to read the contents of the files in the directories, is why I pre-built a directory structure in the node for this exercise. I think we could remove the upload file functionality from this exercise, as well as the boilerplate code which is not used, to avoid confusing the user.

If I'm understanding you correctly, it sounds like using get on an unknown file type (whatever the user uploads) without overwriting the Buffer results with strings will produce something totally unreadable, and we won't have any known quantities to do validation against, and that's why you provided your own files. How bad would it be visually if you showed the buffer results of the .txt files without stringifying them first?

Option A (if it doesn't print a giant mess to the screen):

  • Just get the directory contents (provided by us) in one lesson, and see how useless the returned buffers are
  • In the following lesson, reveal the contents of a specific text file

Option B

  • Get a specific file (not a directory) and reveal its content (I think this is what was proposed in the original issue)

The thing I'm trying to avoid is the JavaScript trick of overwriting all the buffers with strings. I think I would find it an easier JavaScript challenge, for example, to find the content of the text file titled X. But perhaps if you give an example that uses simpler JavaScript methods I'll change my mind. :)
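
To illustrate, something like this feels like an easier JavaScript challenge to me (directoryCID and the .txt check are placeholders; I'm assuming get returns an array of objects with path and content (Buffer) properties):

const files = await ipfs.get(directoryCID)

// pick out a single text file instead of converting every buffer
const textFile = files.find(file => file.path.endsWith('.txt'))
const contents = textFile.content.toString('utf8')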

Agreed that if we'll only use the files you provided we should skip the file upload.

I'll be honest, the CID explorer went over my head, I can try to look into it. But from what I understand, we may have some issues letting a user upload a file and then asking them to view it on the CID explorer because the page may hang due to not finding the file at all. I'll still look into it.

I wasn't meaning to reference the CID explorer tool specifically, just the general concept that you can find things by CID on the gateway. I believe you just need to go to https://gateway.ipfs.io/ipfs/<your_CID_here>, but I think it does depend on the gateway being fully functional and enough people having pinned a thing. Let's ignore this suggestion for now.

@dominguesgm
Contributor Author

dominguesgm commented Oct 17, 2019

@terichadbourne I made a commit with changes covering most of the topics we discussed. There are still a few I'd like to discuss more in depth:

  • a better description of what directories are
  • a way to simplify the last lesson

By the way, I really like the idea of having simple and advanced proposed solutions the user can see and toggle between. Maybe in another PR 😄

@terichadbourne
Member

I opened an issue for the toggle idea, although that is definitely not a priority at the moment. Feel free to add any ideas there: #312

Looking forward to chatting later today about those remaining issues.

@terichadbourne
Member

@alanshaw A few specific content questions for you as @dominguesgm puts the final touches on this...

If one uses add and passes in an array of files with { wrapWithDirectory: true }, should they be able to later add more things to the same directory and have them all end up together, or is its existence basically all in our heads? We want to make sure we're correctly describing what this concept of a directory really is within the non-MFS system. Any tips on how best to explain it in lesson 5, and perhaps contrast it to what happens in MFS for those who already did that tutorial? (Also, if you were going to be able to add more files to a directory, I presume we'd want to point out that its CID changes when you do that.)

We could also use a closer look at lesson 1 where Gil explains both the difference between the Files API and the DAG API and between the Files API and MFS. Want to make sure we're highlighting the right pros/cons/functionality here without getting too deep.

Feedback on all aspects of the tutorial is welcome; those are just our biggest newbie questions at the moment. :)

@dominguesgm
Contributor Author

dominguesgm commented Oct 18, 2019

I wanted to put some more context into the question about adding multiple files into a directory with the add method.

I was experimenting with it and tried to add two different files into the same directory, with different file names, in separate add calls. However, if I called get with the CID of the common directory, it would only retrieve the directory and the last added file.

This is the code I was trying to run:

const IPFS = require('ipfs');

const ipfs = await IPFS.create();

let res = await ipfs.add({ content: Buffer.from('HelloWorld'), path: '/dir/file.txt'}, {wrapWithDirectory: true});

res = await ipfs.add({ content: Buffer.from('hello world 2'), path: '/dir/file2.txt'}, {wrapWithDirectory: true});

console.log(await ipfs.get(res[2].hash));

Which returns an array with the root, dir directory, and last added file:

[ { hash: 'QmPrjMFMsjwpgn5EVwY7k9Qrm5JJT47Q29TW5XxjTgtMZ9',
    path: 'QmPrjMFMsjwpgn5EVwY7k9Qrm5JJT47Q29TW5XxjTgtMZ9',
    name: 'QmPrjMFMsjwpgn5EVwY7k9Qrm5JJT47Q29TW5XxjTgtMZ9',
    depth: 1,
    size: 0,
    type: 'dir' },
  { hash: 'QmfXcmsEgF2PeevmbaoS1jGuNMBhYRfm468LW59qGdQGoA',
    path: 'QmPrjMFMsjwpgn5EVwY7k9Qrm5JJT47Q29TW5XxjTgtMZ9/dir',
    name: 'dir',
    depth: 2,
    size: 0,
    type: 'dir' },
  { hash: 'QmUGTcXwYTqHd5hbkRELDf6uSnBRjYv8CDEjsNDYde6o8h',
    path:
     'QmPrjMFMsjwpgn5EVwY7k9Qrm5JJT47Q29TW5XxjTgtMZ9/dir/file2.txt',
    name: 'file2.txt',
    depth: 3,
    size: 13,
    type: 'file',
    content: <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 20 32> } ]

I am aware that if I added multiple files in a single add call, as an array, it would work without any problem. I'm wondering if this approach should work as well.

@alanshaw
Member

alanshaw commented Oct 18, 2019

If one uses add and passes in an array of files with { wrapWithDirectory: true }, should they be able to later add more things to the same directory and have them all end up together, or is its existence basically all in our heads?

The regular files API is non-mutable. The idea of wrapping with a directory is to provide you with a single CID from which you can access your files by name. For example:

Given:

  • a.txt
  • b.zip
  • c.jpg

When you do:

ipfs.add([
  { path: 'a.txt', content: fs.createReadStream('./a.txt') },
  { path: 'b.zip', content: fs.createReadStream('./b.zip') },
  { path: 'c.jpg', content: fs.createReadStream('./c.jpg') }
])

You'll end up with 3 CIDs, one addressing each of the files: QmA, QmB and QmC.

When you do:

ipfs.add([
  { path: 'a.txt', content: fs.createReadStream('./a.txt') },
  { path: 'b.zip', content: fs.createReadStream('./b.zip') },
  { path: 'c.jpg', content: fs.createReadStream('./c.jpg') }
], { wrapWithDirectory: true })

You'll end up with 4 CIDs. The last one is the CID for the wrapping directory: QmA, QmB, QmC and QmWrappingDirectory.

When you don't wrap with a directory you can only use ipfs.cat('/ipfs/QmA') (for example) to read the contents of a.txt; you have to remember the hashes for each of the other files to be able to read them.

When you wrap with a directory it enables you to use ipfs.ls('/ipfs/QmWrappingDirectory') to get the names and CIDs for all 3 files meaning you only have one hash to remember. You're also able to ipfs.cat('/ipfs/QmWrappingDirectory/a.txt') (as well as ipfs.cat('/ipfs/QmA')) and can use ipfs.get('/ipfs/QmWrappingDirectory') to retrieve all of your files in one call.

So ensuring you have a wrapping directory for your files allows you to retain file names/paths, saves you from having to remember every CID for every file you add, and allows for easier fetching.
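
Putting those calls together in rough code form (with QmWrappingDirectory standing in for the wrapping directory's actual hash):

// list the names and CIDs of a.txt, b.zip and c.jpg
const listing = await ipfs.ls('/ipfs/QmWrappingDirectory')

// read a.txt by its path within the wrapping directory (same content as ipfs.cat('/ipfs/QmA'))
const aContents = await ipfs.cat('/ipfs/QmWrappingDirectory/a.txt')

// retrieve all of the files in one call
const allFiles = await ipfs.get('/ipfs/QmWrappingDirectory')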

Note, you don't have to use the wrapWithDirectory option! If the paths you provide to add all begin with a common directory name, it is equivalent, i.e.:

ipfs.add([
  { path: 'wrapper/a.txt', content: fs.createReadStream('./a.txt') },
  { path: 'wrapper/b.zip', content: fs.createReadStream('./b.zip') },
  { path: 'wrapper/c.jpg', content: fs.createReadStream('./c.jpg') }
])
// equivalent to:
ipfs.add([
  { path: 'a.txt', content: fs.createReadStream('./a.txt') },
  { path: 'b.zip', content: fs.createReadStream('./b.zip') },
  { path: 'c.jpg', content: fs.createReadStream('./c.jpg') }
], { wrapWithDirectory: true })

I believe js-ipfs currently doesn't allow you to add multiple root directories though, so don't do this:

ipfs.add([
  { path: 'documents/a.txt', content: fs.createReadStream('./a.txt') },
  { path: 'zips/b.zip', content: fs.createReadStream('./b.zip') },
  { path: 'pictures/c.jpg', content: fs.createReadStream('./c.jpg') }
])

Back to the question:

should they be able to later add more things to the same directory

You can re-add files and directories and IPFS will de-dupe anything you add twice. Your old content will still be accessible by the CIDs you received when you added it, and likewise for the new content.

e.g. after two adds with duplicate content and an additional file:

    +------+
    | dir1 |
  +-+---+--+--+
  |     |     |
  |     |     |
+-v-+ +-v-+ +-v-+ +---+
| A | | B | | C | | D |
+-^-+ +-^-+ +-^-+ +-^-+
  |     |     |     |
  |     |     |     |
  +-+---+--+--+-----+
    | dir2 |
    +------+

You could unpin the CID for dir1 (when you ipfs.add something, it's pinned by default) and then run ipfs.repo.gc() and it'll be removed from your local store, but in the meantime bitswap may have shared it with another peer. So there's no real remove operation (someone else may still have that content).
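
That flow would look roughly like this (dir1CID standing in for dir1's hash):

await ipfs.pin.rm(dir1CID) // content added via ipfs.add is pinned by default
await ipfs.repo.gc()       // garbage-collect unpinned blocks from the local repo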

Alternatively, the object.patch API provides a low-level way to "mutate" IPFS file structures, but it isn't true mutation; it recreates DAG structures as needed and reuses existing ones where possible. It does this in a similar way to MFS: https://youtu.be/Z5zNPwMDYGg?t=3263

You could also use the dag API, which is even lower level, but you'd need to work directly with UnixFS to achieve the same results: https://youtu.be/Z5zNPwMDYGg?t=2515

@alanshaw
Member

const IPFS = require('ipfs');

const ipfs = await IPFS.create();

let res = await ipfs.add({ content: Buffer.from('HelloWorld'), path: '/dir/file.txt'}, {wrapWithDirectory: true});

res = await ipfs.add({ content: Buffer.from('hello world 2'), path: '/dir/file2.txt'}, {wrapWithDirectory: true});

console.log(await ipfs.get(res[2].hash));

Which returns an array with the root, dir directory, and last added file:

Exactly, you're essentially double-wrapping with this code: /wrapper/dir/*, but it allows you to access the content via /ipfs/QmWrappingDirectory/dir/file.txt if you so wish.

@dominguesgm
Contributor Author

Thank you @alanshaw for such an in-depth explanation of what functionality the wrapWithDirectory option is supposed to provide; I was not interpreting it correctly!

Member

@hacdias hacdias left a comment


Overall, I enjoyed the lessons. They give a good explanation of the non-mutable IPFS files API. Please let me know if you need something else from my side. I'll gladly help.

Collaborator

@lidel lidel left a comment


It was fun. I really like the "X vs Y" explanations in this tutorial; they address common questions.

My only concern is around teaching people to use the shortened notation of detached content paths without the /ipfs/ prefix. Using a direct CID is ok (it is an identifier on its own), but as soon as we start traversing the DAG and operating on paths, we should make sure those paths start with an explicit namespace.

(I don't feel super strongly about it, but I have a feeling it makes things less vague when we start teaching IPNS at some point and introduce paths starting with /ipns/.)

@terichadbourne
Member

@dominguesgm The collapsing conversations make it kind of hard to keep track of everything, but I'm pretty sure I've gotten to all the suggestions you hadn't gotten to yet except for:

  • swapping map versus whatever else we had as the recommended solution, leaving the beginner-friendly one visible as an alternative solution
  • making a friendlier text file

I also went ahead and merged my PR and then pulled it into this PR to get API to be properly capitalized in the shortname.

@alanshaw @lidel if you have time Monday morning to take a quick look at today's commits, which include me addressing a number of issues you suggested from a text standpoint, that would be awesome.

@dominguesgm dominguesgm marked this pull request as ready for review November 5, 2019 17:15
@terichadbourne
Member

@dominguesgm I've made some tweaks to validation messages. Please see the comments in lessons 5 and 7 in the places where I think the validation or the messages associated with it still need some tweaking. I also replaced the text of your catch-all message with something more action-oriented. This feedback is mostly from me reading your validation code and not from going through and trying to break it by doing things wrong that you didn't anticipate (which I'm generally great at but haven't had time for yet).

I also went through and changed all references to "root directory" to "top-level directory" per some earlier feedback from @alanshaw, and I moved your links to array methods to the hints section so people can see them before failing rather than after giving up and peeking at the solution. :)

@hacdias if you happen to have any time tomorrow morning to take a quick look at the validation code, or to try submitting various predictably-wrong code and seeing whether it's caught with meaningful errors, that would be awesome.

Hoping to get this published by end of US day tomorrow if no one has lingering concerns.

Member

@terichadbourne terichadbourne left a comment


🏆

@terichadbourne terichadbourne merged commit d7cd63d into code Nov 6, 2019
@terichadbourne terichadbourne deleted the feat/tutorial-nonmfs branch November 6, 2019 15:21