Skip to content

feat: add normalise input function #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 4, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion package.json
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
"license": "MIT",
"dependencies": {
"buffer": "^5.2.1",
"err-code": "^2.0.0",
"is-buffer": "^2.0.3",
"is-electron": "^2.2.0",
"is-pull-stream": "0.0.0",
Expand All @@ -36,9 +37,10 @@
},
"devDependencies": {
"aegir": "^20.0.0",
"async-iterator-all": "^1.0.0",
"chai": "^4.2.0",
"dirty-chai": "^2.0.1",
"electron": "^5.0.7",
"electron": "^6.0.6",
"electron-mocha": "^8.0.3",
"pull-stream": "^3.6.13"
},
Expand Down
233 changes: 233 additions & 0 deletions src/files/normalise-input.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
'use strict'

const errCode = require('err-code')
const { Buffer } = require('buffer')

/*
* Transform one of:
*
* ```
* Buffer|ArrayBuffer|TypedArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of simplicity and understanding maybe we should setup an alias for this:

Suggested change
* Buffer|ArrayBuffer|TypedArray
* Bytes (Buffer|ArrayBuffer|TypedArray|Iterable<Number>)
* Bloby (Blob|File)

When I see Bytes I think that they are part of a set of bytes for a single file (albeit there may only be one), when I see Bloby I think that this is a whole file.

* Blob|File
* { path, content: Blob }
* { path, content: String }
* { path, content: Iterable<Number> }
* { path, content: Iterable<Buffer> }
* { path, content: Iterable<Iterable<Number>> }
* { path, content: AsyncIterable<Iterable<Number>> }
* String
* Iterable<Number>
* Iterable<Buffer>
* Iterable<Blob>
* Iterable<{ path, content: Buffer }>
* Iterable<{ path, content: Blob }>
* Iterable<{ path, content: Iterable<Number> }>
* Iterable<{ path, content: AsyncIterable<Buffer> }>
* AsyncIterable<Buffer>
* AsyncIterable<{ path, content: Buffer }>
* AsyncIterable<{ path, content: Blob }>
* AsyncIterable<{ path, content: Iterable<Buffer> }>
* AsyncIterable<{ path, content: AsyncIterable<Buffer> }>
* ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No support for pull streams?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to not include it here. My thinking was that all ipfs.add* methods could route to ipfs._addAsyncIterator, and it would be down to those methods to convert their arguments to something this method accepts. E.g. ipfs.addPullStream would do the pull-stream source to async iterator conversion, ipfs.addFromFs would convert glob-matched paths to async iterators of file objects with paths & content readable streams, etc.

I'm working on integrating that with ipfs and ipfs-http-client so we'll see how realistic that is..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds nice, but you can call ipfs.add({ content: PullStreamSource }) so I think it's necessary, unless we want to duplicate a lot of the normalise work in both places.

* Into:
*
* ```
* AsyncIterable<{ path, content: AsyncIterable<Buffer> }>
* ```
*
* @param input Object
* @return AsyncInterable<{ path, content: AsyncIterable<Buffer> }>
*/
module.exports = function normaliseInput (input) {
// must give us something
if (input === null || input === undefined) {
throw errCode(new Error(`Unexpected input: ${input}`, 'ERR_UNEXPECTED_INPUT'))
}

// { path, content: ? }
if (isFileObject(input)) {
return (async function * () { // eslint-disable-line require-await
yield toFileObject(input)
})()
}

// String
if (typeof input === 'string' || input instanceof String) {
return (async function * () { // eslint-disable-line require-await
yield toFileObject(input)
})()
}

// Buffer|ArrayBuffer|TypedArray
// Blob|File
if (isBytes(input) || isBloby(input)) {
return (async function * () { // eslint-disable-line require-await
yield toFileObject(input)
})()
}

// Iterable<?>
if (input[Symbol.iterator]) {
// Iterable<Number>
if (!isNaN(input[0])) {
return (async function * () { // eslint-disable-line require-await
yield toFileObject([input])
})()
}

// Iterable<Buffer>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's werid that Iterable<Buffer> means multiple files but AsyncIterable<Buffer> is a single file...

As I mentioned in an earlier comment I reckon we should interpret sets of Bytes as a single file and sets of Blobys as multiple files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - multiple Buffers should be treated as one file, multiple Blobs are multiple files. That's the intention anyway, I guess the comment isn't clear.

Realistically you'd need { path, content } objects for multiple Blobs as they have no path themselves and the importer will fail complaining of multiple roots.

Alternatively we could widen it to accept HTML File objects which have a .name property and use that as the path? The user would still need to pass the wrapWithDirectory arg to the importer though because I think the .name property is supposed to just be the filename, not the path to the file within a directory structure as with the { path, content } object.

Urgh, what fun.

// Iterable<Blob>
return (async function * () { // eslint-disable-line require-await
for (const chunk of input) {
yield toFileObject(chunk)
}
})()
}

// AsyncIterable<?>
if (input[Symbol.asyncIterator]) {
return (async function * () { // eslint-disable-line require-await
for await (const chunk of input) {
yield toFileObject(chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not support AsyncIterable<Bytes>? I think fs.createReadStream works because the stream it creates has a path property and it is assumed to be a file object earlier in the code, but not all AsyncIterable<Bytes>'s will necessarily have this property.

}
})()
}

throw errCode(new Error('Unexpected input: ' + typeof input), 'ERR_UNEXPECTED_INPUT')
}

function toFileObject (input) {
return {
path: input.path || '',
content: toAsyncIterable(input.content || input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both path and content are optional - i.e. you can pass { path: '/path/to/dir' } so this won't work in this case.

}
}

function toAsyncIterable (input) {
// Buffer|ArrayBuffer|TypedArray|array of bytes
if (isBytes(input)) {
return (async function * () { // eslint-disable-line require-await
yield toBuffer(input)
})()
}

if (typeof input === 'string' || input instanceof String) {
return (async function * () { // eslint-disable-line require-await
yield toBuffer(input)
})()
}

// Blob|File
if (isBloby(input)) {
return blobToAsyncGenerator(input)
}

// Iterator<?>
if (input[Symbol.iterator]) {
if (!isNaN(input[0])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right check here? When would you pass an [NaN]?

Suggested change
if (!isNaN(input[0])) {
if (Number.isInteger(input[0])) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've just determined this object is iterable, but that does not mean it's an array and you can access values like input[0]. Strictly speaking you should input[Symbol.iterator]().next().value

Copy link
Member Author

@achingbrain achingbrain Sep 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪. Number.isInteger is probably better in this case..

I'd hoped to avoid calling .next() on the iterator in case it consumes a value that otherwise wouldn't be re-iterated - e.g. more of a .peek()-type mechanism. Could probably wrap the iterator in another iterator(!) that does something a bit more sensible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it-peek!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪.

😉 ...and other values are are not NaN. isNaN is just the worst. For the benefit of anyone else reading this the following non-numbers would also make it through!:

!isNaN(null) // true
!isNaN('123') // true
!isNaN(' ') // true
!isNaN([]) // true
!isNaN(new Date) // true

return (async function * () { // eslint-disable-line require-await
yield toBuffer(input)
})()
}

return (async function * () { // eslint-disable-line require-await
for (const chunk of input) {
yield toBuffer(chunk)
}
})()
}

// AsyncIterable<?>
if (input[Symbol.asyncIterator]) {
return (async function * () {
for await (const chunk of input) {
yield toBuffer(chunk)
}
})()
}

throw errCode(new Error(`Unexpected input: ${input}`, 'ERR_UNEXPECTED_INPUT'))
}

function toBuffer (chunk) {
if (isBytes(chunk)) {
return chunk
}

if (typeof chunk === 'string' || chunk instanceof String) {
return Buffer.from(chunk)
}

if (Array.isArray(chunk)) {
return Buffer.from(chunk)
}

throw new Error('Unexpected input: ' + typeof chunk)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the test for isBytes(chunk) you could just remove the other tests and return Buffer.from(chunk) since it'll already throw if it doesn't get one of the things you're checking for.


function isBytes (obj) {
return Buffer.isBuffer(obj) || ArrayBuffer.isView(obj) || obj instanceof ArrayBuffer
}

function isBloby (obj) {
return typeof Blob !== 'undefined' && obj instanceof global.Blob
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return typeof Blob !== 'undefined' && obj instanceof global.Blob
return typeof globalThis.Blob !== 'undefined' && obj instanceof globalThis.Blob

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

}

// An object with a path or content property
function isFileObject (obj) {
return typeof obj === 'object' && (obj.path || obj.content)
}

function blobToAsyncGenerator (blob) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract as a module blob-to-it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just mkdir iterables && touch blob-to-it.js in this repo for less overhead

if (typeof blob.stream === 'function') {
// firefox < 69 does not support blob.stream()
return streamBlob(blob)
}

return readBlob(blob)
}

async function * streamBlob (blob) {
const reader = blob.stream().getReader()

while (true) {
const result = await reader.read()

if (result.done) {
return
}

yield result.value
}
}

async function * readBlob (blob, options) {
options = options || {}

const reader = new global.FileReader()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const reader = new global.FileReader()
const reader = new globalThis.FileReader()

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

const chunkSize = options.chunkSize || 1024 * 1024
let offset = options.offset || 0

const getNextChunk = () => new Promise((resolve, reject) => {
reader.onloadend = e => {
const data = e.target.result
resolve(data.byteLength === 0 ? null : data)
}
reader.onerror = reject

const end = offset + chunkSize
const slice = blob.slice(offset, end)
reader.readAsArrayBuffer(slice)
offset = end
})

while (true) {
const data = await getNextChunk()

if (data == null) {
return
}

yield Buffer.from(data)
}
}
138 changes: 138 additions & 0 deletions test/files/normalise-input.spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
'use strict'

/* eslint-env mocha */
const chai = require('chai')
const dirtyChai = require('dirty-chai')
const normalise = require('../../src/files/normalise-input')
const { supportsFileReader } = require('../../src/supports')
const { Buffer } = require('buffer')
const all = require('async-iterator-all')

chai.use(dirtyChai)
const expect = chai.expect

const STRING = 'hello world'
const BUFFER = Buffer.from(STRING)
const ARRAY = Array.from(BUFFER)
let BLOB

if (supportsFileReader) {
BLOB = new global.Blob([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BLOB = new global.Blob([
BLOB = new globalThis.Blob([

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

STRING
])
}

async function verifyNormalisation (input) {
expect(input.length).to.equal(1)

if (!input[0].content[Symbol.asyncIterator] && !input[0].content[Symbol.iterator]) {
chai.assert.fail(`Content should have been an iterable or an async iterable`)
}

expect(await all(input[0].content)).to.deep.equal([BUFFER])
expect(input[0].path).to.equal('')
}

async function testContent (input) {
const result = await all(normalise(input))

await verifyNormalisation(result)
}

function iterableOf (thing) {
return [thing]
}

function asyncIterableOf (thing) {
return (async function * () { // eslint-disable-line require-await
yield thing
}())
}

describe('normalise-input', function () {
function testInputType (content, name) {
it(name, async function () {
await testContent(content)
})

it(`Iterable<${name}>`, async function () {
await testContent(iterableOf(content))
})

it(`AsyncIterable<${name}>`, async function () {
await testContent(asyncIterableOf(content))
})

if (name !== 'Blob') {
it(`AsyncIterable<Iterable<${name}>>`, async function () {
await testContent(asyncIterableOf(iterableOf(content)))
})

it(`AsyncIterable<AsyncIterable<${name}>>`, async function () {
await testContent(asyncIterableOf(asyncIterableOf(content)))
})
}

it(`{ path: '', content: ${name} }`, async function () {
await testContent({ path: '', content })
})

if (name !== 'Blob') {
it(`{ path: '', content: Iterable<${name}> }`, async function () {
await testContent({ path: '', content: iterableOf(content) })
})

it(`{ path: '', content: AsyncIterable<${name}> }`, async function () {
await testContent({ path: '', content: asyncIterableOf(content) })
})
}

it(`Iterable<{ path: '', content: ${name} }`, async function () {
await testContent(iterableOf({ path: '', content }))
})

if (name !== 'Blob') {
it(`Iterable<{ path: '', content: Iterable<${name}> }>`, async function () {
await testContent(iterableOf({ path: '', content: iterableOf(content) }))
})

it(`Iterable<{ path: '', content: AsyncIterable<${name}> }>`, async function () {
await testContent(iterableOf({ path: '', content: asyncIterableOf(content) }))
})
}

it(`AsyncIterable<{ path: '', content: ${name} }`, async function () {
await testContent(asyncIterableOf({ path: '', content }))
})

if (name !== 'Blob') {
it(`AsyncIterable<{ path: '', content: Iterable<${name}> }>`, async function () {
await testContent(asyncIterableOf({ path: '', content: iterableOf(content) }))
})

it(`AsyncIterable<{ path: '', content: AsyncIterable<${name}> }>`, async function () {
await testContent(asyncIterableOf({ path: '', content: asyncIterableOf(content) }))
})
}
}

describe('String', () => {
testInputType(STRING, 'String')
})

describe('Buffer', () => {
testInputType(BUFFER, 'Buffer')
})

describe('Blob', () => {
if (!supportsFileReader) {
return
}

testInputType(BLOB, 'Blob')
})

describe('Iterable<Number>', () => {
testInputType(ARRAY, 'Iterable<Number>')
})
})