General fixes #2

Open
wants to merge 1 commit into
base: master
3 changes: 3 additions & 0 deletions .gitignore
@@ -6,3 +6,6 @@ node_modules
*.tmp.json
/example
/*.log

# Python related stuff needed to build native C libs
.venv
7 changes: 7 additions & 0 deletions Makefile
@@ -0,0 +1,7 @@
.PHONY: clean clean-all

clean:
@rm -rf node_modules

clean-all: clean
@rm -rf data
86 changes: 65 additions & 21 deletions api.md
@@ -63,7 +63,8 @@ Get the URL for a checksum file of the specified version
<a name="module_bucket..fetchYearListings"></a>

### bucket~fetchYearListings() ⇒ <code>Promise.&lt;Array.&lt;{path:string, year:number}&gt;&gt;</code>
Fetch a set of years available on the Discogs data S3 bucket with theirpaths on the bucket.
Fetch a set of years available on the Discogs data S3 bucket with their
paths on the bucket.

**Kind**: inner method of [<code>bucket</code>](#module_bucket)
<a name="module_bucket..fetchFileListing"></a>
@@ -81,10 +82,12 @@ Fetch the list of files available on the S3 bucket for a certain year
<a name="module_bucket..parseFileNames"></a>

### bucket~parseFileNames(filenames) ⇒ <code>Object</code>
Parse a list of file paths (as returned by fetchFileListing). Groups themby year
Parse a list of file paths (as returned by fetchFileListing). Groups them
by year

**Kind**: inner method of [<code>bucket</code>](#module_bucket)
**Returns**: <code>Object</code> - An object with keys for each year and an array of parsedpath objects as values.
**Returns**: <code>Object</code> - An object with keys for each year and an array of parsed
path objects as values.

| Param | Type |
| --- | --- |
@@ -142,7 +145,8 @@ Get the path to where the checksum file for a specified version is stored
Looks up an existing data xml on disk

**Kind**: inner method of [<code>dataManager</code>](#module_dataManager)
**Returns**: <code>Object</code> \| <code>null</code> - An object of the form `{ path: string, gz: boolean }`if the file was found, null otherwise
**Returns**: <code>Object</code> \| <code>null</code> - An object of the form `{ path: string, gz: boolean }`
if the file was found, null otherwise

| Param | Type | Default | Description |
| --- | --- | --- | --- |
@@ -157,7 +161,9 @@ Looks up an existing data xml on disk
Looks up the xml files on disk for a given version

**Kind**: inner method of [<code>dataManager</code>](#module_dataManager)
**Returns**: <code>Array.&lt;(Object\|null)&gt;</code> - An array of results for each type:An object of the form `{ path: string, gz: boolean }` if the file was found,null otherwise
**Returns**: <code>Array.&lt;(Object\|null)&gt;</code> - An array of results for each type:
An object of the form `{ path: string, gz: boolean }` if the file was found,
null otherwise

| Param | Type | Default | Description |
| --- | --- | --- | --- |
@@ -191,10 +197,12 @@ Download data dumps and show download progress
<a name="module_fetcher..ensureDump"></a>

### fetcher~ensureDump(version, collection, [showProgress], [dataDir]) ⇒ <code>Promise.&lt;void&gt;</code>
Ensures a data dump file is downloaded to ./data/<version>/. Doesnothing if a file already exists. Does not verify the file.
Ensures a data dump file is downloaded to ./data/<version>/. Does
nothing if a file already exists. Does not verify the file.

**Kind**: inner method of [<code>fetcher</code>](#module_fetcher)
**Returns**: <code>Promise.&lt;void&gt;</code> - A Promise that completes when all data isdownloaded
**Returns**: <code>Promise.&lt;void&gt;</code> - A Promise that completes when all data is
downloaded

| Param | Type | Default | Description |
| --- | --- | --- | --- |
@@ -206,10 +214,12 @@ Ensures a data dump file is downloaded to ./data/<version>/. Does nothing if a f
<a name="module_fetcher..ensureDumps"></a>

### fetcher~ensureDumps(version, [collections], [showProgress], [dataDir]) ⇒ <code>Promise.&lt;void&gt;</code>
Ensures all the specified collections of a specific data dump version aredownloaded to the given data directory
Ensures all the specified collections of a specific data dump version are
downloaded to the given data directory

**Kind**: inner method of [<code>fetcher</code>](#module_fetcher)
**Returns**: <code>Promise.&lt;void&gt;</code> - A Promise that completes when all data isdownloaded
**Returns**: <code>Promise.&lt;void&gt;</code> - A Promise that completes when all data is
downloaded

| Param | Type | Default | Description |
| --- | --- | --- | --- |
@@ -233,7 +243,8 @@ Ensures that the CHECKSUM file for a given version is downloaded
<a name="module_processing/dumpFormatter"></a>

## processing/dumpFormatter
Helpers to transform on the dumps parsed by XMLParser into plain objectsthat are easier to work with.
Helpers to transform on the dumps parsed by XMLParser into plain objects
that are easier to work with.


* [processing/dumpFormatter](#module_processing/dumpFormatter)
@@ -245,7 +256,8 @@ Helpers to transform on the dumps parsed by XMLParser into plain objects that ar
<a name="module_processing/dumpFormatter..formatLabel"></a>

### processing/dumpFormatter~formatLabel(label, [includeImageObjects]) ⇒ <code>object</code>
Format a label tag. See readme.md for information of how the data istransformed
Format a label tag. See readme.md for information of how the data is
transformed

**Kind**: inner method of [<code>processing/dumpFormatter</code>](#module_processing/dumpFormatter)

@@ -257,7 +269,8 @@ Format a label tag. See readme.md for information of how the data is transformed
<a name="module_processing/dumpFormatter..formatArtist"></a>

### processing/dumpFormatter~formatArtist(artist, [includeImageObjects]) ⇒ <code>object</code>
Format an artist tag. See readme.md for information of how the data istransformed
Format an artist tag. See readme.md for information of how the data is
transformed

**Kind**: inner method of [<code>processing/dumpFormatter</code>](#module_processing/dumpFormatter)

@@ -269,7 +282,8 @@ Format an artist tag. See readme.md for information of how the data is transform
<a name="module_processing/dumpFormatter..formatMaster"></a>

### processing/dumpFormatter~formatMaster(master, [includeImageObjects]) ⇒ <code>object</code>
Format a master tag. See readme.md for information of how the data istransformed
Format a master tag. See readme.md for information of how the data is
transformed

**Kind**: inner method of [<code>processing/dumpFormatter</code>](#module_processing/dumpFormatter)

@@ -281,7 +295,8 @@ Format a master tag. See readme.md for information of how the data is transforme
<a name="module_processing/dumpFormatter..formatRelease"></a>

### processing/dumpFormatter~formatRelease(release, [includeImageObjects]) ⇒ <code>object</code>
Format a release tag. See readme.md for information of how the data istransformed
Format a release tag. See readme.md for information of how the data is
transformed

**Kind**: inner method of [<code>processing/dumpFormatter</code>](#module_processing/dumpFormatter)

@@ -293,7 +308,8 @@ Format a release tag. See readme.md for information of how the data is transform
<a name="module_processing/processor"></a>

## processing/processor
Parse the data dump XML into plain JS objects and process them witha given function. See readme.md for an example
Parse the data dump XML into plain JS objects and process them with
a given function. See readme.md for an example


* [processing/processor](#module_processing/processor)
@@ -304,7 +320,10 @@ Parse the data dump XML into plain JS objects and process them with a given func
<a name="module_processing/processor..processDumpFile"></a>

### processing/processor~processDumpFile(path, collection, fn, [gz], [chunkSize], [restart]) ⇒ <code>Promise</code>
Processes an XML dump file using `node-expat` into plain objects. Every`chunkSize` rows the parser will pause and pass the result to the `fn`function. Once the `fn` function completes, parsing continues until theentire file is parsed.
Processes an XML dump file using `node-expat` into plain objects. Every
`chunkSize` rows the parser will pause and pass the result to the `fn`
function. Once the `fn` function completes, parsing continues until the
entire file is parsed.

**Kind**: inner method of [<code>processing/processor</code>](#module_processing/processor)
**Returns**: <code>Promise</code> - A Promise that resolves when processing is complete
@@ -319,11 +338,23 @@ Processes an XML dump file using `node-expat` into plain objects. Every `chunkSi
| [restart] | <code>boolean</code> | <code>false</code> | By default, the processing progress is stored in a `.processing` file alongside the data dumps. If the processing is stopped, it will continue from that row once you call `processDumpFile` again. Set this to `true` to always start from the beginning. |

**Example**
```processDumpFile( './discogs_20190101_artists.xml.gz', 'artists', chunk => { // process the results here. For this example, we just console.log them chunk.forEach(row => console.log(row)); return Promise.resolve(); });```
```
processDumpFile(
'./discogs_20190101_artists.xml.gz',
'artists',
chunk => {
// process the results here. For this example, we just console.log them
chunk.forEach(row => console.log(row));

return Promise.resolve();
}
);
```
<a name="module_processing/processor..processDumps"></a>

### processing/processor~processDumps(version, fn, [collections], [chunkSize], [restart], [dataDir]) ⇒ <code>Promise.&lt;void&gt;</code>
Looks up the downloaded data dumps of a given version. Then calls `processDumpFile`on each of them.
Looks up the downloaded data dumps of a given version. Then calls `processDumpFile`
on each of them.

**Kind**: inner method of [<code>processing/processor</code>](#module_processing/processor)
**See**: processDumpFile
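
The pause-and-resume behaviour described for `processDumpFile` can be illustrated with a minimal sketch. `processInChunks` is a hypothetical helper, not part of this library: it batches rows from any iterable and waits for `fn` to settle before reading further. Flushing a final, shorter chunk is an assumption about the behaviour, and the real implementation drives a `node-expat` parser rather than an iterable.

```javascript
// Hypothetical helper for illustration only; not the library's implementation.
async function processInChunks(rows, chunkSize, fn) {
  let chunk = [];
  for await (const row of rows) {
    chunk.push(row);
    if (chunk.length >= chunkSize) {
      // "Pause": no further rows are read until fn's promise settles.
      await fn(chunk);
      chunk = [];
    }
  }
  if (chunk.length > 0) {
    // Assumption: the last, possibly shorter, chunk is passed to fn as well.
    await fn(chunk);
  }
}
```

Called as `processInChunks(rows, 100, chunk => saveSomewhere(chunk))`, where `saveSomewhere` is any function that returns a Promise, at most one chunk is held in memory at a time.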
@@ -377,7 +408,13 @@ Runs parseInt and errors when the result is NaN
<a name="module_util/parseUtils..parseDiscogsName"></a>

### util/parseUtils~parseDiscogsName(name, target) ⇒ <code>object</code>
Parses a name from Discogs that potentially has a "(n)" numeric postfix.Stores the result on the specified target object. Will set the followingproperties:name: the name with the "(n)" postfix removed \originalName: the name without modifications \nameIndex: the number n inside the postfix. 1 if there isn't any
Parses a name from Discogs that potentially has a "(n)" numeric postfix.
Stores the result on the specified target object. Will set the following
properties:

name: the name with the "(n)" postfix removed \
originalName: the name without modifications \
nameIndex: the number n inside the postfix. 1 if there isn't any

**Kind**: inner method of [<code>util/parseUtils</code>](#module_util/parseUtils)
**Returns**: <code>object</code> - A reference to target
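
As a rough illustration of the three properties listed above, here is a sketch written from the description alone. `parseDiscogsNameSketch` and its regular expression are assumptions, not the code in `util/parseUtils`.

```javascript
// Illustrative only: reconstructed from the documentation, not from the source.
function parseDiscogsNameSketch(name, target) {
  // Assumption: the postfix is "(n)" at the very end, optionally preceded by spaces.
  const match = /^(.*?)\s*\((\d+)\)$/.exec(name);
  target.originalName = name;
  target.name = match ? match[1] : name;
  target.nameIndex = match ? parseInt(match[2], 10) : 1;
  return target; // returned so calls can be chained
}
```

With this sketch, `"Nirvana (2)"` yields `name: "Nirvana"` and `nameIndex: 2`, while a name without a numeric postfix keeps its value and gets `nameIndex: 1`.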
@@ -390,7 +427,10 @@ Parses a name from Discogs that potentially has a "(n)" numeric postfix. Stores
<a name="module_util/parseUtils..parseDuration"></a>

### util/parseUtils~parseDuration(duration, target) ⇒ <code>object</code>
Parses the duration string from a Discogs XML file and stores the resulton the target object. Will store the string as-is on the 'originalDuration'property. If the duration is formatted somewhat correctly, will calculate theduration in number of seconds and store it on the 'duration' property.
Parses the duration string from a Discogs XML file and stores the result
on the target object. Will store the string as-is on the 'originalDuration'
property. If the duration is formatted somewhat correctly, will calculate the
duration in number of seconds and store it on the 'duration' property.

**Kind**: inner method of [<code>util/parseUtils</code>](#module_util/parseUtils)
**Returns**: <code>object</code> - `target` for chaining
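
The documentation does not say which inputs count as "formatted somewhat correctly". The sketch below assumes only `m:ss` and `h:mm:ss` qualify and leaves `duration` unset otherwise; `parseDurationSketch` is a hypothetical name and both choices are assumptions.

```javascript
// Illustrative only: the accepted formats are an assumption.
function parseDurationSketch(duration, target) {
  target.originalDuration = duration; // always kept as-is
  if (/^\d+(:\d{1,2}){1,2}$/.test(duration)) {
    // Fold "h:mm:ss" or "m:ss" into a number of seconds.
    target.duration = duration
      .split(":")
      .reduce((total, part) => total * 60 + parseInt(part, 10), 0);
  }
  return target;
}
```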
@@ -403,7 +443,11 @@ Parses the duration string from a Discogs XML file and stores the result on the
<a name="module_util/parseUtils..parseReleaseDate"></a>

### util/parseUtils~parseReleaseDate(date, &#x60;target&#x60;)
Will parse the given release date and format it according to DiscogsDatabase Guidelines. The result is stored on the "released" property of thetarget object. The date will be either formatted as YYYY or YYYY-MM-DD.If only the year and month are given, the date will be set to 00. If dashesare missing, they will be added. All other formats are discarded.
Will parse the given release date and format it according to Discogs
Database Guidelines. The result is stored on the "released" property of the
target object. The date will be either formatted as YYYY or YYYY-MM-DD.
If only the year and month are given, the date will be set to 00. If dashes
are missing, they will be added. All other formats are discarded.

**Kind**: inner method of [<code>util/parseUtils</code>](#module_util/parseUtils)
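
The normalisation rules above (YYYY or YYYY-MM-DD, missing day set to 00, missing dashes added, anything else discarded) could look roughly like this. `parseReleaseDateSketch` and its regular expression are assumptions made for illustration, not the library's code.

```javascript
// Illustrative only: one possible reading of the rules in the description.
function parseReleaseDateSketch(date, target) {
  // Year, then optional month and day, each with an optional leading dash.
  const match = /^(\d{4})(?:-?(\d{2})(?:-?(\d{2}))?)?$/.exec(date);
  if (match) {
    const [, year, month, day] = match;
    // Year only stays "YYYY"; year and month get "00" as the day.
    target.released = month ? `${year}-${month}-${day || "00"}` : year;
  }
  // Any other format is discarded: target.released is left untouched.
}
```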

6 changes: 3 additions & 3 deletions bucket.js
@@ -12,7 +12,7 @@ const logger = require("./util/logger");

const parseString = promisify(xml2js.parseString);

const BUCKET_URL = "https://discogs-data.s3-us-west-2.amazonaws.com";
const BUCKET_URL = "https://discogs-data-dumps.s3-us-west-2.amazonaws.com";
const S3B_ROOT_DIR = "data/";

/**
@@ -23,7 +23,7 @@ const S3B_ROOT_DIR = "data/";
* @returns {string}
*/
function getDumpURL(version, collection) {
return `https://discogs-data.s3-us-west-2.amazonaws.com/data/${version.substring(
return `${BUCKET_URL}/data/${version.substring(
0,
4
)}/discogs_${version}_${collection}.xml.gz`;
@@ -35,7 +35,7 @@ function getDumpURL(version, collection) {
* @returns {string}
*/
function getChecksumURL(version) {
return `https://discogs-data.s3-us-west-2.amazonaws.com/data/${version.substring(
return `${BUCKET_URL}/data/${version.substring(
0,
4
)}/discogs_${version}_CHECKSUM.txt`;
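
For reference, the two URL helpers after this change can be restated as a standalone sketch. The constant and function names come from the diff; pulling the year into a local variable is only a readability choice, not something the change itself makes.

```javascript
// Restatement of the post-change helpers; formatting differs from the diff.
const BUCKET_URL = "https://discogs-data-dumps.s3-us-west-2.amazonaws.com";

function getDumpURL(version, collection) {
  const year = version.substring(0, 4); // e.g. "20190101" -> "2019"
  return `${BUCKET_URL}/data/${year}/discogs_${version}_${collection}.xml.gz`;
}

function getChecksumURL(version) {
  const year = version.substring(0, 4);
  return `${BUCKET_URL}/data/${year}/discogs_${version}_CHECKSUM.txt`;
}
```

With the bucket host defined once in `BUCKET_URL`, a future rename of the bucket touches a single line.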
2 changes: 1 addition & 1 deletion cli/mongo.js
@@ -247,7 +247,7 @@ async function main(argv, client) {
updateOne: {
filter: { id: doc.id },
upsert: true,
update: doc
update: { $set: doc }
}
}))
);
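
The `$set` wrapper matters because an `updateOne` operation expects update operators rather than a bare replacement document. That is presumably the reason for this change, though the PR does not say so. A minimal sketch of how the operations are shaped follows; `toUpsertOperations` is a hypothetical helper name and is not present in `cli/mongo.js`.

```javascript
// Hypothetical helper: builds bulkWrite operations that upsert each doc by id.
function toUpsertOperations(docs) {
  return docs.map(doc => ({
    updateOne: {
      filter: { id: doc.id },
      upsert: true,
      // $set overwrites the listed fields and inserts the doc if none matches.
      update: { $set: doc }
    }
  }));
}
```

The resulting array is what would be passed to the driver's `collection.bulkWrite(...)`. Note that `$set` leaves any fields not present in `doc` untouched on an existing document, which differs from a full replacement.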