Add docs regarding --to-remote option for add/import-url #2091
@@ -9,7 +9,7 @@ Download a file or directory from a supported URL (for example `s3://`,
 ## Synopsis

 ```usage
-usage: dvc get-url [-h] [-q | -v] [-j <number>] url [out]
+usage: dvc get-url [-h] [-q | -v] url [out]
Oops @isidentical I'm seeing lots of
This file (content/docs/command-reference/get-url.md), content/docs/command-reference/get.md, and content/docs/command-reference/import.md to be precise.

 positional arguments:
   url (See supported URLs in the description.)
@@ -31,7 +31,7 @@ while `out` can be used to specify the directory and/or file name desired for
 the downloaded data. If an existing directory is specified, then the file or
 directory will be placed inside.

-DVC supports several types of (local or) remote data sources (protocols):
+DVC supports several types of (local or) remote locations (protocols):

 | Type      | Description                  | `url` format example                          |
 | --------- | ---------------------------- | --------------------------------------------- |
@@ -72,10 +72,6 @@ $ wget https://example.com/path/to/data.csv

 ## Options

-- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
-  from the source. The default value is `4 * cpu_count()`. For SSH remotes, the
-  default is `4`. Using more jobs may speed up the operation.
-
 - `-h`, `--help` - prints the usage/help message, and exit.

 - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
@@ -1,8 +1,8 @@
 # import-url

-Download a file or directory from a supported URL (for example `s3://`,
-`ssh://`, and other protocols) into the <abbr>workspace</abbr>, and track it (an
-import `.dvc` file is created).
+Track a file or directory found in an external location (`s3://`, `/local/path`,
+etc.), and download it to the local project, or make a copy in
+[remote storage](/doc/command-reference/remote).

 > See `dvc import` to download and track data/model files or directories from
 > other <abbr>DVC repositories</abbr> (e.g. hosted on GitHub).
@@ -11,7 +11,8 @@ import `.dvc` file is created).

 ```usage
 usage: dvc import-url [-h] [-q | -v] [-j <number>] [--file <filename>]
-                      [--no-exec] [--desc <text>]
+                      [--no-exec] [--to-remote] [-r <name>]
+                      [--desc <text>]
                       url [out]

 positional arguments:
@@ -22,8 +23,9 @@ positional arguments:
 ## Description

 In some cases it's convenient to add a data file or directory from an external
-location into the workspace, such that it can be updated later, if/when the
-external data source changes. Example scenarios:
+location into the workspace (or to
+[remote storage](/doc/command-reference/remote)), such that it can be updated
+later, if/when the external data source changes. Example scenarios:

 - A remote system may produce occasional data files that are used in other
   projects.
@@ -37,6 +39,12 @@ external data source changes. Example scenarios:
 having to manually copy files from the supported locations (listed below), which
 may require installing a different tool for each type.

+When you don't want to store the target data on your local system, you can still
+create an import `.dvc` file while transferring a file or directory directly to
+remote storage, by using the `--to-remote` option. See the
+[Transfer to remote storage](#example-transfer-to-remote-storage) example for
+more details.
+
 The `url` argument specifies the external location of the data to be imported.
 The imported data is <abbr>cached</abbr>, and linked (or copied) to the current
 working directory with its original file name e.g. `data.txt` (or to a location
@@ -131,6 +139,15 @@ $ dvc run -n download_data \
   finish the operation(s)); or if the target data already exist locally and you
   want to "DVCfy" this state of the project (see also `dvc commit`).

+- `--to-remote` - import an external target, but neither move it into the
+  workspace nor cache it. [Transfer](#example-transfer-to-remote-storage) it
jorgeorpinel marked this conversation as resolved.
+  directly to remote storage (the default one, unless `-r` is specified)
+  instead. Use `dvc pull` to get the data locally.
+
+- `-r <name>`, `--remote <name>` - name of the
+  [remote storage](/doc/command-reference/remote) (can only be used with
+  `--to-remote`).
+
 - `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
   from the source. The default value is `4 * cpu_count()`. For SSH remotes, the
   default is `4`. Using more jobs may speed up the operation.
@@ -340,3 +357,47 @@ $ dvc repro
 Running stage 'prepare' with command:
 python src/prepare.py data/data.xml
 ```
+
+## Example: Transfer to remote storage
+
+When you have a large dataset in an external location, you may want to import it
+to your project without downloading it to the local file system (to use it
+later or elsewhere). The `--to-remote` option lets you skip the download, while
Comment on lines +361 to +365
And copy over the Example as well (will need some adapting). Thanks
+storing the imported data [remotely](/doc/command-reference/remote). Let's
+initialize a DVC project, and set up a remote:
+
+```dvc
+$ mkdir example # workspace
+$ cd example
+$ git init
+$ dvc init
+$ mkdir /tmp/dvc-storage
+$ dvc remote add myremote /tmp/dvc-storage
+```
+
+Now let's create an import `.dvc` file without downloading the target data,
+transferring it directly to remote storage instead:
+
+```dvc
jorgeorpinel marked this conversation as resolved.
+$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
+--to-remote -r myremote
+...
+```
+
+The only change in our local <abbr>workspace</abbr> is a newly created import
+`.dvc` file:
+
+```dvc
+$ ls
+data.xml.dvc
+```
+
+Whenever anyone wants to actually download the imported data (for example from a
+system that can handle it), they can use `dvc pull` as usual:
+
+```dvc
+$ dvc pull data.xml.dvc -r myremote
+
+A data.xml
+1 file added and 1 file fetched
+```