  # import-url

- Download a file or directory from a supported URL (for example `s3://`,
- `ssh://`, and other protocols) into the <abbr>workspace</abbr>, and track it (an
- import `.dvc` file is created).
+ Track a file or directory found in an external location (`s3://`, `/local/path`,
+ etc.), and download it to the local project, or make a copy in
+ [remote storage](/doc/command-reference/remote).

  > See `dvc import` to download and track data/model files or directories from
  > other <abbr>DVC repositories</abbr> (e.g. hosted on GitHub).
@@ -11,7 +11,8 @@ import `.dvc` file is created).

  ```usage
  usage: dvc import-url [-h] [-q | -v] [-j <number>] [--file <filename>]
-                       [--no-exec] [--desc <text>]
+                       [--no-exec] [--to-remote] [-r <name>]
+                       [--desc <text>]
                        url [out]

  positional arguments:
@@ -22,8 +23,9 @@ positional arguments:
  ## Description

  In some cases it's convenient to add a data file or directory from an external
- location into the workspace, such that it can be updated later, if/when the
- external data source changes. Example scenarios:
+ location into the workspace (or to
+ [remote storage](/doc/command-reference/remote)), such that it can be updated
+ later, if/when the external data source changes. Example scenarios:

  - A remote system may produce occasional data files that are used in other
    projects.
@@ -37,6 +39,12 @@ external data source changes. Example scenarios:
  having to manually copy files from the supported locations (listed below), which
  may require installing a different tool for each type.

+ When you don't want to store the target data in your local system, you can still
+ create an import `.dvc` file while transferring a file or directory directly to
+ remote storage, by using the `--to-remote` option. See the
+ [Transfer to remote storage](#example-transfer-to-remote-storage) example for
+ more details.
+
  The `url` argument specifies the external location of the data to be imported.
  The imported data is <abbr>cached</abbr>, and linked (or copied) to the current
  working directory with its original file name e.g. `data.txt` (or to a location
@@ -131,6 +139,15 @@ $ dvc run -n download_data \
    finish the operation(s)); or if the target data already exist locally and you
    want to "DVCfy" this state of the project (see also `dvc commit`).

+ - `--to-remote` - import an external target, but don't move it into the
+   workspace, nor cache it. [Transfer](#example-transfer-to-remote-storage) it
+   directly to remote storage (the default one, unless `-r` is specified)
+   instead. Use `dvc pull` to get the data locally.
+
+ - `-r <name>`, `--remote <name>` - name of the
+   [remote storage](/doc/command-reference/remote) (can only be used with
+   `--to-remote`).
+
  - `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
    from the source. The default value is `4 * cpu_count()`. For SSH remotes, the
    default is `4`. Using more jobs may speed up the operation.
@@ -340,3 +357,47 @@ $ dvc repro
  Running stage 'prepare' with command:
  python src/prepare.py data/data.xml
  ```
+
+ ## Example: Transfer to remote storage
+
+ When you have a large dataset in an external location, you may want to import it
+ to your project without downloading it to the local file system (to use it
+ later or elsewhere). The `--to-remote` option lets you skip the download, while
+ storing the imported data [remotely](/doc/command-reference/remote). Let's
+ initialize a DVC project, and set up a remote:
+
+ ```dvc
+ $ mkdir example # workspace
+ $ cd example
+ $ git init
+ $ dvc init
+ $ mkdir /tmp/dvc-storage
+ $ dvc remote add myremote /tmp/dvc-storage
+ ```
+
+ Now let's create an import `.dvc` file without downloading the target data,
+ transferring it directly to remote storage instead:
+
+ ```dvc
+ $ dvc import-url https://data.dvc.org/get-started/data.xml data.xml \
+                  --to-remote -r myremote
+ ...
+ ```
+
+ The only change in our local <abbr>workspace</abbr> is a newly created import
+ `.dvc` file:
+
+ ```dvc
+ $ ls
+ data.xml.dvc
+ ```
+
+ Whenever anyone wants to actually download the imported data (for example from a
+ system that can handle it), they can use `dvc pull` as usual:
+
+ ```dvc
+ $ dvc pull data.xml.dvc -r myremote
+
+ A data.xml
+ 1 file added and 1 file fetched
+ ```