@@ -107,89 +107,122 @@ reflinks or hardlinks to put it in the workspace without copying. See

## Examples

- For using the `dvc pull` command, a remote storage must be defined. (See
- `dvc remote add`.) For an existing <abbr>project</abbr>, remotes are usually
- already set up and you can use `dvc remote list` to check them. To remember how
- it's done, and set a context for the example, let's define a default SSH remote:
+ Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
+ pipeline stages, such as the <abbr>DVC project</abbr> created for the
+ [Get Started](/doc/tutorials/get-started). Then we can see what happens with
+ `dvc pull`.
+
+ <details>
+
+ ### Click and expand to setup the project
+
+ Start by cloning our example repo if you don't already have it:

``` dvc
- $ dvc remote add -d r1 ssh://_username_@_host_/path/to/dvc/remote/storage
- $ dvc remote list
- r1 ssh://_username_@_host_/path/to/dvc/remote/storage
+ $ git clone https://github.com/iterative/example-get-started
+ $ cd example-get-started
```

- > DVC supports several
- > [remote types](/doc/command-reference/remote/add#supported-storage-types).
+ </details>

- Having some images and other files in remote storage, we can pull all changed
- files from the current Git branch:
+ The workspace looks almost like in this
+ [pipeline setup](/doc/tutorials/pipelines):

``` dvc
- $ dvc pull --remote r1
+ .
+ ├── data
+ │   └── data.xml.dvc
+ ...
+ └── train.dvc
```

- We can download specific files that are <abbr>outputs</abbr> of a specific
- DVC-file:
+ We can now just run `dvc pull` to download the most recent `data/data.xml`,
+ `model.pkl`, and other DVC-tracked files into the <abbr>workspace</abbr>:

``` dvc
- $ dvc pull data.zip.dvc
+ $ dvc pull
+
+ $ tree example-get-started/
+ example-get-started/
+ ├── data
+ │   ├── data.xml
+ │   ├── data.xml.dvc
+ ...
+ ├── model.pkl
+ └── train.dvc
```

- In this case we left off the `--remote` option, so it will have pulled from the
- default remote. The only files considered in this case are what is listed in the
- `out` field of the DVC-file `targets`.
+ We can download specific <abbr>outputs</abbr> of a single DVC-file:
+
+ ``` dvc
+ $ dvc pull train.dvc
+ ```

## Example: With dependencies

- Demonstrating the `--with-deps` option requires a larger example. First, assume
- a [pipeline](/doc/command-reference/pipeline) has been setup with these
+ > Please delete the `.dvc/cache` directory first (with `rm -Rf .dvc/cache`) to
+ > follow this example if you tried the previous ones.
+
+ Our [pipeline](/doc/command-reference/pipeline) has been setup with these
[stages](/doc/command-reference/run):

``` dvc
- $ dvc pipeline show
-
- data/Posts.xml.zip.dvc
- Posts.xml.dvc
- Posts.tsv.dvc
- Posts-test.tsv.dvc
- matrix-train.p.dvc
- model.p.dvc
- Dvcfile
+ $ dvc pipeline show evaluate.dvc
+ data/data.xml.dvc
+ prepare.dvc
+ featurize.dvc
+ train.dvc
+ evaluate.dvc
```

- Imagine the remote storage has been modified such that the data in some of these
- stages should be updated in the <abbr>workspace</abbr>.
+ Imagine the [remote storage](/doc/command-reference/remote) has been modified
+ such that the data in some of these stages should be updated in the
+ <abbr>workspace</abbr>.

``` dvc
- $ dvc status --cloud
-
- deleted: data/model.p
- deleted: data/matrix-test.p
- deleted: data/matrix-train.p
+ $ dvc status -c
+ deleted: data/features/test.pkl
+ deleted: data/features/train.pkl
+ deleted: model.pkl
+ ...
```

One could do a simple `dvc pull` to get all the data, but what if you only want
to retrieve part of the data?

``` dvc
- $ dvc pull --remote r1 --with-deps matrix-train.p.dvc
+ $ dvc pull --with-deps featurize.dvc

- ... Do some work based on the partial update
+ ... Use the partial update, then pull the remaining data:

- $ dvc pull --remote r1 --with-deps model.p.dvc
+ $ dvc pull
+ Everything is up to date.
+ ```

- ... Pull the rest of the data
+ With the first `dvc pull` we specified a stage in the middle of this pipeline
+ (`featurize.dvc`) while using `--with-deps`. DVC started with that DVC-file and
+ searched backwards through the pipeline for data files to download. Later we ran
+ `dvc pull` to download all the remaining data files.

- $ dvc pull --remote r1
+ ## Example: Download from specific remote storage

- Everything is up to date.
+ For using the `dvc pull` command, a remote storage must be defined. (See
+ `dvc remote add`.) For an existing <abbr>project</abbr>, remotes are usually
+ already set up and you can use `dvc remote list` to check them. To remember how
+ it's done, and set a context for the example, let's define a default SSH remote:
+
+ ``` dvc
+ $ dvc remote add -d r1 ssh://_username_@_host_/path/to/dvc/remote/storage
+ $ dvc remote list
+ r1 ssh://_username_@_host_/path/to/dvc/remote/storage
```

- With the first `dvc pull` we specified a stage in the middle of this pipeline
- (`matrix-train.p.dvc`) while using `--with-deps`. DVC started with that DVC-file
- and searched backwards through the pipeline for data files to download. Because
- the `model.p.dvc` stage occurs later, its data was not pulled.
+ > DVC supports several
+ > [remote types](/doc/command-reference/remote/add#supported-storage-types).

- Then we ran `dvc pull` specifying the last stage, `model.p.dvc`, and its data
- was downloaded. Finally, we ran `dvc pull` with no flags to make sure that all
- data was already pulled with the previous commands.
+ To download DVC-tracked data from a specific DVC remote, use the `--remote`
+ (`-r`) option of `dvc pull`:
+
+ ``` dvc
+ $ dvc pull --remote r1
+ ```