Commit 8a7c086

guide: finish revising What is DVC?
rel #425 (comment)
1 parent 93cc607 commit 8a7c086

2 files changed: +30 -25 lines changed

Lines changed: 7 additions & 1 deletion
@@ -1,4 +1,10 @@
 # Basic Concepts of DVC

-- **Cache directory**: Directory with all data files on a local hard drive or in
+DVC streamlines large data files and binary models into a single Git
+environment. This approach will not require storing binary files in your Git
+repository.
+
+![](/img/flow-large.png) _DVC data management_
+
+- **Local Cache**: Directory with all data files on a local hard drive or in
 cloud storage, but not in the Git repository. See `dvc cache dir`.

content/docs/user-guide/what-is-dvc.md

Lines changed: 23 additions & 24 deletions
@@ -34,8 +34,8 @@ software!
 to be integrated into a Git repository history and never needs to recompute
 the results after a successful merge.

-- **Experiment state** or state: Equivalent to a Git snapshot (all committed
-files). A Git commit hash, branch or tag name, etc. can be used as a
+- **Experiment State**: Equivalent to a Git snapshot (all committed files). A
+Git commit hash, branch or tag name, etc. can be used as a
 [reference](https://git-scm.com/book/en/v2/Git-Internals-Git-References) to an
 experiment state.

@@ -51,34 +51,33 @@ software!
 [DVC-files](/doc/user-guide/dvc-files-and-directories) describing that data
 are stored in Git for DVC needs (to maintain pipelines and reproducibility).

-- **Cloud storage** support: available complement to the core DVC features. This
-is how a data scientist transfers large data files or shares a GPU-trained
-model with those without GPUs available.
+- **Cloud storage**: Available addon to the core DVC features. Multiple
+providers are supported (Amazon S3, Microsoft Azure Blob Storage, Google Cloud
+Storage, etc.). This is how a data scientist transfers large data files or
+shares a GPU-trained model with others.
+
+> This complement is separate from DVC itself, and never required.

 ## Core Features

-- DVC works **on top of Git repositories** and has a similar command line
-interface and Git workflow.
+- **Large data file tracking** is enabled, by creating special files that point
+to the original data (in the <abbr>cache</abbr>). These can be easily
+versioned with Git.

-- It makes data science projects **reproducible** by creating lightweight
-[pipelines](/doc/command-reference/pipeline) using implicit dependency graphs.
+- DVC works **on top of Git repositories** and has a similar command line
+interface and flow as Git. DVC can also work stand-alone, but without
+versioning capabilities.

-- **Large data file versioning** works by creating special files in your Git
-repository that point to the <abbr>cache</abbr>, typically stored on a local
-hard drive.
+- DVC makes data science projects **reproducible** by creating lightweight
+[pipelines](/doc/command-reference/pipeline), using implicit dependency
+graphs.

-- DVC is **Programming language agnostic**: Python, R, Julia, shell scripts,
-etc. as well as ML library agnostic: Keras, Tensorflow, PyTorch, Scipy, etc.
+- DVC is **platform agnostic**: It runs on all major operating systems (Linux,
+MacOS, and Windows), and works independently of the programming languages
+(Python, R, Julia, shell scripts, etc.) or ML libraries (Keras, Tensorflow,
+PyTorch, Scipy, etc.) used in the <abbr>project</abbr>.

-- It's **Open-source** and **Self-serve**: DVC is free and doesn't require any
+- **Open-source** and **Self-serve**: DVC is free and doesn't require any
 additional services.

-- DVC supports cloud storage (Amazon S3, Microsoft Azure Blob Storage, Google
-Cloud Storage, etc.) for **data sources and pre-trained model sharing**.
-
-DVC streamlines large data files and binary models into a single Git environment
-and this approach will not require storing binary files in your Git repository.
-The diagram below describes all the DVC commands and relationships between a
-local cache and remote storage:
-
-![](/img/flow-large.png) _DVC data management_
+> Cloud storage providers are supported, however.
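
The revised "Large data file tracking" bullet describes small special files that point to the original data in the cache. The following is a minimal sketch of that idea only, not DVC's actual file format or API. The function names `track` and `restore`, the `.ptr` suffix, the JSON layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_path: Path, cache_dir: Path) -> Path:
    """Copy a file into a content-addressed cache and write a small pointer file.

    Hypothetical helper: the layout (two-character prefix directory, JSON
    pointer with a ".ptr" suffix) is an illustrative assumption.
    """
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        # Identical content always maps to the same cache entry.
        shutil.copyfile(data_path, cached)
    pointer = data_path.with_name(data_path.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": digest, "path": data_path.name}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cached copy."""
    meta = json.loads(pointer.read_text())
    digest = meta["sha256"]
    target = pointer.with_name(meta["path"])
    shutil.copyfile(cache_dir / digest[:2] / digest[2:], target)
    return target


def demo() -> bool:
    """Track a file, delete it, and restore it from the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "model.bin"
        payload = b"\x00" * 1024
        data.write_bytes(payload)
        pointer = track(data, cache)
        data.unlink()  # only the small pointer file remains in the workspace
        restored = restore(pointer, cache)
        return restored.read_bytes() == payload and pointer.stat().st_size < 200
```

In this sketch only the pointer file would be committed to Git, while the large file lives in the cache directory, which is the separation the bullet describes.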
