deps is inconsistent between dvc.yaml and .dvc files #5370

Closed
antonkulaga opened this issue Jan 31, 2021 · 4 comments

@antonkulaga

When I have something like:

    deps:
      - path: http://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
        etag: W/"5f8d62ed-34863818"
        cache: false

It works fine in .dvc files, but when I add it to one of the stages, I get an error because deps are assumed to be strings only.

    stages:
      prepare_genome:
        frozen: true
        deps:
          - path: http://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
            etag: W/"5f8d62ed-34863818"
            cache: false
        cmd: bin/prepare_genome.sh
        out:
          - data/gwas/homo_sapiens_ensembl_102/Homo_sapiens.GRCh38.dna.primary_assembly.fa

the error is:

 'dvc.yaml' format error: expected str @ data['stages']['prepare_genome']['deps'][0]

The DVC version is 1.11.10 (latest .deb file).
I suggest that "deps" should behave the same in .dvc and dvc.yaml files, as otherwise it creates a lot of confusion.
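
To illustrate where the message comes from, here is a minimal sketch of a check that accepts only string entries under deps. It is not DVC's actual validation code: `check_deps` is a hypothetical helper, and its error wording only mimics the message quoted in the report.

```python
def check_deps(stage_name, deps):
    """Return error strings for deps entries that are not plain strings.

    Hypothetical helper: it imitates the message format from the report
    and is not taken from DVC itself.
    """
    errors = []
    for index, dep in enumerate(deps):
        if not isinstance(dep, str):
            errors.append(
                "expected str @ data['stages']"
                f"[{stage_name!r}]['deps'][{index}]"
            )
    return errors


url = (
    "http://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna/"
    "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
)

# .dvc-style entry (a mapping): rejected by the string-only check.
dict_style = [{"path": url, "etag": 'W/"5f8d62ed-34863818"', "cache": False}]
# dvc.yaml-style entry (a plain string): accepted.
str_style = [url]

print(check_deps("prepare_genome", dict_style))
print(check_deps("prepare_genome", str_style))
```

The mapping form fails only because of its type, not because of the URL, which is why the same path written as a bare string passes.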

@skshetry
Collaborator

skshetry commented Jan 31, 2021

@antonkulaga, support for stages in the old .dvc files is going to be removed in the upcoming release next month.

.dvc files for stages have been deprecated since the introduction of the dvc.yaml file in 1.0, and are only supported for dvc add/import data files.

@antonkulaga
Author

@skshetry I was not talking about having stages in .dvc files, but about having:

    deps:
      - path: http://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
        etag: W/"5f8d62ed-34863818"
        cache: false

inside the dependencies in dvc.yaml. So far only "str" is supported for deps in dvc.yaml; I believe it should also support etags for external dependencies.

skshetry added a commit to skshetry/dvc that referenced this issue Feb 1, 2021
This was loosened when introducing parametrization.
But, `foreach`..`do` should be considered first
before stage's regular structure. And, cmd is made
required (as it should have been)

Related: iterative#5371, iterative#5370, iterative#5312
skshetry added a commit that referenced this issue Feb 1, 2021
* don't allow loading stages without `cmd`

This was loosened when introducing parametrization.
But, `foreach`..`do` should be considered first
before stage's regular structure. And, cmd is made
required (as it should have been)

Related: #5371, #5370, #5312

* fix schema
@skshetry
Collaborator

skshetry commented Feb 1, 2021

@antonkulaga, the checksums are kept in lockfile. You just have to write the entry as follows and then dvc repro it later:

    stages:
      download_file:
        cmd: curl https://github.com/api/repos/iterative/dvc -s > data.json
        deps:
        - https://github.com/api/repos/iterative/dvc
        metrics:
        - data.json

cache: false was never honoured for deps and is not required.
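
The split described here (paths in dvc.yaml, checksums in the lockfile) can be sketched as a small conversion. This is an illustration under assumptions, not a DVC API: `split_deps` is a hypothetical helper, and the lock-entry layout it produces (a path plus its checksum field) is assumed for the example. In practice dvc repro writes the lockfile itself.

```python
def split_deps(deps):
    """Split .dvc-style dependency mappings into two parts.

    Returns (yaml_deps, lock_deps):
      yaml_deps: plain path strings, the form dvc.yaml accepts for deps
      lock_deps: mappings holding the path and any checksum-like fields
    Hypothetical helper for illustration only.
    """
    yaml_deps, lock_deps = [], []
    for dep in deps:
        if isinstance(dep, str):
            # Already in dvc.yaml form; there is no checksum to carry over.
            yaml_deps.append(dep)
            continue
        yaml_deps.append(dep["path"])
        # Drop 'cache': per this thread it was never honoured for deps.
        lock_deps.append({k: v for k, v in dep.items() if k != "cache"})
    return yaml_deps, lock_deps


url = (
    "http://ftp.ensembl.org/pub/release-102/fasta/homo_sapiens/dna/"
    "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
)
dvc_file_deps = [{"path": url, "etag": 'W/"5f8d62ed-34863818"', "cache": False}]

yaml_deps, lock_deps = split_deps(dvc_file_deps)
print(yaml_deps)   # what would be listed under deps in dvc.yaml
print(lock_deps)   # what a lockfile entry might carry (assumed layout)
```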

@antonkulaga
Author

@skshetry I think the problem is that in the docs you talk a lot about editing .dvc files and barely mention editing dvc.lock.
The other problem is that dvc import-url can only create .dvc files, while many teams prefer to keep everything in the dvc.yaml + dvc.lock combination. I am closing this issue in favor of a new issue about adding a "store in dvc.lock" flag to dvc import-url: #5379
