Description
Hello!
I hope the explanation below is clear. Please let me know otherwise.
Say I have a `utils.py` with some awesome helpful classes that I reuse frequently in a certain dvc-tracked repo.
Say I have:
- `script1.py`, which uses `utils.py`, takes `data0.csv`, and processes it into `data1.csv`
- `script2.py`, which uses `utils.py`, takes `data1.csv`, and processes it into `data2.csv`
- `script3.py`, which uses `utils.py`, takes `data2.csv`, and processes it into `data3.csv`
- etc.
(In the example above all scripts are part of the same pipeline, but they could be from different pipelines.)
The point that perhaps could be improved is that, as far as I know, for each `data*.csv` I have to add the corresponding script and `utils.py` to its dependencies. And of course this can cascade if `utils.py` depends on `utils1.py`, which depends on `utils2.py`, etc. If that is the case, then every time `utils.py` is a dependency, I have to remember to include the other `utils*.py` files as dependencies as well.
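To illustrate the bookkeeping involved, here is a minimal sketch of how the chain of `utils*.py` files could be collected by hand today. It assumes that all helper modules live directly under the repo root and are imported with plain absolute `import x` / `from x import y` statements. The helper names (`local_imports`, `transitive_deps`, `dvc_run_args`) are hypothetical and not part of dvc. The last function only builds an argument list for `dvc run` with `-d` and `-o`; it does not execute anything.

```python
from __future__ import annotations

import ast
from pathlib import Path


def local_imports(script: Path, root: Path) -> set[Path]:
    """Return the .py files under `root` that `script` imports directly."""
    tree = ast.parse(script.read_text())
    found: set[Path] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Assumption: module "a.b" maps to "<root>/a/b.py"; anything that
            # does not resolve to a file in the repo (stdlib, packages) is skipped.
            candidate = root / (name.replace(".", "/") + ".py")
            if candidate.is_file():
                found.add(candidate)
    return found


def transitive_deps(script: Path, root: Path) -> list[Path]:
    """Follow local imports recursively (utils.py -> utils1.py -> utils2.py ...)."""
    seen: set[Path] = set()
    stack = [script]
    while stack:
        current = stack.pop()
        for dep in local_imports(current, root):
            if dep not in seen and dep != script:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)


def dvc_run_args(script: str, data_in: str, data_out: str, root: str | Path = ".") -> list[str]:
    """Build a `dvc run` argument list that names every code dependency explicitly."""
    root = Path(root)
    deps = [root / script, *transitive_deps(root / script, root), root / data_in]
    args = ["dvc", "run"]
    for dep in deps:
        args += ["-d", str(dep)]
    args += ["-o", data_out, "python", script]
    return args
```

With the files from the example, `dvc_run_args("script1.py", "data0.csv", "data1.csv")` would list `script1.py`, `utils.py`, any `utils*.py` it pulls in, and `data0.csv` as `-d` entries. This is exactly the list I would otherwise have to maintain by hand, and the sketch misses dynamic or relative imports.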
Is there a way in dvc to say that `scriptB.py` depends on `scriptA.py`, so that every time `scriptB.py` is a dependency, `scriptA.py` is also an implicit dependency?
Something like a variant of, or an alternative to, `dvc run` where the "output" is not a data file but a `.py` file?
Thanks in advance!