pipelines: parametrize using environment variables / DVC properties #1416

Open
@prihoda

Description

It would be useful if I could parametrize my pipeline using environment variables read from a properties file, which would be specified using dvc config env my.properties. DVC would load those variables into the environment before running the stage command.

For example, I could have this properties file:

DVC_NICKNAME=David

And run:

dvc run -o hello.txt 'echo "Hello ${DVC_NICKNAME}!" > hello.txt'
dvc run -o cheers.txt 'echo "Cheers ${DVC_NICKNAME}!" > cheers.txt'

This would produce hello.txt and cheers.txt containing "Hello David!" and "Cheers David!".

Users would just have to make sure to single-quote the command, so that the calling shell does not expand the variable before DVC sees it, or use interactive mode (#1415).

The DVC file would contain the variable reference:

cmd: echo "Hello ${DVC_NICKNAME}!" > hello.txt

DVC would add the value to the environment at startup, so the shell would expand it natively when the command runs.
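A minimal sketch of the loading step, assuming the properties file holds plain KEY=VALUE lines with optional # comments. The function names parse_properties and load_env are hypothetical and not part of DVC:

```python
import os


def parse_properties(text):
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        env[key.strip()] = value.strip()
    return env


def load_env(text, environ=os.environ):
    """Add the parsed variables to the environment (hypothetical startup hook)."""
    environ.update(parse_properties(text))


# Contents of the example properties file from above.
load_env("DVC_NICKNAME=David\n")
print(os.environ["DVC_NICKNAME"])  # David
```

Because the variables end up in os.environ, any child process started afterwards (including the shell that runs the stage command) inherits them without further work.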

In order for dvc status to detect that variables used by a stage changed, we can calculate the internal md5 checksum on the contents with the variable values injected in place of the variable names, so that a changed value is handled as if the contents of the DVC file changed. This can be done using os.path.expandvars. Unfortunately, this would only replace variable references that appear directly in the shell command; it would not cover cases where a script reads the environment variable itself. The only foolproof way would be to force the user to explicitly list the environment variables to inject from the properties file, e.g. using dvc run -e DVC_NICKNAME -e DVC_OTHER. That would effectively add "env dependencies" to stages.
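Both variants can be sketched together. This is an illustration only: stage_checksum and its env_deps parameter are hypothetical names, and DVC's real checksum covers more than the command string. Note that os.path.expandvars leaves unknown variables unchanged and does not follow shell quoting rules.

```python
import hashlib
import os


def stage_checksum(cmd, env_deps=()):
    """md5 over the command with variables expanded, plus any explicitly
    declared "env dependencies" (hypothetical equivalent of `-e NAME`)."""
    h = hashlib.md5()
    h.update(os.path.expandvars(cmd).encode("utf-8"))
    for name in sorted(env_deps):
        value = os.environ.get(name, "")
        h.update(f"\0{name}={value}".encode("utf-8"))
    return h.hexdigest()


inline_cmd = 'echo "Hello ${DVC_NICKNAME}!" > hello.txt'
script_cmd = "python greet.py"  # hypothetical script that reads DVC_NICKNAME itself

os.environ["DVC_NICKNAME"] = "David"
inline_before = stage_checksum(inline_cmd)
script_before = stage_checksum(script_cmd)
declared_before = stage_checksum(script_cmd, env_deps=["DVC_NICKNAME"])

os.environ["DVC_NICKNAME"] = "Dave"
print(stage_checksum(inline_cmd) != inline_before)  # True: reference is in the command
print(stage_checksum(script_cmd) != script_before)  # False: change goes unnoticed
print(stage_checksum(script_cmd, env_deps=["DVC_NICKNAME"]) != declared_before)  # True
```

The second line of output is the gap described above: without an explicit declaration, a variable consumed inside a script never reaches the checksum.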

It would also be nice to inject the variables into dependency paths, so that those can be parametrized as well. This could also be done using os.path.expandvars. It would change the DAG dynamically, but AFAIK it should work without breaking anything, right? As long as the environment is initialized at each DVC startup and expandvars is called when reading deps paths.
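A rough sketch of the path expansion, assuming it runs after the environment has been initialized. The helper resolve_dep_paths, the variable DVC_DATASET, and the example paths are all made up for illustration:

```python
import os


def resolve_dep_paths(paths):
    """Expand environment variables in dependency paths.

    os.path.expandvars leaves unknown variables untouched, so a leftover
    '$' is treated as an unresolved reference. This is a simplification:
    it would also reject a variable whose value itself contains '$'.
    """
    resolved = []
    for path in paths:
        expanded = os.path.expandvars(path)
        if "$" in expanded:
            raise KeyError(f"unresolved variable in dependency path: {path}")
        resolved.append(expanded)
    return resolved


os.environ["DVC_DATASET"] = "small"
deps = ["data/${DVC_DATASET}/train.csv", "src/train.py"]
print(resolve_dep_paths(deps))  # ['data/small/train.csv', 'src/train.py']
```

Failing on unresolved references, rather than passing the literal ${...} through, keeps a missing properties entry from silently pointing a stage at a nonexistent path.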

    Labels

    A: params (Related to dvc params)
    A: templating (Related to the templating feature)
    feature request (Requesting a new feature)
    p3-nice-to-have (It should be done this or next sprint)
