pipelines: parametrize using environment variables / DVC properties

It would be useful if I could parametrize my pipeline using environment variables, which could be read from a properties file specified using `dvc config env my.properties`. DVC would load those environment variables when running the command.

For example, I could have this properties file:
```
DVC_NICKNAME=David
```

And run:
```
dvc run -o hello.txt 'echo "Hello ${DVC_NICKNAME}!" > hello.txt'
dvc run -o cheers.txt 'echo "Cheers ${DVC_NICKNAME}!" > cheers.txt'
```
This would produce files containing "Hello David!" and "Cheers David!".

Users would just have to make sure to single-quote the command, so that the invoking shell does not expand the variable before DVC sees it, or use interactive mode (#1415).

The DVC file would contain the variable reference:
```
cmd: echo "Hello ${DVC_NICKNAME}!" > hello.txt
```

The value would be added to the environment by DVC at DVC startup so it would be handled natively by the shell.
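To make the startup step concrete, here is a minimal sketch of how the properties file could be loaded into the environment. The function names `load_properties` and `inject_env` are hypothetical, not existing DVC API, and the file format (one `KEY=VALUE` per line, `#` comments) is an assumption.

```python
import os


def load_properties(text):
    """Parse KEY=VALUE lines (assumed format); skip blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # Not a KEY=VALUE line; ignored in this sketch.
            continue
        props[key.strip()] = value.strip()
    return props


def inject_env(props, environ=os.environ):
    """Copy the parsed properties into the environment mapping.

    Assumption: values from the properties file override existing variables.
    """
    environ.update(props)


# Hypothetical usage at DVC startup, with the file contents inlined:
inject_env(load_properties("DVC_NICKNAME=David\n"))
```

Because child processes inherit `os.environ`, a shell command started afterwards would see `DVC_NICKNAME` without any extra handling.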

For `dvc status` to detect that variables used in a stage have changed, we can calculate the internal md5 checksum on the contents with the variable values injected in place of the variable names, so that a changed value would be handled as if the contents of the DVC file had changed. This can be done using [os.path.expandvars](https://docs.python.org/3/library/os.path.html#os.path.expandvars). Unfortunately, this would only replace variable references used directly in the shell command; it would not cover cases where the environment variable is read inside a script. The only foolproof way would be to force the user to explicitly request the environment variables to be injected from the properties file, e.g. using `dvc run -e DVC_NICKNAME -e DVC_OTHER`. That would basically allow adding "env dependencies" to stages.
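A rough sketch of both ideas, assuming the checksum is computed over the command string only (the real DVC file checksum covers more than that). `stage_checksum` and its `env_deps` parameter are hypothetical names; `env_deps` stands in for the variables that would be listed with the proposed `-e` flag.

```python
import hashlib
import os


def stage_checksum(cmd, env_deps=()):
    """md5 over the command with variables expanded, plus declared env deps."""
    digest = hashlib.md5()
    # References such as ${DVC_NICKNAME} are replaced by their current values;
    # os.path.expandvars leaves references to unset variables unchanged.
    digest.update(os.path.expandvars(cmd).encode("utf-8"))
    # Explicitly declared variables are hashed even when the command never
    # mentions them, which covers variables read inside a script.
    for name in sorted(env_deps):
        value = os.environ.get(name, "")
        digest.update(f"\0{name}={value}".encode("utf-8"))
    return digest.hexdigest()


os.environ["DVC_NICKNAME"] = "David"
before = stage_checksum("python greet.py", env_deps=["DVC_NICKNAME"])
os.environ["DVC_NICKNAME"] = "Anna"
after = stage_checksum("python greet.py", env_deps=["DVC_NICKNAME"])
print(before != after)  # the stage would be reported as changed
```

Without `env_deps`, the checksum of `python greet.py` would stay the same when `DVC_NICKNAME` changes, which is exactly the gap described above.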

It would also be nice to inject the variables into the paths of dependencies, so that those can be parametrized as well. This could also be done using [os.path.expandvars](https://docs.python.org/3/library/os.path.html#os.path.expandvars). It would change the DAG dynamically, but AFAIK it should just work without breaking anything, right? As long as the environment is initialized at each DVC startup and expandvars is called when reading deps paths.
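For illustration, expanding dependency paths could look roughly like this. `expand_dep_paths`, the variable `DVC_DATASET`, and the example paths are made up for this sketch; only `os.path.expandvars` is real.

```python
import os


def expand_dep_paths(deps):
    """Expand environment variable references in each dependency path."""
    return [os.path.expandvars(path) for path in deps]


os.environ["DVC_DATASET"] = "v2"  # would normally come from the properties file
paths = expand_dep_paths(["data/${DVC_DATASET}/train.csv", "src/train.py"])
print(paths)  # ['data/v2/train.csv', 'src/train.py']
```

One caveat: `os.path.expandvars` leaves references to unset variables unchanged, so a missing property would yield a literal `${...}` in the path rather than an error, and DVC would probably want to check for that.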

