Skip to content

pipeline file: characters to allow in stage name #3700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
skshetry opened this issue Apr 29, 2020 · 9 comments · Fixed by #3767
Closed

pipeline file: characters to allow in stage name #3700

skshetry opened this issue Apr 29, 2020 · 9 comments · Fixed by #3767
Assignees

Comments

@skshetry
Copy link
Collaborator

skshetry commented Apr 29, 2020

Should we restrict stage name to certain characters? Maybe [a-z][A-Z][0-9]{hyphen, underscore}.

YAML keys can contain any thing, even spaces.

Allowing every kind of characters might create strange errors (which we might be unaware of).
If we disallow, what characters should we disallow?

@ghost ghost added the triage Needs to be triaged label Apr 29, 2020
@skshetry skshetry self-assigned this Apr 29, 2020
@skshetry skshetry mentioned this issue Apr 29, 2020
3 tasks
@ghost ghost removed the triage Needs to be triaged label Apr 29, 2020
@efiop
Copy link
Contributor

efiop commented Apr 29, 2020

The stricter - the better. We can always ease up on it.

@michael-ford
Copy link

Hey DVC team, not sure if this is the best place to do this, but as a DVC user I'd like to cast a vote for including periods as allowed characters. They are allowed in filenames in all modern OS that would run DVC, and while aren't the most common punctuation in file systems, are widely used.

I bring this up because I'm converting from 0.94 to 1.0+ and I commonly use periods in the names of .dvc files, which are now becoming stage names. Manually adding them to a dvc.yaml works, as does dvc repro, but dvc run complains.

The only compromise I can see (without being familiar with DVC implementation) is that periods are used in REGEX, but I generally never run into issues. Like I said dvc repro works fine with periods. Not sure if there are any other downsides though.

If you agree, I could potentially get my hands dirty and send a pull request. If one user's vote isn't enough, maybe in time others will chime in here as well.

@efiop
Copy link
Contributor

efiop commented Sep 11, 2020

@skshetry what do you think? ^

@skshetry
Copy link
Collaborator Author

Manually adding them to a dvc.yaml works, as does dvc repro, but dvc run complains

@michael-ford, yes, you are right that we only complain on dvc run. It's unlikely that we are going to start enforcing them.

You just have to add "." (period) if you want to in the following line, but I'll suggest you slugify them instead if you are migrating.

INVALID_STAGENAME_CHARS = set(string.punctuation) - {"_", "-"}

We are working on new features that will affect the dvc.yaml and we'll be relaxing some of the restrictions and/or enforcing them if we use these characters for some features. However, the period on dvc run won't be affected, and should be safe to use as-is, irrespective of the change on INVALID_STAGENAME_CHARS.

@efiop
Copy link
Contributor

efiop commented Sep 14, 2020

@skshetry Should we reopen/create new issue?

@michael-ford
Copy link

@skshetry thanks for the reply. From what I understand:

  • obviously easy to add, but does your message mean you're not interested in changing it / PR? Meaning I could do it on my own fork if I want
  • Allowing periods (with the exception of dvc run) will continue to work in the near-future updates

@skshetry
Copy link
Collaborator Author

@michael-ford, I am a bit conflicted about it, and unsure about what stage name even means or should mean. We disallowed stage name to not contain periods because most of the files for the DVC tracked dependencies and outputs will have one and might conflict during some operations (eg: push/pull/fetch/checkout; there's already a conflict if the files don't have any extensions).

Therefore, we decided to not support creating it but didn't go one step further to enforce it as that might mean users are generating themselves and will at least try to separate the stage name and name of the outputs. Another reason to not enforce is that it might be annoying to see an error during checkouts, for example. So, it will continue working in the future.

I can see it being useful with autocompletion, but we are moving away from stage name to the outputs name for most of the commands (repro does not yet) for the same reasons.

Another reason, that is already mentioned, we are working on the next updates with dvc.yaml that might change things here and might add some clarifications to it. So, I am a bit hesitant to support to change it yet.

Saying that we are open to relax the restriction, for which we'd love to hear what the pros/cons of this rule for you are, how will you ensure they don't conflict, and what DVC can do to prevent those conflicts. That'd help us make a better decision around this. 🙂

@skshetry
Copy link
Collaborator Author

@skshetry Should we reopen/create new issue?

Reopening for further discussions.

@skshetry skshetry reopened this Sep 16, 2020
@skshetry
Copy link
Collaborator Author

Closing due to inactivity. @michael-ford, please feel free to reopen anytime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants