Description
Inspired by #24510, this proposal aims to be the minimal CLI implementation mentioned there. This issue forks off of that one for more isolated discussion; this stdlib CLI parser should have a generally useful API regardless of its possible inclusion in "juicy main".
This CLI parser proposal is:
- minimal
- opinionated
- designed to appeal to the general intuition of the Python `argparse` family (rather than Go `flag` or Bash `getopt`)
The configuration is a struct definition:
    const Args = struct {
        named: struct {
            // named parameter config goes here.
        },
        positional: []const []const u8,
    };
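As a rough illustration of what a filled-in configuration could look like, here is a sketch with made-up fields. The names `output`, `jobs`, `verbose`, and `dry-run` are hypothetical and not part of the proposal; only the `named`/`positional` shape comes from the text above.

```zig
const std = @import("std");

// Hypothetical configuration; only the `named`/`positional` layout is from the proposal.
const Args = struct {
    named: struct {
        /// No default value, so it would be required: --output <path>
        output: []const u8,
        /// Has a default, so it would be optional: --jobs 8 or --jobs=8
        jobs: u32 = 1,
        /// bool field with a default
        verbose: bool = false,
        /// Hyphenated names need the @"..." syntax and are used verbatim.
        @"dry-run": bool = false,
    },
    positional: []const []const u8,
};
```

Because the result is an ordinary struct, the parsed values would be read as plain fields, e.g. `args.named.jobs` or `args.named.@"dry-run"`.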
The front-door API looks like this:
    // std.cli
    pub fn parse(comptime Args: type, allocator: Allocator, iter: std.process.ArgIterator, options: Options) Error!Args {
        // ...
    }
    /// argv supports [][:0]u8, []const []const u8, etc.
    pub fn parseSlice(comptime Args: type, allocator: Allocator, argv: anytype, options: Options) Error!Args {
        // ...
    }
    pub const Options = struct {
        /// Recognize --help, which will return error.Help.
        help: bool = true,
        /// error.InvalidArgument and error.Help will also print usage information to stderr.
        print_errors: bool = true,
    };
    pub const Error = error{
        /// Includes unrecognized option names and values that cannot be parsed into the desired value.
        InvalidArgument,
        /// The --help argument was given.
        Help,
    } || Allocator.Error;
For each field of `Args.named`, the field name determines the `--name` recognized during parsing, and the field type and possible default value determine parsing behavior.
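To make the name-to-field mapping concrete, here is a minimal sketch of how a parser might dispatch on field names at compile time. It is not the proposed implementation: `Named`, its fields, and `setNamed` are hypothetical, only integer and string fields are handled, and it assumes a recent Zig release where struct field info exposes `.type`.

```zig
const std = @import("std");

// Hypothetical configuration; the field names are illustrative only.
const Named = struct {
    jobs: u32 = 1,
    output: []const u8 = "a.out",
};

/// Assigns `value` to the field of `named` whose name equals `name`
/// (the leading `--` is assumed to be stripped already).
/// Unknown names and unparsable values both map to error.InvalidArgument.
fn setNamed(named: *Named, name: []const u8, value: []const u8) error{InvalidArgument}!void {
    inline for (std.meta.fields(Named)) |field| {
        if (std.mem.eql(u8, field.name, name)) {
            if (field.type == []const u8) {
                // Strings are stored as-is.
                @field(named.*, field.name) = value;
            } else {
                // Base 0 lets std.fmt.parseInt accept prefixes such as 0x.
                @field(named.*, field.name) = std.fmt.parseInt(field.type, value, 0) catch
                    return error.InvalidArgument;
            }
            return;
        }
    }
    return error.InvalidArgument;
}
```

Calling `setNamed` twice for the same name simply overwrites the earlier value, so in this sketch a later occurrence of a scalar option wins.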
- Field names are prefixed by `--`, never single `-` and never `/`. This is true even if you name a field with a single letter, e.g. `--n=100`. Motivation: sometimes single `-` means that multiple single-letter options can be grouped together, like `ls -lA`, but double `--` never has this ambiguity; although `/`-prefixed names are common on Windows, `-`-prefixed names are also common, and CLI users will just need to deal with it.
- Field names are used verbatim; no translation between `-` and `_` (and no Unicode normalization). You can use `@"field-name"` if you really want those hyphens in there. Motivation: lower complexity with respect to name mangling; there is no name mangling.
- Any field that does not have a default value is required to be supplied on the command line. Alternate proposal: all struct fields must have a default value. Motivation for supporting required options: it's useful for e.g. `--output` options, and it semantically matches struct initialization in Zig.
- Providing the same (scalar) argument multiple times overrides previous values with later values. Alternate proposal: the same (scalar) argument multiple times is an error. Motivation for override behavior: it's useful and matches the behavior that people are used to, e.g. your `git` alias might include `--color=auto`, and then you can additionally give `--color=never` to override it.
- A lone double hyphen `--` stops recognizing hyphen-prefixed arguments for the rest of the args array. Before that, any hyphen-prefixed argument must be a recognized option name, even if it's single-hyphen prefixed, which will never match anything.
- A parameterless option `--help` is generated (unless `options.help` is `false`) that prints the help (unless `options.print_errors` is `false`) and returns an error. (Note that it is not possible to access the `///` doc comments at compile time, but that would be a cool way of giving documentation on options, wouldn't it.) Any usage error will result in an error printed that concisely describes the error and a prompt to use `--help` for more info; usage errors will not print the full usage. Alternate proposal: also include `-h` as an alias, like in Python's `argparse`. Motivation for excluding `-h`: this system doesn't do any single-hyphen short aliases. The result of giving `-h` would be something like `-h not recognized. try --help`, which still indirectly gives the user what they were looking for. Alternate proposal: remove the options and have `--help` print unconditionally.
- Either a space or an equal sign can separate the name and the value, e.g. `--name value` or `--name=value`. (Any literal `=` in a field name, e.g. `@"conf=usion"`, would cause a compile error.) Motivation for space separation: shell tab-completion works best on space-separated tokens, e.g. for file paths. Motivation for equals-separated: when constructing an `args` array to launch a child process, a single append call possibly including string concatenation is simpler than two append calls or an extend call with two items. Remember that this is a consideration for all programming languages that might call a Zig executable, not just relevant from within Zig code. Additional motivation for equals: the relationship between names and values is self-documenting, e.g. `--a=b c d` is more self-documenting than `--a b c d`.
- Parsing integers would be done with `std.fmt.parseInt()` with base `0` to support `0x` prefixes and such. Parsing floats would be done with `std.fmt.parseFloat()`, which means hex floats, `NaN`, `-Inf`, etc. would be supported. (And parsing strings would also be trivially supported.)
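The prefix and separator rules in the list above can be sketched as a small per-argument classifier. This is only an illustration under assumptions: `Token` and `classify` are hypothetical names, and the sketch does not decide whether a value-less `--name` takes the next argument as its value, since that depends on the field type.

```zig
const std = @import("std");

/// Hypothetical classification of one raw argument seen before any lone `--`.
const Token = union(enum) {
    /// A lone `--`: the caller would treat every later argument as positional.
    end_of_options,
    /// `--name=value` (value set) or `--name` (value null; for a non-bool
    /// field the caller would take the next argument as the value).
    option: struct { name: []const u8, value: ?[]const u8 },
    /// Not hyphen-prefixed.
    positional: []const u8,
    /// Hyphen-prefixed but not `--name`, e.g. `-h`; this never matches anything.
    invalid: []const u8,
};

fn classify(arg: []const u8) Token {
    if (std.mem.eql(u8, arg, "--")) return .end_of_options;
    if (std.mem.startsWith(u8, arg, "--")) {
        const body = arg[2..];
        // Split at the first `=`; field names may not contain `=`.
        if (std.mem.indexOfScalar(u8, body, '=')) |eq| {
            return .{ .option = .{ .name = body[0..eq], .value = body[eq + 1 ..] } };
        }
        return .{ .option = .{ .name = body, .value = null } };
    }
    if (std.mem.startsWith(u8, arg, "-")) return .{ .invalid = arg };
    return .{ .positional = arg };
}
```

Under this reading, `-h` falls into `invalid`, which is where a message along the lines of "not recognized, try `--help`" would be produced.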
Departing from the "minimal" zone, but features I think are still important:
- If a field type is `bool`, then parameterless `--name` and `--no-name` are generated to turn the option on and off respectively. All field names are forbidden to start with `@"no-"` to avoid possible collisions with generated names. Alternate proposal: `--name=true` and `--name=false`. Motivation for `--name` and `--no-name`: despite higher complexity, parameterless boolean options are more familiar, e.g. `--verbose` and `--force` for `git push` or `mv`. Additionally, it's common to want a boolean parameter in a CLI to include the word `no`, e.g. `--no-clobber`, and a negatively-named boolean is unnecessarily confusing; the code would read "if not no clobber".
- Parsing enums would match the name of the enum, never the integer value. Motivation: simple.
- Parsing slices `[]const T` (other than `[]const u8`) would mean that multiple of the same option appends to the slice, e.g. `--exclude=".git" --exclude=".DS_Store"`. A field type `[]const bool` would be a compile error, because it's not clear how that should be supported. Motivation for array options: it's a common feature, e.g. `grep -e`, `gcc -I`, `kubectl get -l`. Motivation against array options: it's significantly more complex to implement than scalar values.
- Parsing optional types `?T` would not be supported, always a compile error. It could be useful to initialize optional fields to `null` to track whether they ever get provided on the command line, but I believe this is a misfeature. I believe it should always be possible to override previous values to restore an option to its default value, even if that means designating special values like `-1` or `""` for this purpose. Consider that `git --color=auto` can explicitly override `--color=always`, which is more useful than `auto` being the behavior only when `--color` is never specified.
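The boolean and enum rules in the list above could look roughly like the following. This is a sketch, not the proposed implementation: `Named`, `Color`, `setFlag`, and `parseColor` are hypothetical names, and it assumes a recent Zig release where struct field info exposes `.type`.

```zig
const std = @import("std");

// Hypothetical enum option; values are matched by tag name only.
const Color = enum { auto, always, never };

// Hypothetical boolean options.
const Named = struct {
    verbose: bool = false,
    clobber: bool = true,
};

/// Handles a parameterless `name` or `no-name` (leading `--` already stripped).
/// Because field names may not start with "no-", stripping that prefix is unambiguous.
fn setFlag(named: *Named, name: []const u8) error{InvalidArgument}!void {
    const negated = std.mem.startsWith(u8, name, "no-");
    const base = if (negated) name[3..] else name;
    inline for (std.meta.fields(Named)) |field| {
        if (field.type == bool and std.mem.eql(u8, field.name, base)) {
            @field(named.*, field.name) = !negated;
            return;
        }
    }
    return error.InvalidArgument;
}

/// Matches the enum tag name; integer values such as "2" are rejected.
fn parseColor(text: []const u8) error{InvalidArgument}!Color {
    return std.meta.stringToEnum(Color, text) orelse error.InvalidArgument;
}
```

With a positively named field such as `clobber`, the calling code reads `if (!args.named.clobber)` rather than "if not no clobber", while users still type `--no-clobber`.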
UPDATE
Thanks for the discussions everyone! Here are some responses to your contributions:
- Sub commands: there are many ways to do sub commands / sub parsers, and it's definitely out of scope for a minimal API. Compared to all the other features in the above proposal, sub commands are profoundly more complex (nesting the API within itself is a fairly obvious approach, which is definitely not minimal, and then it leads to all sorts of questions like whether parent arguments can be provided from within a sub command, etc.). We want people to have sub command CLI parsing, but it doesn't belong in the minimal API. (foreshadowing)
- `--help` docs: every proposal for giving textual help has been very reasonable, and thanks y'all for the suggestions. So far, I don't think any of the ideas qualify as minimal, and it seems like the idea of having struct-field-driven configuration simply can't have help docs for options in the most obvious, minimal way, which is using the `///` doc comments for the fields; it's not allowed by the Zig type system (intentionally, to discourage DSLs in doc comments for meta programming). How to give a `--help` string still seems unresolved. Let's keep discussing it.
- `Build.option`: this minimal CLI parser is incompatible with `Build.option` because the `description_raw` parameter is not supported here. Although having multiple ways to do the same-ish thing is frustrating, I don't see a path for unifying the two systems. This concern is still unresolved.
- User-defined `parseCLI` extensibility: definitely not minimal, but a cool idea. (second foreshadowing)
There's an important idea I didn't include above that's worth articulating: This minimal API gives users a path to migrate to a more advanced API. This API has intentional limitations that are compile errors. For example, declaring a named argument with a struct type is not allowed; that could suggest that you want a sub command or a custom parser or something else, but this minimal API declares that out of bounds. This allows third-party competitor libraries to jump in and support these advanced use cases while being backward compatible with the minimal API; you can drop in a third-party replacement, everything still works, and then you can start using the third-party extended functionality right away. This is an API designed to help users abandon it.
So then what is this API even for? What counts as a "minimal" use case? I think that your contributions to this discussion so far have been very thoughtful and productive insights about CLI parsing behavior, but it's also been largely the non-minimal non-obvious behavior that is, well, worth discussing. The fact that no one has really objected to the minimal functionality originally proposed here is a clue. I think that supporting parsing integers into `i32` struct fields with a matching name is a fairly obvious and unobjectionable feature, which is probably why there's been no discussion of it, and those kinds of obvious features are the minimal design.
Parsing into `[]const T` is for sure a useful feature; I don't think there's really any disagreement on that; the question for this discussion thread is whether it's a minimal feature. Sub commands are also useful, but are not minimal. Parsing enums is minimal; parsing unions is not; etc.