
std: minimal CLI parsing driven by struct fields #24601

@thejoshwolfe


Inspired by #24510, this proposal aims to be the minimal CLI implementation mentioned there. I'm forking it off of that issue for more isolated discussion; this stdlib CLI parser should have a generally useful API regardless of whether it ends up in "juicy main".

This CLI parser proposal is:

  • minimal
  • opinionated
  • appeals to the general intuition of the Python argparse family (rather than Go flag or Bash getopt)

The configuration is a struct definition:

```zig
const Args = struct {
    named: struct {
        // named parameter config goes here.
    },
    positional: []const []const u8,
};
```
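
For concreteness, here is a hypothetical Args for an invented file-copying tool, written in the proposed style and assuming Zig 0.14 or later. Only the struct shape comes from the proposal; the tool, its field names, and the comments describing command-line behavior are illustrative.

```zig
const std = @import("std");

/// Hypothetical option set for an invented file-copying tool.
const Args = struct {
    named: struct {
        /// No default value: must be given, e.g. --output=dest/ or --output dest/.
        output: []const u8,
        /// Integer with a default: --jobs=4 would override the 1.
        jobs: u32 = 1,
        /// Bool with a default: toggled by --force / --no-force.
        force: bool = false,
    },
    /// Arguments that are not named options, e.g. the source files.
    positional: []const []const u8,
};
```

Leaving .output out of a struct literal of this type is a compile error, which mirrors the "required option" semantics the proposal gives to a field without a default.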

The front-door API looks like this:

```zig
// std.cli

pub fn parse(comptime Args: type, allocator: Allocator, iter: std.process.ArgIterator, options: Options) Error!Args {
    // ...
}

/// argv supports [][:0]u8, []const []const u8, etc.
pub fn parseSlice(comptime Args: type, allocator: Allocator, argv: anytype, options: Options) Error!Args {
    // ...
}

pub const Options = struct {
    /// Recognize --help, which will return error.Help.
    help: bool = true,
    /// error.InvalidArgument and error.Help will also print usage information to stderr.
    print_errors: bool = true,
};
pub const Error = error{
    /// Includes unrecognized option names and values that cannot be parsed into the desired value.
    InvalidArgument,
    /// The --help argument was given.
    Help,
} || Allocator.Error;
```
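
parseSlice takes argv: anytype so that it can accept both [][:0]u8 (what std.process.argsAlloc returns) and []const []const u8. A minimal sketch of why one generic loop is enough, assuming Zig 0.14 or later: sentinel-terminated slices coerce to []const u8, so each element can be normalized on entry. The helper doubleDashIndex is hypothetical; it locates the lone -- that ends option recognition.

```zig
const std = @import("std");

/// Hypothetical helper: index of the first lone "--" in argv, or argv.len if
/// there is none. `argv` may be any slice (or pointer to array) whose elements
/// coerce to []const u8, such as [][:0]u8 or []const []const u8.
fn doubleDashIndex(argv: anytype) usize {
    for (argv, 0..) |arg, i| {
        // [:0]u8 and [:0]const u8 both coerce to []const u8.
        const s: []const u8 = arg;
        if (std.mem.eql(u8, s, "--")) return i;
    }
    return argv.len;
}
```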

For each field of Args.named, the field name determines the --name recognized during parsing, and the field type and possible default value determine parsing behavior.
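
Mapping an argument to a field needs no registration step: comptime reflection over Args.named is enough. A minimal sketch, assuming Zig 0.14 or later; splitOption and isFieldName are hypothetical helpers. splitOption handles the --name / --name=value forms described in the list below and rejects single-hyphen arguments; the lone -- is left for the caller.

```zig
const std = @import("std");

/// One "--"-prefixed argument split into name and optional inline value.
const Split = struct {
    name: []const u8,
    /// Set for "--name=value"; null for "--name" (any value is the next arg).
    value: ?[]const u8,
};

/// Hypothetical helper: splits "--name" or "--name=value". Returns null for
/// anything not starting with "--", and for the lone "--" itself.
fn splitOption(arg: []const u8) ?Split {
    if (arg.len <= 2 or !std.mem.startsWith(u8, arg, "--")) return null;
    const body = arg[2..];
    if (std.mem.indexOfScalar(u8, body, '=')) |eq| {
        return .{ .name = body[0..eq], .value = body[eq + 1 ..] };
    }
    return .{ .name = body, .value = null };
}

/// Hypothetical helper: true if `name` is exactly a field of `Named`.
/// No '-'/'_' translation is performed ("field names are used verbatim").
fn isFieldName(comptime Named: type, name: []const u8) bool {
    inline for (std.meta.fields(Named)) |field| {
        if (std.mem.eql(u8, field.name, name)) return true;
    }
    return false;
}
```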

  • Field names are prefixed by --, never by a single - and never by /. This is true even if you name a field with a single letter, e.g. --n=100. Motivation: a single - sometimes means that multiple single-letter options can be grouped together, like ls -lA, but a double -- never has this ambiguity. Although /-prefixed names are common on Windows, names prefixed with -- are also common, and CLI users will just need to deal with it.
  • Field names are used verbatim; no translation between - and _ (and no Unicode normalization). You can use @"field-name" if you really want those hyphens in there. Motivation: lower complexity with respect to name mangling; there is no name mangling.
  • Any field that does not have a default value is required to be supplied on the command line. Alternate proposal: all struct fields must have a default value. Motivation for supporting required options: it's useful for e.g. --output options, and it semantically matches struct initialization in Zig.
  • Providing the same (scalar) argument multiple times overrides previous values with later values. Alternate proposal: the same (scalar) argument multiple times is an error. Motivation for override behavior: it's useful and matches the behavior that people are used to, e.g. your git alias might include --color=auto, and then you can additionally give --color=never to override it.
  • A lone double hyphen -- stops the recognition of hyphen-prefixed arguments for the rest of the args array. Before that, any hyphen-prefixed argument must be a recognized option name, even if it's single-hyphen prefixed, which can never match anything (so it is always an error).
  • A parameterless option --help is generated (unless options.help is false) that prints the help (unless options.print_errors is false) and returns an error. (Note that it is not possible to access the /// doc comments at compile time, but that would be a cool way of documenting options, wouldn't it?) Any usage error results in a printed message that concisely describes the error and prompts the user to try --help for more info; usage errors do not print the full usage. Alternate proposal: also include -h as an alias, like in Python's argparse. Motivation for excluding -h: this system doesn't do any single-hyphen short aliases. Giving -h would produce something like "-h not recognized. try --help", which still indirectly gives the user what they were looking for. Alternate proposal: remove the options and have --help print unconditionally.
  • Either a space or an equal sign can separate the name and the value, e.g. --name value or --name=value. (Any literal = in a field name, e.g. @"conf=usion", would cause a compile error.) Motivation for space separation: shell tab-completion works best on space-separated tokens, e.g. for file paths. Motivation for equals-separated: when constructing an args array to launch a child process, a single append call possibly including string concatenation is simpler than two append calls or an extend call with two items. Remember that this is a consideration for all programming languages that might call a Zig executable, not just relevant from within Zig code. Additional motivation for equals: the relationship between names and values is self-documenting, e.g. --a=b c d is more self-documenting than --a b c d.
  • Parsing integers would be done with std.fmt.parseInt() with base 0 to support 0x prefixes and such. Parsing floats would be done with std.fmt.parseFloat(), which means hex floats, NaN, -Inf, etc. would be supported. (And parsing strings would also be trivially supported.)
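
The value rules above amount to a small type switch. A minimal sketch, assuming Zig 0.14 or later (lowercase @typeInfo tags); parseScalar is a hypothetical helper, and collapsing every parse failure into error.InvalidArgument follows the proposed Error set. Override-on-repeat needs no extra code: a later value simply overwrites the field.

```zig
const std = @import("std");

/// Hypothetical helper: converts one option value to the field's type.
/// Integers use base 0 (so 0x/0o/0b prefixes work), floats use
/// std.fmt.parseFloat (hex floats, NaN, Inf), and []const u8 is passed through.
/// Every parse failure is reported as error.InvalidArgument.
fn parseScalar(comptime T: type, text: []const u8) error{InvalidArgument}!T {
    switch (@typeInfo(T)) {
        .int => return std.fmt.parseInt(T, text, 0) catch return error.InvalidArgument,
        .float => return std.fmt.parseFloat(T, text) catch return error.InvalidArgument,
        else => if (T == []const u8) {
            // Strings need no conversion; the slice still points into argv.
            return text;
        } else {
            @compileError("unsupported option type: " ++ @typeName(T));
        },
    }
}
```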

Departing from the "minimal" zone, but features I think are still important:

  • If a field type is bool, then parameterless --name and --no-name are generated to turn the option on and off respectively. All field names are forbidden to start with @"no-" to avoid possible collisions with generated names. Alternate proposal: --name=true and --name=false. Motivation for --name and --no-name: despite higher complexity, parameterless boolean options are more familiar, e.g. --verbose and --force for git push or mv. Additionally, it's common to want a boolean parameter in a CLI to include the word no, e.g. --no-clobber, and a negatively-named boolean is unnecessarily confusing; the code would read "if not no clobber".
  • Parsing enums would match the name of the enum, never the integer value. Motivation: simple.
  • Parsing slices []const T (other than []const u8) would mean that multiple of the same option appends to the slice, e.g. --exclude=".git" --exclude=".DS_Store". A field type []const bool would be a compile error, because it's not clear how that should be supported. Motivation for array options: it's a common feature, e.g. grep -e, gcc -I, kubectl get -l. Motivation against array options: it's significantly more complex to implement than scalar values.
  • Parsing optional types ?T would not be supported; it would always be a compile error. It could be useful to initialize optional fields to null to track whether they ever get provided on the command line, but I believe this is a misfeature. I believe it should always be possible to override previous values to restore an option to its default value, even if that means designating special values like -1 or "" for this purpose. Consider that git's --color=auto can explicitly override --color=always, which is more useful than auto being the behavior only when --color is never specified.
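
The bool and enum rules are both small string comparisons. A minimal sketch, assuming Zig 0.14 or later; boolFlag and parseEnum are hypothetical helpers, and std.meta.stringToEnum does the tag-name lookup (it never accepts integer values).

```zig
const std = @import("std");

/// Hypothetical helper for a bool field named `field_name`: maps an option
/// name (already stripped of its leading "--") to the value it sets, or null
/// if the name is neither `field_name` nor "no-" ++ `field_name`.
fn boolFlag(comptime field_name: []const u8, name: []const u8) ?bool {
    if (std.mem.eql(u8, name, field_name)) return true;
    if (std.mem.eql(u8, name, "no-" ++ field_name)) return false;
    return null;
}

/// Hypothetical helper: enum values match tag names only, never integers.
fn parseEnum(comptime E: type, text: []const u8) error{InvalidArgument}!E {
    return std.meta.stringToEnum(E, text) orelse return error.InvalidArgument;
}
```

Because no field may start with no-, a generated name like --no-force can never collide with a real field.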

UPDATE

thanks for the discussions everyone! here are some responses to your contributions:

  • Sub commands: there are many ways to do sub commands / sub parsers, and it's definitely out of scope for a minimal API. Compared to all the other features in the above proposal, sub commands are profoundly more complex (nesting the API within itself is a fairly obvious approach, which is definitely not minimal, and then it leads to all sorts of questions like whether parent arguments can be provided from within a sub command, etc.). We want people to have sub command CLI parsing, but it doesn't belong in the minimal API. (foreshadowing)
  • --help docs: every proposal for giving textual help has been very reasonable, and thanks, y'all, for the suggestions. So far, I don't think any of the ideas qualify as minimal, and it seems like struct-field-driven configuration simply can't have help docs for options in the most obvious, minimal way, which is using the /// doc comments on the fields; the Zig type system doesn't allow that (intentionally, to discourage DSLs in doc comments for metaprogramming). How to give a --help string still seems unresolved. Let's keep discussing it.
  • Build.option: this minimal CLI parser is incompatible with Build.option because of the description_raw parameter not being supported here. Although having multiple ways to do the same-ish thing is frustrating, I don't see a path for unifying the two systems. This concern is still unresolved.
  • user-defined parseCLI extensibility: definitely not minimal, but a cool idea. (second foreshadowing)
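
Even without doc-comment text, option names and types alone can produce a usage line at compile time. A minimal sketch, assuming Zig 0.14 or later; usageLine and its output format are hypothetical and not part of the proposal, which leaves the --help content open.

```zig
const std = @import("std");

/// Hypothetical helper: builds "usage: --name=<type> ..." from the fields of
/// `Named`, entirely at compile time, in field declaration order.
fn usageLine(comptime Named: type) []const u8 {
    comptime var text: []const u8 = "usage:";
    inline for (std.meta.fields(Named)) |field| {
        text = text ++ " --" ++ field.name ++ "=<" ++ @typeName(field.type) ++ ">";
    }
    return text;
}
```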

There's an important idea I didn't include above that's worth articulating: This minimal API gives users a path to migrate to a more advanced API. This API has intentional limitations that are compile errors. For example, declaring a named argument with a struct type is not allowed; that could suggest that you want a sub command or a custom parser or something else, but this minimal API declares that out of bounds. This allows third-party competitor libraries to jump in and support these advanced use cases while being backward compatible with the minimal API; you can drop in a third-party replacement, everything still works, and then you can start using the third-party extended functionality right away. This is an API designed to help users abandon it.
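
Because the out-of-bounds declarations are compile errors, a parser can enforce them with one comptime pass over Args.named. A minimal sketch, assuming Zig 0.14 or later; checkNamed is a hypothetical name, and it collects only restrictions stated in this issue: struct- and union-typed options, ?T, []const bool, names starting with no-, and names containing =.

```zig
const std = @import("std");

/// Hypothetical comptime validator for Args.named. Each rejection is a
/// compile error, leaving room for a third-party parser to give these
/// declarations a meaning later.
fn checkNamed(comptime Named: type) void {
    inline for (std.meta.fields(Named)) |field| {
        const name = field.name;
        if (comptime std.mem.startsWith(u8, name, "no-"))
            @compileError("'" ++ name ++ "': names starting with \"no-\" are reserved");
        if (comptime std.mem.indexOfScalar(u8, name, '=') != null)
            @compileError("'" ++ name ++ "': '=' cannot appear in an option name");
        switch (@typeInfo(field.type)) {
            .@"struct", .@"union" => @compileError("'" ++ name ++ "': struct/union options are out of scope"),
            .optional => @compileError("'" ++ name ++ "': ?T options are not supported"),
            else => {},
        }
        if (field.type == []const bool)
            @compileError("'" ++ name ++ "': []const bool is not supported");
    }
}
```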

So then what is this API even for? What counts as a "minimal" use case? I think that your contributions to this discussion so far have been very thoughtful and productive insights about CLI parsing behavior, but it's also been largely the non-minimal non-obvious behavior that is, well, worth discussing. The fact that no one has really objected to the minimal functionality originally proposed here is a clue. I think that supporting parsing integers into i32 struct fields with a matching name is a fairly obvious and unobjectionable feature, which is probably why there's been no discussion of it, and those kinds of obvious features are the minimal design.

Parsing into []const T is for sure a useful feature; I don't think there's really any disagreement on that. The question for this discussion thread is whether it's a minimal feature. Sub commands are also useful, but are not minimal. Parsing enums is minimal; parsing unions is not; etc.

Labels: enhancement, standard library
