How about something other than .properties syntax? #1

Closed
munificent opened this issue Feb 19, 2015 · 50 comments

Comments

@munificent

I'm not a big fan of the ".properties" syntax. I don't think anyone outside of the Java ecosystem is even aware of it, and, as far as I can tell, its official specification is just a doc comment in the Java API.

Most egregiously, properties files are specified to require ISO 8859-1 character encoding. That's gross. Everyone these days prefers UTF-8.

How about we just use JSON? That's the default config file format for the web these days.

If that's a pain for the VM to parse, another option would be to say that the file must contain a Dart map literal with string literal keys and values. That would presumably let the VM use the existing lexer and parser on it.

Thoughts?

@gbracha

gbracha commented Feb 19, 2015

I like the idea of using a Dart map literal.

@anders-sandholm

Just to reflect the further discussion over email on this:
We already use YAML for pubspec so Lars talked about using a well-defined subset of YAML that is sufficient for the package spec and that is easy to parse for the VM.

@gbracha

gbracha commented Feb 19, 2015

YAYAML

@skybrian

I would like to generate and parse this file in Go easily. JSON works for me. YAML is not well-specified or simple.

The current property format also works in principle, but I agree with Bob that we shouldn't reference the Java property file spec which is a bit weird. I think it would be okay to define the parsing rules right in the spec if they're simple enough.

Here are my recommendations for a generic line-based format:

  • UTF8
  • easy to strip comments: trim leading and trailing space, then remove lines that start with "#" or are blank.
  • records should be parsable using split() and simple unescaping. For example, if "=" is a field separator and can also appear in a field and needs to be escaped, use something like "\e" and not "\=", so that a plain split on "=" never cuts through an escaped field.

The idea is to make it trivial to write a correct parser in any scripting language with split() and replaceAll() string functions, so people aren't tempted to do anything else.
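
To make the recommendations above concrete, here is a minimal Go sketch of such a parser. It is only an illustration under assumptions the thread does not fix: the function name parseRecords is hypothetical, "\e" is taken as the escape for "=", and "\\" as the escape for a backslash.

```go
package main

import (
	"fmt"
	"strings"
)

// unescape reverses the assumed escapes: `\\` is a backslash, `\e` is "=".
var unescape = strings.NewReplacer(`\\`, `\`, `\e`, "=")

// parseRecords is a hypothetical helper, not part of any real tool.
// It trims each line, skips blank lines and "#" comments, then splits on "=".
func parseRecords(text string) (map[string]string, error) {
	records := make(map[string]string)
	for i, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		// Escaped separators never contain "=", so a plain split is safe.
		fields := strings.Split(line, "=")
		if len(fields) != 2 {
			return nil, fmt.Errorf("line %d: want exactly one '=' separator", i+1)
		}
		records[unescape.Replace(fields[0])] = unescape.Replace(fields[1])
	}
	return records, nil
}

func main() {
	records, err := parseRecords("# example\nargs=../args/lib/\nodd\\ename=../odd/lib/\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(records)
}
```

Because the escape for the separator does not contain the separator itself, the parser needs nothing beyond trimming, splitting, and string replacement.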

@munificent
Author

I would like to generate and parse this file in Go easily.

https://github.com/go-yaml/yaml

While YAML may not be easy to parse, someone has already done the work in practically every language out there. Not that I'm personally attached to YAML.

@austincummings

I think the syntax for the pkgspec.properties file is fine. I do think the name is too long though. It looks much better just as pkgspec.

Or maybe dump the information in the pubspec.lock file? Is that possible?

@munificent
Author

Or maybe dump the information in the pubspec.lock file? Is that possible?

Unfortunately no, though it's a good idea. The problem is that the pkgspec file will contain paths that point to weird user-specific directories, like your pub cache in your home directory. You definitely don't want to check that into source control.

However, you do probably want to check your lockfile in, since it ensures everyone using your package uses the same versions of its dependencies.

Hence, two files.

@skybrian

skybrian commented Mar 6, 2015

Regarding YAML, even though many libraries exist, it's unclear how compatible they are. For the package you linked to:

"The yaml package supports most of YAML 1.1 and 1.2, including support for anchors, tags, map merging, etc. Multi-document unmarshalling is not yet implemented, and base-60 floats from YAML 1.1 are purposefully not supported since they're a poor design and are gone in YAML 1.2."

Sounds pretty complicated to me. There is value in using a format that's so simple you can keep it all in your head and implement it yourself if needed.

@zoechi

zoechi commented Mar 7, 2015

From what I have seen only a map from string to string is used in this file, right? I would expect this Go package to support this basic use case. I guess this limited subset would be even easy to parse with custom code but it would also still be a yaml file.
I don't care much about yaml but there are already pubspec.yaml and pubspec.lock using this format.
I would name it packages.yaml

@ghost

ghost commented Mar 8, 2015

If the file defines rules for path resolution, it would be logical to follow convention and use the term "path" somewhere in its name. E.g. simply "path.yaml" or "dartpath.yaml" or something. (obvious analogies are PATH, CLASSPATH, LD_LIBRARY_PATH etc).

@sethladd
Contributor

If our file format is not actually YAML, but is some "well-defined subset of YAML", then let's not name the file extension .yaml.

foo.yaml implies the file can be any yaml, which might include comments or other yaml bits.

@lrhn
Owner

lrhn commented Mar 19, 2015

I'll just suggest an even simpler format which is a subset of Java properties files, Windows ini-files and the gazillion different key/value files used in Unix (properties files are not just Java, it's just Java's name for its variant of a very common family of formats).

Syntax:

  • ASCII content only.
  • One entry per line, formatted as: key=value
  • If = or \ occurs in the key, they are escaped by prefixing with \. That hardly ever happens in practice since the keys are package names which cannot contain \, and usually don't contain =.
  • if \ occurs in the value, it is escaped by prefixing with \. That never happens since values are URI references.
  • Empty lines are ignored. Any other line not containing an = is an error.
  • Line terminators are CR or LF or CR+LF (where CR+LF is trivially implemented by using CR as line terminator and LF as an ignored empty line).

No whitespace handling, no comments. No file extension (name is just pkgspec), or if that's too much, call it pkgspec.txt.

Semantics:
Key is everything up to first unescaped =, with escapes resolved. It must be a URI path segment that is not . or ... The path segment MUST be percent-encoding and case normalized.
Value is everything after first unescaped =, with escapes resolved. It must be a URI reference with no query or fragment identifier, and it should end with a /. If it doesn't end with a /, one will be added automatically. The URI reference SHOULD be scheme, percent-encoding and case normalized (but non-normalized URI references will also be accepted). If the URI reference is relative, it's resolved wrt. the location of the pkgspec file.

That's simple (as simple as possible at least, we are dealing with URIs) and it gets the job done. Any further syntax isn't needed to solve the current problem, but can be argued to be in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI.
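
As a rough illustration of the syntax and semantics above, here is a Go sketch. The names splitEntry and parsePkgspec are hypothetical, the key escape is handled leniently (a backslash quotes whatever character follows), and percent-encoding and case normalization are not checked, so treat it as a sketch under those assumptions rather than a conforming implementation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitEntry splits one line at the first unescaped '=' and resolves
// backslash escapes in the key and `\\` in the value.
func splitEntry(line string) (key, value string, err error) {
	var k strings.Builder
	for i := 0; i < len(line); i++ {
		switch c := line[i]; {
		case c == '\\' && i+1 < len(line):
			i++ // assumption: a backslash quotes the next byte, whatever it is
			k.WriteByte(line[i])
		case c == '=':
			return k.String(), strings.ReplaceAll(line[i+1:], `\\`, `\`), nil
		default:
			k.WriteByte(c)
		}
	}
	return "", "", fmt.Errorf("no unescaped '=' in %q", line)
}

// parsePkgspec parses the file text. location is the URI of the pkgspec file
// itself; relative values are resolved against it.
func parsePkgspec(text, location string) (map[string]string, error) {
	base, err := url.Parse(location)
	if err != nil {
		return nil, err
	}
	// Splitting on CR and LF and dropping empty pieces covers CR, LF and CR+LF
	// terminators and ignores empty lines in one step.
	lines := strings.FieldsFunc(text, func(r rune) bool { return r == '\r' || r == '\n' })
	packages := make(map[string]string)
	for _, line := range lines {
		key, value, err := splitEntry(line)
		if err != nil {
			return nil, err
		}
		if key == "" || key == "." || key == ".." {
			return nil, fmt.Errorf("invalid package name %q", key)
		}
		ref, err := url.Parse(value)
		if err != nil {
			return nil, err
		}
		if ref.RawQuery != "" || ref.Fragment != "" {
			return nil, fmt.Errorf("%q: query and fragment are not allowed", value)
		}
		if !strings.HasSuffix(ref.Path, "/") {
			ref.Path += "/" // a missing trailing slash is added automatically
		}
		packages[key] = base.ResolveReference(ref).String()
	}
	return packages, nil
}

func main() {
	pkgs, err := parsePkgspec(
		"args=../args/lib\r\nunittest=http://example.com/unittest/lib/\n",
		"file:///home/me/app/pkgspec")
	if err != nil {
		panic(err)
	}
	fmt.Println(pkgs)
}
```

Most of the code deals with URI handling rather than with the line format itself, which matches the remark that the format is about as simple as it can be given that the values are URIs.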

@skybrian

It can be made simpler. For the value, if we disallow newlines and trailing whitespace then it doesn't need any escape sequences. Any space characters in a URL can be URL-escaped anyway.

I generally do like to support line comments and trailing whitespace removal since it's trivial to do. Requiring UTF-8 is generally a good idea for modern text formats.

Anyway, these are details. Should we agree that it's okay to invent a simple text format then it's not hard to do.

@ghost

ghost commented Mar 19, 2015

I think the ban on comments is too harsh indeed.

Am I the only one who doesn't like the name pkgspec? Name somehow suggests that we can find package name, author etc there, but in fact there's another file (pubspec.yaml) for that.

I googled "package spec" - and indeed, it's used with different meaning elsewhere (e.g PL/SQL). Which, of course, could be ignored as coincidence, but... ours is not a package spec at all :)

It can be further argued that the choice of name is made (quoting Lasse) "in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI". Hehe.

@zoechi

zoechi commented Mar 19, 2015

👍 I don't like it at all. I still like packages better. What would be a descriptive name, even if it's too long, just to understand better what its actual purpose is? PackageRedirect?

@ghost

ghost commented Mar 19, 2015

I think the term "path" should somehow appear as part of the name. Maybe it's only me though :)

@zoechi

zoechi commented Mar 19, 2015

I don't miss path. For me path is more an implementation detail, but IMHO still better than spec ;-)

@ghost

ghost commented Mar 19, 2015

But the whole issue is about implementation detail!
Sure, no one wants to be called just a path, it's understandable. But if we start calling everybody Package Spec, this would devalue the title in my opinion.
:-)

@lrhn
Owner

lrhn commented Mar 20, 2015

I have no strong preference for the name, except that it should contain "package" or "pkg" and shouldn't be too long. Using "packages" is fine with me.

@munificent
Author

Am I the only one who doesn't like the name pkgspec?

That's what #7 is for. Let's keep this bug focused on the syntax.

@munificent
Author

Should we agree that it's okay to invent a simple text format then it's not hard to do.

I really don't think we should. Already even with Lasse's minimal proposal we've ratholed on encoding (ASCII? UTF-8?), line endings, whitespace (stripped? not?), escaping, and comments.

I do think we should pick something extensible because I think there's a very high chance we'll want something like search paths eventually. Can we just say the file is:

  • A UTF-8 file. It is lexed according to Dart's lexical grammar, including comment handling (20.1 in the spec).
  • It contains a single Dart map literal (mapLiteral in the spec without any leading const or type arguments).
  • Whose keys and values are string literals (stringLiteral in the spec).

That's it. Fully specified including encoding, whitespace, and escaping. Comments are allowed and specified. The VM already has code to parse it. It's easy for tools to generate it (just use JSON, which is a subset of it). We have code to parse it (use the analyzer or dart2js parser).

Also, it's easily expandable: just allow other value types in the map.

Thoughts?

@ovangle

ovangle commented Mar 20, 2015

YAML is consistent with other config files in the language and I don't see any need to

add a full YAML parser in the VM,

or

define a subset of YAML which can be understood by the VM.

As far as I can understand, the only entries allowed in the file are

('#' comment '\n') | (maybe_quoted_key: maybe_quoted_value '\n')

that's all that ever needs to be parsed. A file which uses any YAML syntax other than that can just be rejected as invalid pkgspec.yaml. And the stuff that needs to be parsed is pretty much the same as the properties file syntax anyway (except that keys and values can be optionally quoted and ':' instead of '=').

@ovangle

ovangle commented Mar 20, 2015

ps. Mr. Nystrom's idea gets my vote.

pps. I realised after some reflection that an informal yaml subset probably won't work, because JSON is valid YAML.

so the json file

{
  "key": "value"
}

is a valid yaml file with exactly the same content as the key value pairs, but would be rejected as invalid pkgspec.yaml.

ppps. Although you could fix that by having the file start with an optional leading '{' and an optional trailing '}', and the file is otherwise limited to the subset I mentioned.

pppps. No more postscripts.

@ghost

ghost commented Mar 20, 2015

Having 2 files (pubspec and pkgspec) implementing completely different formats (one -yaml, another -dart literal) could make users think right hand doesn't know what left hand is doing here.
Properties file was (more or less) ok because it can be perceived as trivial, so it doesn't dissonate with yaml so much (though some purists would disagree). But with dart literal... not sure.

@munificent
Author

Having 2 files (pubspec and pkgspec) implementing completely different formats (one -yaml, another -dart literal) could make users think right hand doesn't know what left hand is doing here.

It would be nice to be consistent, but the reality is that those files are consumed by different tools with different constraints. The pubspec is read by pub, which operates at a strictly higher level than the VM and dart2js. The pkgspec file is lower-level than that.

Honestly, a better solution would possibly have been to make pubspecs use Darty syntax (i.e. any kind of literal). But we can't go back in time and do that, so...

@ghost

ghost commented Mar 20, 2015

You can't go back in time, but, at least in theory, you can add support for pubspec written in dart syntax, as an option. Two files may have different extensions.
Even if it's a mere option, and even if no one uses this option, the very existence of it restores the feel of consistency IMO. No?

@munificent
Author

Even if it's a mere option, and even if no one uses this option, the very existence of it restores the feel of consistency IMO. No?

I don't think adding a third option while leaving the previous one increases consistency. :)

@ghost

ghost commented Mar 20, 2015

If you declare old one deprecated, then it will show a tendency towards consistency. Consistency is always in a state of flux, it's a moving target, so the effort is commendable. No?

@lrhn
Owner

lrhn commented Mar 22, 2015

I think allowing a Dart map literal with string literal keys and values is still too allowing. It should at least restrict the string literals to not have interpolations, and possibly also restrict the kinds of string literals accepted (adjacent string combination? multiline strings? raw strings?).
And then we might as well pick JSON.

@ghost

ghost commented Mar 22, 2015

We already know that every declarative data format strives to acquire procedural features over time -until it becomes kind of full-blown Turing-complete programming language (and very lousy one at that).

Might it be a good idea to specify configuration using a program to begin with?
E.g. (somewhat restricted) program in dart that runs in isolate and returns a map.
It can be as simple as
main()=> {
// dart literal
}

Even a simplest config file may require variables to avoid repetition. E.g.
homebrew=../../libs/homebrew/lib/
glorph=../../libs/glorph/lib
// more like this - too much repetition!

Naturally, it can be programmed as:
var list=["homebrew", "glorph", "foo", "bar"];
return new Map.fromIterables(list, list.map((s)=>"../libs/$s/lib"));

/A.K: There's no such thing as "data". Every piece of data is in fact a program to be processed by some "interpreter". Said interpreter, in turn, produces another piece of data, which is a program for another interpreter. Etc. Where does this chain end? In the ultimate interpreter: our mind. :-)

@munificent
Author

It should at least restrict the string literals to not have interpolations,

👍

and possibly also restrict the kinds of string literals accepted (adjacent string combination? multiline strings? raw strings?).

We could do singleLineString and prohibit interpolation. I'm agnostic about this. It just depends on what the VM folks want.

And then we might as well pick JSON.

That doesn't give us comments. 😞 But I certainly wouldn't veto JSON if it came down to that.

@munificent
Author

every declarative data format strives to acquire procedural features over time -until it becomes kind of full-blown Turing-complete programming language (and very lousy one at that).

I've certainly seen that happen in many cases but, so far, pubspecs haven't gone in that direction.

I think it has a lot to do with how large the files are and how much time programmers spend hand-authoring them. Once you have lots of big manually maintained files, you start wanting all of the abstraction and reuse features a real language gives you.

This file will be pretty small and mostly auto-generated, so I don't think there will be much pressure to make it Turing-complete. However, making it an actual script that can do stuff like IO would be interesting. You could basically implement something like a class loader or Python's meta import hooks with it.

My guess is it's not worth the complexity of doing that, though. The analyzer would have to execute this script, and that adds a pretty big complexity tax.

@ghost

ghost commented Mar 23, 2015

Yes, it is a complexity tax. Which could be justified and even turned to our advantage if it can eliminate or reduce OTHER taxes. Maybe something returned by this script can be used for conditional imports?

Look at it this way: we have pubspec, this pkgspec, conditional import - all are different kinds of config.
Maybe it's just ONE single config, generated by script? The thing is: conditions that affect "conditional import", which are supposed to be kind of static consts in proposals under discussion, are normal variables for this pre-script. It can use all the power of Dart:

if (Platform.isWindows && something || somethingElse) importConfig["dart.io"]=true;

And then, at least, there's no problem for analyzer to statically evaluate any expressions.
All the stuff returned by config script can be considered const for the purposes of analyzer. (We just return const map).

There can be other advantages, yet to be discovered.

@skybrian

We can and will write tools to generate this file. Part of the reason we're doing it in the first place is to integrate with other build systems. A generator can be written in any language you like and can certainly do I/O.

The config file itself, however, is not a program; it's a program's output, part of the communications protocol between some other tool and the analyzer. It's a serialization of a map, perhaps with a few niceties for human readability if it happens to be hand-edited. Seems like JSON is best since it's easiest to agree on?
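
For a sense of how little code the JSON option needs on the tooling side, here is a small Go sketch using only the standard encoding/json package. The function names encodePackages and decodePackages are made up for illustration, and a flat string-to-string map is assumed to be the whole schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodePackages serializes the package map as indented JSON.
// encoding/json writes map keys in sorted order, so output is deterministic.
func encodePackages(packages map[string]string) ([]byte, error) {
	return json.MarshalIndent(packages, "", "  ")
}

// decodePackages reads the map back. Decoding into map[string]string makes
// the decoder reject any value that is not a JSON string.
func decodePackages(data []byte) (map[string]string, error) {
	var packages map[string]string
	if err := json.Unmarshal(data, &packages); err != nil {
		return nil, err
	}
	return packages, nil
}

func main() {
	data, err := encodePackages(map[string]string{
		"args":     "../args/lib/",
		"unittest": "../unittest/lib/",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	packages, err := decodePackages(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(packages["args"])
}
```

The trade-off raised elsewhere in the thread still applies: plain JSON leaves no room for comments.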

It might be interesting to specify a Dart script in pubspec.yaml to act as a pubspec generator, or perhaps do it in build.dart? But it can't rely on any imported Dart packages or there's a chicken-and-egg problem. Seems like we could defer this for now.

@ghost

ghost commented Mar 23, 2015

It might be interesting to specify a Dart script in pubspec.yaml to act as a pubspec generator, or perhaps do it in build.dart?

Note that if you have a script (e.g. config.dart), you can still have YAML files or JSON files - script can read and massage them:

main() {
Map config=readYaml(filename);
// massage it
return config;
}

I would think of a script as an egg :-) It's debatable though.

@munificent
Author

We talked about the syntax in the DEP meeting this week. Our feeling is that we really would like something that is a very restricted subset of Dart syntax. Doing that plays nicer with existing tools, and gives us room to grow over time if needed.

To try to make things easier on the VM folks, our idea is that we'll reuse the lexical grammar (i.e. the scanner) as-is and then specify the package spec using a very very rigid minimal grammar. If the VM folks can't easily reuse the parser for map literals, this would let them write a simple one-off parser just for this.

Here's an example of the grammar I have in mind:

const packages = const {
  r"args": r"/path/to/args",
  r"unittest": r"/path/to/unittest"
};

The full grammar is:

packageSpec:
    const packages = mapLiteral ;
    ;

mapLiteral:
    const { (mapLiteralEntry (, mapLiteralEntry)*)? }
    ;

mapLiteralEntry:
    rawString : rawString
    ;

rawString:
    r " (~(" | NEWLINE ))* "
    ;

The packageSpec production is new. The rawString production is just the last clause in the spec's singleLineString rule. Note that no interpolation or concatenation is allowed for strings. The mapLiteral differs in that it:

  • Makes const required.
  • Removes typeArguments?.
  • Disallows a trailing comma.
  • Requires map entries to be raw strings.

Finally, mapLiteralEntry requires the key and value to be rawString.

This looks very simple to me. At the same time, it answers all of our questions about encoding, escaping, comments, and whitespace. Further, our existing tools can parse and syntax highlight it today. It's even a valid Dart library, so a program could import it and use the packages map at runtime.

What do you think?
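
For tools that only need to write this format, generation stays simple even outside Dart. The Go sketch below emits a file matching the grammar above; writePackageSpec is a hypothetical name, and sorting the entries is just a choice for stable output, not something the grammar requires.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// writePackageSpec is a hypothetical generator for the restricted grammar:
// a const map of raw double-quoted strings with no trailing comma.
func writePackageSpec(packages map[string]string) (string, error) {
	names := make([]string, 0, len(packages))
	for name := range packages {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output only; order is not significant

	entries := make([]string, 0, len(names))
	for _, name := range names {
		for _, s := range []string{name, packages[name]} {
			// A raw string has no escapes, so it cannot hold '"' or a newline.
			if strings.ContainsAny(s, "\"\r\n") {
				return "", fmt.Errorf("%q cannot be written as a raw string", s)
			}
		}
		entries = append(entries, fmt.Sprintf("  r\"%s\": r\"%s\"", name, packages[name]))
	}
	if len(entries) == 0 {
		return "const packages = const {};\n", nil
	}
	return "const packages = const {\n" + strings.Join(entries, ",\n") + "\n};\n", nil
}

func main() {
	spec, err := writePackageSpec(map[string]string{
		"args":     "/path/to/args",
		"unittest": "/path/to/unittest",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(spec)
}
```

Reading the file back from a non-Dart tool is the harder direction, since the grammar inherits Dart's comment and whitespace rules.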

@lrhn
Owner

lrhn commented Mar 28, 2015

It's a bold move, requiring a const declaration. It does mean that any Dart parser can parse the file, and you can even choose to import it from Dart if you want to.
Using raw strings should also mean that there is no need for escaping since double-quotes can't occur unescaped in URIs, nor can any kind of line terminator, so the keys/values are always proper substrings of the original source. Using a non-raw string would require special handling of backslash and dollar, where dollar can occur in a URI path.

The only drawback is that by requiring raw strings, the map isn't also a JSON literal, which would have made the format easier to parse for non-Dart tools.

I do think that a custom parser can be faster (but not sure how much) than just using the generic Dart tokenizer/parser, and if we are using a custom parser anyway, then using a known format isn't that much of an advantage - we are just inventing yet another key/value format.

I can live with this format, and if the VM can implement it efficiently, I guess it should be fine.

@munificent
Author

the map isn't also a JSON literal

Well, the preceding const packages = const pretty much rules that out anyway.

I do think that a custom parser can be faster (but not sure how much) than just using the generic Dart tokenizer/parser

I don't think parsing speed for this should be a bottleneck. This file will certainly be smaller than most user programs, and the VM has to parse those before it can run.

I can live with this format, and if the VM can implement it efficiently, I guess it should be fine.

🎉

@skybrian

I still need to read and write this file from Go. Given that most languages don't have a Dart parser available, and even in Dart, there is not yet a stable public API for invoking a Dart parser, making this a subset of Dart seems like a bad idea.

I don't think anyone has any serious objections to JSON?

@devoncarew

+1 to using a format that is easily consumable from other languages. This will make consumption from other (non-dart) tools easier.

@lrhn
Owner

lrhn commented Mar 28, 2015

Another thing with using Dart syntax is that you need to have the same white-space rules as Dart too. That might be more complex than what you need. At least it needs to be specified what the rules are.

@ghost

ghost commented Mar 28, 2015

Instead of restricting map literal syntax, how about restricting JSON syntax by simple regexp - e.g. by defining

stringLiteral -> double-quote sequence-of-chars-with-no-backslash-or-newline double-quote
NameValuePair -> \s* stringLiteral \s* colon \s* stringLiteral
MapLiteral -> { NameValuePair [comma NameValuePair]* }
(or something similar).
It's easy to validate and parse to map using regexp only (the whole program may fit in 10 lines of code or less).
At least, the format is familiar, and using backslashes in URIs is not a good practice anyway.
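
A rough Go version of the regexp idea above might look like the following. It is a sketch with a few assumptions of its own: the name parseRestrictedJSON is hypothetical, whitespace is also tolerated before commas and braces, an empty map is accepted, and a repeated key silently keeps the last value.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

const (
	// A string literal with no double quote, backslash or line break inside.
	str  = `"([^"\\\r\n]*)"`
	pair = `\s*` + str + `\s*:\s*` + str + `\s*`
)

var (
	// wholeRe validates the entire file; pairRe then extracts each entry.
	wholeRe = regexp.MustCompile(`^\s*\{(?:` + pair + `(?:,` + pair + `)*)?\s*\}\s*$`)
	pairRe  = regexp.MustCompile(pair)
)

// parseRestrictedJSON accepts only a flat object of backslash-free strings.
func parseRestrictedJSON(text string) (map[string]string, error) {
	if !wholeRe.MatchString(text) {
		return nil, errors.New("not a flat map of backslash-free string literals")
	}
	packages := make(map[string]string)
	for _, m := range pairRe.FindAllStringSubmatch(text, -1) {
		packages[m[1]] = m[2] // m[1] is the key, m[2] the value, quotes stripped
	}
	return packages, nil
}

func main() {
	packages, err := parseRestrictedJSON(`{
  "args": "/path/to/args",
  "unittest": "../unittest/lib/"
}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(packages)
}
```

Since strings may not contain quotes or backslashes, every accepted file is also valid JSON and needs no unescaping.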

@ghost

ghost commented Mar 29, 2015

Actually, "honest" parsing of restricted JSON (no regexp) is quite trivial: 60 lines including all error processing.
https://gist.github.com/tatumizer/5132ebb39da77c5b301a

@munificent
Author

I don't think anyone has any serious objections to JSON?

@iposva-google might.

@iposva-google

This is not a question of what the VM can or cannot parse, or of which grammar it already has a partial implementation for.

We need the embedders of the VM to implement this and so it needs to be a super-simple file format. The VM does not do any of the loading directly; instead it is set up to run in a sandboxed environment, relying on the embedder to deal with the resolution of URIs and the loading of the sources.

In short: The proposal as it is currently written up is workable (commit 38378cb), single line with "=" separating key and value.

@munificent
Author

We need the embedders of the VM to implement this and so it needs to be a super-simple file format.

  • Embedders of the VM do not have to support this. A custom embedder can handle "package:" URLs however it likes.
  • For custom embedders that do want to follow the same behavior as the standalone VM, are you saying it is impossible to reuse code to parse a file?

@lrhn
Owner

lrhn commented Apr 16, 2015

Sticking with the simplest format for now. It's not Java properties - those are hugely complex in comparison. It's just single lines of identifier '=' rest or '#' comment.
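
For reference, a reader for this format fits in a few lines of Go. This is a sketch, not the spec: parsePackages is a hypothetical name, blank lines are assumed to be ignored, and bufio.Scanner only recognizes LF and CR+LF line endings, so a lone CR terminator would need extra handling.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePackages reads lines of the form name=target, skipping "#" comments.
// The line is split at the first "=", so the target may itself contain "=".
func parsePackages(text string) (map[string]string, error) {
	packages := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // assumption: blank lines are ignored like comments
		}
		name, target, found := strings.Cut(line, "=")
		if !found || name == "" {
			return nil, fmt.Errorf("line %d: expected name=target", n)
		}
		packages[name] = target
	}
	return packages, sc.Err()
}

func main() {
	packages, err := parsePackages("# example file\nargs=../args/lib/\nunittest=../unittest/lib/\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(packages)
}
```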

@lrhn lrhn closed this as completed Apr 16, 2015
@munificent
Author

How about removing # too? If we don't have a use case for comments, may as well keep it simpler.

@sethladd
Contributor

Putting a note for when this file was generated could be helpful. A comment could be used for that kind of note. Also a note for what program generated that file and that it is auto generated could be useful.
On Fri, Apr 17, 2015 at 14:10 Bob Nystrom [email protected] wrote:

How about removing # too? If we don't have a use case for comments, may
as well keep it simpler.



@munificent
Author

SGTM!
