How about something other than .properties syntax? #1

munificent · 2015-02-19T00:25:57Z

I'm not a big fan of the ".properties" syntax. I don't think anyone outside of the Java ecosystem is even aware of it, and, as far as I can tell, its official specification is just a doc comment in the Java API.

Most egregiously, properties files are specified to require ISO 8859-1 character encoding. That's gross. Everyone these days prefers UTF-8.

How about we just use JSON? That's the default config file format for the web these days.

If that's a pain for the VM to parse, another option would be to say that the file must contain a Dart map literal with string literal keys and values. That would presumably let the VM use the existing lexer and parser on it.

Thoughts?

gbracha · 2015-02-19T21:41:52Z

I like the idea of using a Dart map literal.

anders-sandholm · 2015-02-19T22:35:30Z

Just to reflect the further discussion over email on this:
We already use YAML for pubspec so Lars talked about using a well-defined subset of YAML that is sufficient for the package spec and that is easy to parse for the VM.

gbracha · 2015-02-19T22:52:56Z

YAYAML

skybrian · 2015-02-25T22:14:52Z

I would like to generate and parse this file in Go easily. JSON works for me. YAML is not well-specified or simple.

The current property format also works in principle, but I agree with Bob that we shouldn't reference the Java property file spec which is a bit weird. I think it would be okay to define the parsing rules right in the spec if they're simple enough.

Here are my recommendations for a generic line-based format:

UTF8
easy to strip comments: trim leading and trailing space, then remove lines that start with "#" or are blank.
records should be parsable using split() and simple unescaping. For example, if "=" is a field separator and can also appear in a field and needs to be escaped, use something like "\e" and not "\=".

The idea is to make it trivial to write a correct parser in any scripting language with split() and replaceAll() string functions, so people aren't tempted to do anything else.
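A minimal sketch of those recommendations in Python, using only strip(), split() and replace(). The "\e" escape for "=" comes from the comment above; the "\b" escape for a literal backslash and all function names are assumptions added here so that no escape sequence contains the separator itself.

```python
def escape_field(text):
    # Escape the backslash first so the "\e" added next is not re-escaped.
    return text.replace("\\", "\\b").replace("=", "\\e")


def unescape_field(text):
    # Inverse of escape_field: "\e" -> "=", then "\b" -> "\".
    return text.replace("\\e", "=").replace("\\b", "\\")


def parse_records(content):
    """Parse decoded UTF-8 text into a list of records (lists of fields)."""
    records = []
    for line in content.splitlines():
        line = line.strip()  # trim leading and trailing space
        if not line or line.startswith("#"):
            continue  # blank line or comment
        records.append([unescape_field(field) for field in line.split("=")])
    return records
```

Because an escaped "=" never appears literally inside a field, a plain split("=") is always correct, which is the point of avoiding "\=".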

munificent · 2015-02-26T00:07:05Z

I would like to generate and parse this file in Go easily.

https://github.com/go-yaml/yaml

While YAML may not be easy to parse, someone has already done the work in practically every language out there. Not that I'm personally attached to YAML.

austincummings · 2015-03-05T01:13:57Z

I think the syntax for the pkgspec.properties file is fine. I do think the name is too long though. It looks much better just as pkgspec.

Or maybe dump the information in the pubspec.lock file? Is that possible?

munificent · 2015-03-05T01:22:43Z

Or maybe dump the information in the pubspec.lock file? Is that possible?

Unfortunately no, though it's a good idea. The problem is that the pkgspec file will contain paths that point to weird user-specific directories, like your pub cache in your home directory. You definitely don't want to check that into source control.

However, you do probably want to check your lockfile in, since it ensures everyone using your package uses the same versions of its dependencies.

Hence, two files.

skybrian · 2015-03-06T16:43:29Z

Regarding YAML, even though many libraries exist, it's unclear how compatible they are. For the package you linked to:

"The yaml package supports most of YAML 1.1 and 1.2, including support for anchors, tags, map merging, etc. Multi-document unmarshalling is not yet implemented, and base-60 floats from YAML 1.1 are purposefully not supported since they're a poor design and are gone in YAML 1.2."

Sounds pretty complicated to me. There is value in using a format that's so simple you can keep it all in your head and implement it yourself if needed.

zoechi · 2015-03-07T11:47:56Z

From what I have seen only a map from string to string is used in this file, right? I would expect this Go package to support this basic use case. I guess this limited subset would be even easy to parse with custom code but it would also still be a yaml file.
I don't care much about yaml but there are already pubspec.yaml and pubspec.lock using this format.
I would name it packages.yaml

ghost · 2015-03-08T21:02:12Z

If the file defines rules for path resolution, it would be logical to follow convention and use the term "path" somewhere in its name. E.g. simply "path.yaml" or "dartpath.yaml" or something (obvious analogies are PATH, CLASSPATH, LD_LIBRARY_PATH, etc.).

sethladd · 2015-03-19T03:22:22Z

If our file format is not actually YAML, but is some "well-defined subset of YAML", then let's not name the file extension .yaml.

foo.yaml implies the file can be any yaml, which might include comments or other yaml bits.

lrhn · 2015-03-19T07:08:45Z

I'll just suggest an even simpler format which is a subset of Java properties files, Windows ini-files and the gazillion different key/value files used in Unix (properties files are not just Java, it's just Java's name for its variant of a very common family of formats).

Syntax:

ASCII content only.
One entry per line, formatted as: key=value
If = or \ occurs in the key, they are escaped by prefixing with \. That hardly ever happens in practice since the keys are package names which cannot contain \, and usually don't contain =.
if \ occurs in the value, it is escaped by prefixing with \. That never happens since values are URI references.
Empty lines are ignored. Any other line not containing an = is an error.
Line terminators are CR or LF or CR+LF (where CR+LF is trivially implemented by using CR as line terminator and LF as an ignored empty line).

No whitespace handling, no comments. No file extension (name is just pkgspec), or if that's too much, call it pkgspec.txt.

Semantics:
Key is everything up to first unescaped =, with escapes resolved. It must be a URI path segment that is not "." or "..". The path segment MUST be percent-encoding and case normalized.
Value is everything after first unescaped =, with escapes resolved. It must be a URI reference with no query or fragment identifier, and it should end with a /. If it doesn't end with a /, one will be added automatically. The URI reference SHOULD be scheme, percent-encoding and case normalized (but non-normalized URI references will also be accepted). If the URI reference is relative, it's resolved wrt. the location of the pkgspec file.

That's simple (as simple as possible at least, we are dealing with URIs) and it gets the job done. Any further syntax isn't needed to solve the current problem, but can be argued to be in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI.
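The syntax and semantics above can be sketched in Python roughly as follows. This is an illustration under assumptions, not a reference implementation: the names parse_pkgspec and _split_entry are made up here, relative references are resolved with urllib.parse.urljoin, and the key is not validated as a normalized URI path segment.

```python
import re
from urllib.parse import urljoin


def _split_entry(line):
    """Split one line at the first unescaped '=', resolving escapes."""
    key = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            key.append(line[i + 1])  # escaped character in the key
            i += 2
        elif ch == "=":
            # In the value only the backslash itself is escaped.
            return "".join(key), line[i + 1:].replace("\\\\", "\\")
        else:
            key.append(ch)
            i += 1
    raise ValueError("line has no unescaped '=': %r" % line)


def parse_pkgspec(text, base_uri):
    text.encode("ascii")  # ASCII content only; raises on anything else
    packages = {}
    # Splitting on CR and on LF turns CR+LF into one empty, ignored line.
    for line in re.split(r"[\r\n]", text):
        if line == "":
            continue
        key, value = _split_entry(line)
        if not value.endswith("/"):
            value += "/"  # a missing trailing slash is added automatically
        packages[key] = urljoin(base_uri, value)
    return packages
```

Here base_uri stands for the location of the pkgspec file, so a call such as parse_pkgspec(text, "file:///home/me/proj/pkgspec") resolves "../libs/args/lib" against the project directory.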

skybrian · 2015-03-19T15:25:14Z

It can be made simpler. For the value, if we disallow newlines and trailing whitespace then it doesn't need any escape sequences. Any space characters in a URL can be URL-escaped anyway.
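As a rough illustration of why the value then needs no escape sequences, a generator can percent-encode a plain file path before writing it. The sketch below is in Python and the names format_entry and parse_entry are hypothetical; it assumes the input is a path rather than a full URI, since quote() would also encode the ":" of a scheme.

```python
from urllib.parse import quote, unquote


def format_entry(name, path):
    # quote() percent-encodes spaces, "=", "#" and line breaks,
    # and leaves "/" intact, so the value is always a single clean token.
    return name + "=" + quote(path)


def parse_entry(line):
    # Trailing whitespace is dropped; the first "=" ends the key.
    name, separator, value = line.rstrip().partition("=")
    if not separator:
        raise ValueError("missing '=' in %r" % line)
    return name, unquote(value)
```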

I generally do like to support line comments and trailing whitespace removal since it's trivial to do. Requiring UTF-8 is generally a good idea for modern text formats.

Anyway, these are details. Should we agree that it's okay to invent a simple text format then it's not hard to do.

ghost · 2015-03-19T16:02:16Z

I think the ban on comments is too harsh indeed.

Am I the only one who doesn't like the name pkgspec? Name somehow suggests that we can find package name, author etc there, but in fact there's another file (pubspec.yaml) for that.

I googled "package spec" - and indeed, it's used with different meaning elsewhere (e.g PL/SQL). Which, of course, could be ignored as coincidence, but... ours is not a package spec at all :)

It can be further argued that the choice of name is made (quoting Lasse) "in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI". Hehe.

zoechi · 2015-03-19T16:19:03Z

👍 I don't like it at all. I still like packages better. What would be a descriptive name, even if it's too long, just to understand better what its actual purpose is? PackageRedirect?

ghost · 2015-03-19T16:36:35Z

I think the term "path" should somehow appear as part of the name. Maybe it's only me though :)

zoechi · 2015-03-19T17:23:54Z

I don't miss path. For me path is more an implementation detail, but IMHO still better than spec ;-)

ghost · 2015-03-19T19:43:03Z

But the whole issue is about implementation detail!
Sure, no one wants to be called just a path, it's understandable. But if we start calling everybody Package Spec, this would devalue the title in my opinion.
:-)

lrhn · 2015-03-20T08:11:38Z

I have no strong preference for the name, except that it should contain "package" or "pkg" and shouldn't be too long. Using "packages" is fine with me.

munificent · 2015-03-20T17:18:26Z

Am I the only one who doesn't like the name pkgspec?

That's what #7 is for. Let's keep this bug focused on the syntax.

munificent · 2015-03-20T17:26:02Z

Should we agree that it's okay to invent a simple text format then it's not hard to do.

I really don't think we should. Already even with Lasse's minimal proposal we've ratholed on encoding (ASCII? UTF-8?), line endings, whitespace (stripped? not?), escaping, and comments.

I do think we should pick something extensible because I think there's a very high chance we'll want something like search paths eventually. Can we just say the file is:

A UTF-8 file. It is lexed according to Dart's lexical grammar, including comment handling (20.1 in the spec).
It contains a single Dart map literal (mapLiteral in the spec without any leading const or type arguments).
Whose keys and values are string literals (stringLiteral in the spec).

That's it. Fully specified including encoding, whitespace, and escaping. Comments are allowed and specified. The VM already has code to parse it. It's easy for tools to generate it (just use JSON, which is a subset of it). We have code to parse it (use the analyzer or dart2js parser).

Also, it's easily expandable: just allow other value types in the map.
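A hedged sketch of the "just use JSON" route for generating tools, written in Python. One caveat is handled explicitly: in a non-raw Dart string literal "$" starts an interpolation, so this sketch rejects any key or value containing it instead of guessing at an escape. The function name to_dart_map_literal is hypothetical.

```python
import json


def to_dart_map_literal(packages):
    """Serialize a str -> str mapping as JSON text for use as a map literal."""
    for key, value in packages.items():
        if "$" in key or "$" in value:
            # "$" would be read as string interpolation by a Dart parser.
            raise ValueError("'$' is not supported in %r: %r" % (key, value))
    return json.dumps(packages, indent=2)
```

For example, to_dart_map_literal({"args": "/path/to/args/lib/"}) returns a three-line JSON object that any JSON parser can read back.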

Thoughts?

ovangle · 2015-03-20T17:28:31Z

YAML is consistent with other config files in the language and I don't see any need to

add a full YAML parser in the VM,

or

define a subset of YAML which can be understood by the VM.

As far as I can understand, the only entries allowed in the file are

('#' comment '\n') | (maybe_quoted_key: maybe_quoted_value '\n')

that's all that ever needs to be parsed. A file which uses any YAML syntax other than that can just be rejected as invalid pkgspec.yaml. And the stuff that needs to be parsed is pretty much the same as the properties file syntax anyway (except that keys and values can be optionally quoted and ':' instead of '=').
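A line-oriented reader for that subset could look roughly like this in Python. It is a sketch under assumptions: blank lines are tolerated, only double quotes are recognized, unquoted keys may not contain ":" and unquoted values may not contain whitespace. The name parse_yaml_subset is made up for the example.

```python
import re

# Either a quoted or a bare key, then ": ", then a quoted or a bare value.
_ENTRY = re.compile(r'(?:"([^"]*)"|([^\s:"]+))\s*:\s+(?:"([^"]*)"|([^\s"]+))')


def parse_yaml_subset(text):
    packages = {}
    for number, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comment or blank line
        match = _ENTRY.fullmatch(stripped)
        if match is None:
            # Any other YAML construct is rejected as invalid.
            raise ValueError("line %d is outside the subset: %r" % (number, line))
        key = match.group(1) if match.group(1) is not None else match.group(2)
        value = match.group(3) if match.group(3) is not None else match.group(4)
        packages[key] = value
    return packages
```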

ovangle · 2015-03-20T17:37:24Z

ps. Mr. Nystrom's idea gets my vote.

pps. I realised after some reflection that an informal yaml subset probably won't work, because JSON is valid YAML.

so the json file

{
  "key": "value"
}

is a valid yaml file with exactly the same content as the key value pairs, but would be rejected as invalid pkgspec.yaml.

ppps. Although you could fix that by having the file start with an optional leading '{' and an optional trailing '}', and the file is otherwise limited to the subset I mentioned.

pppps. No more postscripts.

ghost · 2015-03-20T17:39:21Z

Having 2 files (pubspec and pkgspec) implementing completely different formats (one -yaml, another -dart literal) could make users think right hand doesn't know what left hand is doing here.
Properties file was (more or less) ok because it can be perceived as trivial, so it doesn't dissonate with yaml so much (though some purists would disagree). But with dart literal... not sure.

munificent · 2015-03-20T18:01:35Z

Having 2 files (pubspec and pkgspec) implementing completely different formats (one -yaml, another -dart literal) could make users think right hand doesn't know what left hand is doing here.

It would be nice to be consistent, but the reality is that those files are consumed by different tools with different constraints. The pubspec is read by pub, which operates at a strictly higher level than the VM and dart2js. The pkgspec file is lower-level than that.

Honestly, a better solution would possibly have been to make pubspecs use Darty syntax (i.e. any kind of literal). But we can't go back in time and do that, so...

ghost · 2015-03-20T19:13:08Z

You can't go back in time, but, at least in theory, you can add support for pubspec written in dart syntax, as an option. Two files may have different extensions.
Even if it's a mere option, and even if no one uses this option, the very existence of it restores the feel of consistency IMO. No?

munificent · 2015-03-20T20:13:51Z

Even if it's a mere option, and even if no one uses this option, the very existence of it restores the feel of consistency IMO. No?

I don't think adding a third option while leaving the previous one increases consistency. :)

ghost · 2015-03-20T20:24:58Z

If you declare old one deprecated, then it will show a tendency towards consistency. Consistency is always in a state of flux, it's a moving target, so the effort is commendable. No?

lrhn · 2015-03-22T17:42:15Z

I think allowing a Dart map literal with string literal keys and values is still too allowing. It should at least restrict the string literals to not have interpolations, and possibly also restrict the kinds of string literals accepted (adjacent string combination? multiline strings? raw strings?).
And then we might as well pick JSON.

ghost · 2015-03-22T20:59:42Z

We already know that every declarative data format strives to acquire procedural features over time -until it becomes kind of full-blown Turing-complete programming language (and very lousy one at that).

May it be a good idea to specify configuration using a program to begin with?
E.g. (somewhat restricted) program in dart that runs in isolate and returns a map.
It can be as simple as
main()=> {
// dart literal
}

Even a simplest config file may require variables to avoid repetition. E.g.
homebrew=../../libs/homebrew/lib/
glorph=../../libs/glorph/lib
// more like this - too much repetition!

Naturally, it can be programmed as:
var list = ["homebrew", "glorph", "foo", "bar"];
return new Map.fromIterables(list, list.map((s) => "../libs/$s/lib"));

/A.K: There's no such thing as "data". Every piece of data is in fact a program to be processed by some "interpreter". Said interpreter, in turn, produces another piece of data, which is a program for another interpreter. Etc. Where does this chain end? In the ultimate interpreter: our mind. :-)

munificent · 2015-03-23T19:48:25Z

It should at least restrict the string literals to not have interpolations,

👍

and possibly also restrict the kinds of string literals accepted (adjacent string combination? multiline strings? raw strings?).

We could do singleLineString and prohibit interpolation. I'm agnostic about this. It just depends on what the VM folks want.

And then we might as well pick JSON.

That doesn't give us comments. 😞 But I certainly wouldn't veto JSON if it came down to that.

munificent · 2015-03-23T19:51:08Z

every declarative data format strives to acquire procedural features over time -until it becomes kind of full-blown Turing-complete programming language (and very lousy one at that).

I've certainly seen that happen in many cases but, so far, pubspecs haven't gone in that direction.

I think it has a lot to do with how large the files are and how much time programmers spend hand-authoring them. Once you have lots of big manually maintained files, you start wanting all of the abstraction and reuse features a real language gives you.

This file will be pretty small and mostly auto-generated, so I don't think there will be much pressure to make it Turing-complete. However, making it an actual script that can do stuff like IO would be interesting. You could basically implement something like a class loader or Python's metaimport hooks with it.

My guess is it's not worth the complexity of doing that, though. The analyzer would have to execute this script, and that adds a pretty big complexity tax.

ghost · 2015-03-23T20:18:21Z

Yes, it is a complexity tax. Which could be justified and even turned to our advantage if it can eliminate or reduce OTHER taxes. Maybe something returned by this script can be used for conditional imports?

Look this way: we have pubspec, this pkgspec, conditional import - all are different kinds of config.
Maybe it's just ONE single config, generated by script? The thing is: conditions that affect "conditional import", which are supposed to be kind of static consts in proposals under discussion, are normal variables for this pre-script. It can use all power of dart:

if (Platform.isWindows && something || somethingElse) importConfig["dart.io"]=true;

And then, at least, there's no problem for analyzer to statically evaluate any expressions.
All the stuff returned by config script can be considered const for the purposes of analyzer. (We just return const map).

There can be other advantages, yet to be discovered.

skybrian · 2015-03-23T22:08:37Z

We can and will write tools to generate this file. Part of the reason we're doing it in the first place is to integrate with other build systems. A generator can be written in any language you like and can certainly do I/O.

The config file itself, however, is not a program; it's a program's output, part of the communications protocol between some other tool and the analyzer. It's a serialization of a map, perhaps with a few niceties for human readability if it happens to be hand-edited. Seems like JSON is best since it's easiest to agree on?

It might be interesting to specify a Dart script in pubspec.yaml to act as a pubspec generator, or perhaps do it in build.dart? But it can't rely on any imported Dart packages or there's a chicken-and-egg problem. Seems like we could defer this for now.

ghost · 2015-03-23T22:54:12Z

It might be interesting to specify a Dart script in pubspec.yaml to act as a pubspec generator, or perhaps do it in build.dart?

Note that if you have a script (e.g. config.dart), you can still have YAML files or JSON files - the script can read and massage them:

main() {
Map config=readYaml(filename);
// massage it
return config;
}

I would think of a script as an egg :-) It's debatable though.

munificent · 2015-03-27T23:28:29Z

We talked about the syntax in the DEP meeting this week. Our feeling is that we really would like something that is a very restricted subset of Dart syntax. Doing that plays nicer with existing tools, and gives us room to grow over time if needed.

To try to make things easier on the VM folks, our idea is that we'll reuse the lexical grammar (i.e. the scanner) as-is and then specify the package spec using a very very rigid minimal grammar. If the VM folks can't easily reuse the parser for map literals, this would let them write a simple one-off parser just for this.

Here's an example of the grammar I have in mind:

const packages = const {
  r"args": r"/path/to/args",
  r"unittest": r"/path/to/unittest"
};

The full grammar is:

packageSpec:
  const packages = mapLiteral ;
;

mapLiteral:
  const { (mapLiteralEntry (, mapLiteralEntry)*)? }
;

mapLiteralEntry:
  rawString : rawString
;

rawString:
  r " (~(" | NEWLINE))* "
;

The packageSpec production is new. The rawString production is just the last clause in the spec's singleLineString rule. Note that no interpolation or concatenation is allowed for strings. The mapLiteral differs in that it:

Makes const required.
Removes typeArguments?.
Disallows a trailing comma.
Requires map entries to be raw strings.

Finally, mapLiteralEntry requires the key and value to be rawString.

This looks very simple to me. At the same time, it answers all of our questions about encoding, escaping, comments, and whitespace. Further, our existing tools can parse and syntax highlight it today. It's even a valid Dart library, so a program could import it and use the packages map at runtime.
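To get a feel for how small a one-off parser for this grammar can be outside Dart, here is a sketch in Python. It is not a full implementation of Dart's lexical grammar: it skips whitespace and "//" line comments only (block comments and doc comments are not handled), and the name parse_package_spec is hypothetical.

```python
import re

# Whitespace and line comments are skipped; group 1 captures real tokens.
_TOKEN = re.compile(r'\s+|//[^\n]*|(r"[^"\n]*"|[A-Za-z_]\w*|[={},:;])')


def _is_raw_string(token):
    return token.startswith('r"')


def parse_package_spec(source):
    tokens = []
    pos = 0
    while pos < len(source):
        match = _TOKEN.match(source, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if match.group(1) is not None:
            tokens.append(match.group(1))
        pos = match.end()

    head = ["const", "packages", "=", "const", "{"]
    if tokens[:5] != head or tokens[-2:] != ["}", ";"]:
        raise ValueError("expected: const packages = const { ... };")

    body = tokens[5:-2]
    packages = {}
    if body:
        # Entries are "key : value" separated by ","; no trailing comma.
        if len(body) % 4 != 3:
            raise ValueError("malformed map entries")
        for i in range(0, len(body), 4):
            key, colon, value = body[i:i + 3]
            if not (_is_raw_string(key) and colon == ":" and _is_raw_string(value)):
                raise ValueError("entries must be rawString : rawString")
            if i + 3 < len(body) and body[i + 3] != ",":
                raise ValueError("expected ',' between entries")
            packages[key[2:-1]] = value[2:-1]  # strip r" and the closing "
    return packages
```

Because the strings are raw, the key and value are plain substrings of the source and no unescaping step is needed.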

What do you think?

lrhn · 2015-03-28T00:07:05Z

It's a bold move, requiring a const declaration. It does mean that any Dart parser can parse the file, and you can even choose to import it from Dart if you want to.
Using raw strings should also mean that there is no need for escaping since double-quotes can't occur unescaped in URIs, nor can any kind of line terminator, so the keys/values are always proper substrings of the original source. Using a non-raw string would require special handling of backslash and dollar, where dollar can occur in a URI path.

The only drawback is that by requiring raw strings, the map isn't also a JSON literal, which would have made the format easier to parse for non-Dart tools.

I do think that a custom parser can be faster (but not sure how much) than just using the generic Dart tokenizer/parser, and if we are using a custom parser anyway, then using a known format isn't that much of an advantage - we are just inventing yet another key/value format.

I can live with this format, and if the VM can implement it efficiently, I guess it should be fine.

munificent · 2015-03-28T00:12:55Z

the map isn't also a JSON literal

Well, the preceding const packages = const pretty much rules that out anyway.

I do think that a custom parser can be faster (but not sure how much) than just using the generic Dart tokenizer/parser

I don't think parsing speed for this should be a bottleneck. This file will certainly be smaller than most user programs, and the VM has to parse those before it can run.

I can live with this format, and if the VM can implement it efficiently, I guess it should be fine.

🎉

skybrian · 2015-03-28T00:58:14Z

I still need to read and write this file from Go. Given that most languages don't have a Dart parser available, and even in Dart, there is not yet a stable public API for invoking a Dart parser, making this a subset of Dart seems like a bad idea.

I don't think anyone has any serious objections to JSON?

devoncarew · 2015-03-28T02:33:11Z

+1 to using a format that is easily consumable from other languages. This will make consumption from other (non-dart) tools easier.

lrhn · 2015-03-28T14:39:19Z

Another thing with using Dart syntax is that you need to have the same white-space rules as Dart too. That might be more complex than what you need. At least it needs to be specified what the rules are.

ghost · 2015-03-28T22:46:53Z

Instead of restricting map literal syntax, how about restricting JSON syntax by simple regexp - e.g. by defining

stringLiteral -> double-quote sequence-of-chars-with-no-backslash-or-newline double-quote
NameValuePair -> \s* stringLiteral \s* colon \s* stringLiteral
MapLiteral -> { NameValuePair [comma NameValuePair]* }
(or something similar).
It's easy to validate and parse into a map using regexp only (the whole program may fit in 10 lines of code or less).
At least, the format is familiar, and using backslashes in URIs is not a good practice anyway.
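The regexp idea can be tried out in a few lines of Python. This is a sketch of the grammar as written above, with two small liberties that are assumptions: whitespace is also allowed before the closing brace and around the whole map. The name parse_restricted_json is hypothetical.

```python
import re

# A string with no backslash, no newline and no embedded quote.
_STR = r'"[^"\\\n]*"'
_PAIR = r"\s*" + _STR + r"\s*:\s*" + _STR + r"\s*"
# Validation pattern: "{" pair ("," pair)* "}".
_MAP = re.compile(r"\s*\{" + _PAIR + "(?:," + _PAIR + r")*\}\s*")
# Extraction pattern: the same pair shape, with capturing groups.
_CAPTURE = re.compile(r'"([^"\\\n]*)"\s*:\s*"([^"\\\n]*)"')


def parse_restricted_json(text):
    if _MAP.fullmatch(text) is None:
        raise ValueError("not a restricted JSON map")
    return dict(_CAPTURE.findall(text))
```

Any input accepted by this function is also valid JSON, so tools that already have a JSON parser can use that instead and get the same map.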

ghost · 2015-03-29T21:54:23Z

Actually, "honest" parsing of restricted JSON (no regexp) is quite trivial: 60 lines including all error processing.
https://gist.github.com/tatumizer/5132ebb39da77c5b301a

munificent · 2015-03-30T16:52:33Z

I don't think anyone has any serious objections to JSON?

@iposva-google might.

iposva-google · 2015-03-31T05:46:06Z

This is not a question of what the VM can or cannot parse, or of which grammar it already partially implements.

We need the embedders of the VM to implement this and so it needs to be a super-simple file format. The VM does not do any of the loading directly; instead it is set up to run in a sandboxed environment, relying on the embedder to deal with the resolution of URIs and the loading of the sources.

In short: The proposal as it is currently written up is workable (commit 38378cb), single line with "=" separating key and value.

munificent · 2015-03-31T16:45:20Z

We need the embedders of the VM to implement this and so it needs to be a super-simple file format.

Embedders of the VM do not have to support this. A custom embedder can handle "package:" URLs however it likes.
For custom embedders that do want to follow the same behavior as the standalone VM, are you saying it is impossible to reuse code to parse a file?

lrhn · 2015-04-16T17:46:52Z

Sticking with the simplest format for now. It's not Java properties - those are hugely complex in comparison. It's just single lines of identifier '=' rest or '#' comment.
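A reader for that shape of file fits in a dozen lines of Python. The sketch below makes some assumptions that this comment does not spell out: blank lines are skipped, an identifier means letters, digits and underscores not starting with a digit, and nothing is trimmed from the value. The name parse_packages_file is hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_packages_file(text):
    packages = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line (assumed allowed) or comment
        # Split at the first "=" only; the rest may contain "=" itself.
        name, separator, rest = line.partition("=")
        if not separator or _IDENTIFIER.fullmatch(name) is None:
            raise ValueError("expected identifier '=' rest: %r" % line)
        packages[name] = rest
    return packages
```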

munificent · 2015-04-17T21:10:40Z

How about removing # too? If we don't have a use case for comments, may as well keep it simpler.

sethladd · 2015-04-17T22:53:51Z

Putting a note for when this file was generated could be helpful. A comment
could be used for that kind of note. Also a note for what program generated
that file and that it is auto generated could be useful.
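For illustration, a generator could write such a note as the first line of the file. This Python sketch assumes the single-line identifier '=' rest format with "#" comments; the function name render_packages_file, the wording of the header, and the ISO 8601 timestamp are all choices made for the example.

```python
from datetime import datetime, timezone


def render_packages_file(packages, generator, now=None):
    """Render a name -> location mapping with a generated-by header comment."""
    now = now or datetime.now(timezone.utc)
    lines = ["# Auto-generated by %s on %s." % (generator, now.isoformat())]
    for name in sorted(packages):  # stable order keeps diffs small
        lines.append("%s=%s" % (name, packages[name]))
    return "\n".join(lines) + "\n"
```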

munificent · 2015-04-25T15:54:15Z

SGTM!

munificent mentioned this issue Mar 18, 2015

What should the filename be? #7

Closed

lrhn closed this as completed Apr 16, 2015

munificent mentioned this issue Sep 20, 2019

Language Versioning: .packages file breaking changes? dart-lang/language#365

Closed
