How about something other than .properties syntax? #1
Comments
I like the idea of using a Dart map literal. |
Just to reflect the further discussion over email on this: |
YAYAML |
I would like to generate and parse this file in Go easily. JSON works for me. YAML is not well-specified or simple. The current property format also works in principle, but I agree with Bob that we shouldn't reference the Java property file spec which is a bit weird. I think it would be okay to define the parsing rules right in the spec if they're simple enough. Here are my recommendations for a generic line-based format:
The idea is to make it trivial to write a correct parser in any scripting language with split() and replaceAll() string functions, so people aren't tempted to do anything else. |
https://github.com/go-yaml/yaml While YAML may not be easy to parse, someone has already done the work in practically every language out there. Not that I'm personally attached to YAML. |
I think the syntax for the pkgspec.properties file is fine. I do think the name is too long though. It looks much better just as pkgspec. Or maybe dump the information in the pubspec.lock file? Is that possible? |
Unfortunately no, though it's a good idea. The problem is that the pkgspec file will contain paths that point to weird user-specific directories, like your pub cache in your home directory. You definitely don't want to check that into source control. However, you do probably want to check your lockfile in, since it ensures everyone using your package uses the same versions of its dependencies. Hence, two files. |
Regarding YAML, even though many libraries exist, it's unclear how compatible they are. For the package you linked to: "The yaml package supports most of YAML 1.1 and 1.2, including support for anchors, tags, map merging, etc. Multi-document unmarshalling is not yet implemented, and base-60 floats from YAML 1.1 are purposefully not supported since they're a poor design and are gone in YAML 1.2." Sounds pretty complicated to me. There is value in using a format that's so simple you can keep it all in your head and implement it yourself if needed. |
From what I have seen only a map from string to string is used in this file, right? I would expect this Go package to support this basic use case. I guess this limited subset would be even easy to parse with custom code but it would also still be a yaml file. |
If the file defines rules for path resolution, it would be logical to follow convention and use the term "path" somewhere in its name. E.g. simply "path.yaml" or "dartpath.yaml" or something. (obvious analogies are PATH, CLASSPATH, LD_LIBRARY_PATH etc). |
If our file format is not actually YAML, but is some "well-defined subset of YAML", then let's not name the file foo.yaml: that extension implies the file can be any YAML, which might include comments or other YAML bits. |
I'll just suggest an even simpler format which is a subset of Java properties files, Windows ini-files and the gazillion different key/value files used in Unix (properties files are not just Java, it's just Java's name for its variant of a very common family of formats). Syntax:
No whitespace handling, no comments. No file extension (name is just Semantics: That's simple (as simple as possible at least, we are dealing with URIs) and it gets the job done. Any further syntax isn't needed to solve the current problem, but can be argued to be in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI. |
It can be made simpler. For the value, if we disallow newlines and trailing whitespace then it doesn't need any escape sequences. Any space characters in a URL can be URL-escaped anyway. I generally do like to support line comments and trailing whitespace removal since it's trivial to do. Requiring UTF-8 is generally a good idea for modern text formats. Anyway, these are details. If we agree that it's okay to invent a simple text format, then it's not hard to do. |
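To make the "trivial to parse" claim concrete, here is a minimal Python sketch of such a reader. It is only an illustration under assumptions that were not settled at this point in the thread: `=` as the separator, `#` at the start of a line for comments, and the function name `parse_package_map` are all hypothetical.

```python
def parse_package_map(data: bytes) -> dict[str, str]:
    """Parse a UTF-8, line-based key/value file into a dict (sketch)."""
    result: dict[str, str] = {}
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    text = data.decode("utf-8")
    for lineno, raw in enumerate(text.split("\n"), start=1):
        # Drop trailing whitespace (this also removes a stray "\r").
        line = raw.rstrip()
        # Assumption: blank lines and lines starting with "#" are ignored.
        if not line or line.startswith("#"):
            continue
        # Assumption: the first "=" separates key and value; no escapes exist,
        # so the value is taken verbatim.
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected key=value")
        if key in result:
            raise ValueError(f"line {lineno}: duplicate key {key!r}")
        result[key] = value
    return result
```

Because there are no escape sequences, the whole parser is a decode, a split, and a partition per line, which is roughly the level of simplicity argued for above.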
I think the ban on comments is too harsh indeed. Am I the only one who doesn't like the name pkgspec? Name somehow suggests that we can find package name, author etc there, but in fact there's another file (pubspec.yaml) for that. I googled "package spec" - and indeed, it's used with different meaning elsewhere (e.g PL/SQL). Which, of course, could be ignored as coincidence, but... ours is not a package spec at all :) It can be further argued that the choice of name is made (quoting Lasse) "in preparation for maybe extending the use of the file in the future, which is purely speculative, and probably YAGNI". Hehe. |
👍 I don't like it at all. I still like |
I think the term "path" should somehow appear as part of the name. Maybe it's only me though :) |
I don't miss path. For me path is more an implementation detail, but IMHO still better than |
But the whole issue is about implementation detail! |
I have no strong preference for the name, except that it should contain "package" or "pkg" and shouldn't be too long. Using "packages" is fine with me. |
That's what #7 is for. Let's keep this bug focused on the syntax. |
I really don't think we should. Already even with Lasse's minimal proposal we've ratholed on encoding (ASCII? UTF-8?), line endings, whitespace (stripped? not?), escaping, and comments. I do think we should pick something extensible because I think there's a very high chance we'll want something like search paths eventually. Can we just say the file is:
That's it. Fully specified including encoding, whitespace, and escaping. Comments are allowed and specified. The VM already has code to parse it. It's easy for tools to generate it (just use JSON, which is a subset of it). We have code to parse it (use the analyzer or dart2js parser). Also, it's easily expandable: just allow other value types in the map. Thoughts? |
YAML is consistent with other config files in the language and I don't see any need to
or
As far as I can understand, the only entries allowed in the file are
that's all that ever needs to be parsed. A file which uses any YAML syntax other than that can just be rejected as |
ps. Mr. Nystrom's idea gets my vote. pps. I realised after some reflection that an informal yaml subset probably won't work, because JSON is valid YAML. so the json file
is a valid yaml file with exactly the same content as the key value pairs, but would be rejected as ppps. Although you could fix that by having the file start with an optional leading '{' and an optional trailing '}', and the file is otherwise limited to the subset I mentioned. pppps. No more postscripts. |
Having 2 files (pubspec and pkgspec) implementing completely different formats (one YAML, the other a Dart literal) could make users think the right hand doesn't know what the left hand is doing here. |
It would be nice to be consistent, but the reality is that those files are consumed by different tools with different constraints. The pubspec is read by pub, which operates at a strictly higher level than the VM and dart2js. The pkgspec file is lower-level than that. Honestly, a better solution would possibly have been to make pubspecs use Darty syntax (i.e. any kind of literal). But we can't go back in time and do that, so... |
You can't go back in time, but, at least in theory, you can add support for pubspec written in dart syntax, as an option. Two files may have different extensions. |
I don't think adding a third option while leaving the previous one increases consistency. :) |
If you declare old one deprecated, then it will show a tendency towards consistency. Consistency is always in a state of flux, it's a moving target, so the effort is commendable. No? |
I think allowing a Dart map literal with string literal keys and values is still too allowing. It should at least restrict the string literals to not have interpolations, and possibly also restrict the kinds of string literals accepted (adjacent string combination? multiline strings? raw strings?). |
We already know that every declarative data format strives to acquire procedural features over time, until it becomes a kind of full-blown Turing-complete programming language (and a very lousy one at that). Might it be a good idea to specify configuration using a program to begin with? Even the simplest config file may require variables to avoid repetition. E.g. Naturally, it can be programmed as: /A.K: There's no such thing as "data". Every piece of data is in fact a program to be processed by some "interpreter". Said interpreter, in turn, produces another piece of data, which is a program for another interpreter. Etc. Where does this chain end? In the ultimate interpreter: our mind. :-) |
👍
We could do singleLineString and prohibit interpolation. I'm agnostic about this. It just depends on what the VM folks want.
That doesn't give us comments. 😞 But I certainly wouldn't veto JSON if it came down to that. |
I've certainly seen that happen in many cases but, so far, pubspecs haven't gone in that direction. I think it has a lot to do with how large the files are and how much time programmers spend hand-authoring them. Once you have lots of big manually maintained files, you start wanting all of the abstraction and reuse features a real language gives you. This file will be pretty small and mostly auto-generated, so I don't think there will be much pressure to make it Turing-complete. However, making it an actual script that can do stuff like IO would be interesting. You could basically implement something like a class loader or Python's metaimport hooks with it. My guess is it's not worth the complexity of doing that, though. The analyzer would have to execute this script, and that adds a pretty big complexity tax. |
Yes, it is a complexity tax. Which could be justified and even turned to our advantage if it can eliminate or reduce OTHER taxes. Maybe something returned by this script can be used for conditional imports? Look at it this way: we have pubspec, this pkgspec, conditional import - all are different kinds of config. if (Platform.isWindows && something || somethingElse) importConfig["dart.io"]=true; And then, at least, there's no problem for the analyzer to statically evaluate any expressions. There can be other advantages, yet to be discovered. |
We can and will write tools to generate this file. Part of the reason we're doing it in the first place is to integrate with other build systems. A generator can be written in any language you like and can certainly do I/O. The config file itself, however, is not a program; it's a program's output, part of the communications protocol between some other tool and the analyzer. It's a serialization of a map, perhaps with a few niceties for human readability if it happens to be hand-edited. Seems like JSON is best since it's easiest to agree on? It might be interesting to specify a Dart script in pubspec.yaml to act as a pubspec generator, or perhaps do it in build.dart? But it can't rely on any imported Dart packages or there's a chicken-and-egg problem. Seems like we could defer this for now. |
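As a rough illustration of the "serialization of a map" view, a generator in any language can build the map and dump it. The Python sketch below uses JSON only because this comment suggests it, not because the thread had settled on it; the function name `render_package_map` and the sample paths are hypothetical.

```python
import json


def render_package_map(packages: dict[str, str]) -> str:
    """Serialize a package-name -> location map as a JSON object (sketch)."""
    for name, location in packages.items():
        # Only string-to-string entries are discussed in this thread.
        if not isinstance(name, str) or not isinstance(location, str):
            raise TypeError("package names and locations must be strings")
    # Sorted keys give stable output, so regenerating the file produces no
    # spurious diffs; ensure_ascii=False keeps non-ASCII paths readable.
    return json.dumps(packages, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    print(render_package_map({
        "unittest": "/path/to/unittest",
        "args": "/path/to/args",
    }), end="")
```

The generating tool can do whatever I/O it likes to compute the locations; the file it emits stays plain data that the consumer only has to deserialize.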
Note that if you have a script (e.g. config.dart), you can still have YAML files or JSON files - the script can read and massage them: main() { I would think of a script as an egg :-) It's debatable though. |
We talked about the syntax in the DEP meeting this week. Our feeling is that we really would like something that is a very restricted subset of Dart syntax. Doing that plays nicer with existing tools, and gives us room to grow over time if needed. To try to make things easier on the VM folks, our idea is that we'll reuse the lexical grammar (i.e. the scanner) as-is and then specify the package spec using a very very rigid minimal grammar. If the VM folks can't easily reuse the parser for map literals, this would let them write a simple one-off parser just for this. Here's an example of the grammar I have in mind: const packages = const {
r"args": r"/path/to/args",
r"unittest": r"/path/to/unittest"
}; The full grammar is: packageSpec: mapLiteral: mapLiteralEntry: rawString: The packageSpec production is new. The rawString production is just the last clause in the spec's singleLineString rule. Note that no interpolation or concatenation is allowed for strings. The mapLiteral differs in that it:
Finally, mapLiteralEntry requires the key and value to be rawString. This looks very simple to me. At the same time, it answers all of our questions about encoding, escaping, comments, and whitespace. Further, our existing tools can parse and syntax highlight it today. It's even a valid Dart library, so a program could import it and use the What do you think? |
It's a bold move, requiring a const declaration. It does mean that any Dart parser can parse the file, and you can even choose to import it from Dart if you want to. The only drawback is that by requiring raw strings, the map isn't also a JSON literal, which would have made the format easier to parse for non-Dart tools. I do think that a custom parser can be faster (but not sure how much) than just using the generic Dart tokenizer/parser, and if we are using a custom parser anyway, then using a known format isn't that much of an advantage - we are just inventing yet another key/value format. I can live with this format, and if the VM can implement it efficiently, I guess it should be fine. |
Well, the preceding
I don't think parsing speed for this should be a bottleneck. This file will certainly be smaller than most user programs, and the VM has to parse those before it can run.
🎉 |
I still need to read and write this file from Go. Given that most languages don't have a Dart parser available, and even in Dart, there is not yet a stable public API for invoking a Dart parser, making this a subset of Dart seems like a bad idea. I don't think anyone has any serious objections to JSON? |
+1 to using a format that is easily consumable from other languages. This will make consumption from other (non-dart) tools easier. |
Another thing with using Dart syntax is that you need to have the same white-space rules as Dart too. That might be more complex than what you need. At least it needs to be specified what the rules are. |
Instead of restricting map literal syntax, how about restricting JSON syntax by simple regexp - e.g. by defining stringLiteral -> double-quote sequence-of-chars-with-no-backslash-or-newline double-quote |
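A rough Python sketch of that regexp idea follows. It assumes the file is a single JSON object whose keys and values are all strings, and it is slightly stricter than the rule as stated: besides backslash and newline it rejects every control character, so that anything it accepts is also valid JSON. The name `parse_restricted_json` is hypothetical.

```python
import re

# JSON whitespace only: space, tab, CR, LF.
_WS = r"[ \t\r\n]*"
# A string literal with no quote, no backslash and no control characters,
# so there are no escape sequences to decode.
_STRING = r'"([^"\\\x00-\x1f]*)"'
_ENTRY = re.compile(_WS + _STRING + _WS + ":" + _WS + _STRING + _WS)


def parse_restricted_json(text: str) -> dict[str, str]:
    """Parse {"key": "value", ...} written in the restricted subset (sketch)."""
    body = text.strip(" \t\r\n")
    if len(body) < 2 or body[0] != "{" or body[-1] != "}":
        raise ValueError("expected a single JSON object")
    inner = body[1:-1]
    result: dict[str, str] = {}
    if inner.strip(" \t\r\n") == "":
        return result
    pos = 0
    while True:
        match = _ENTRY.match(inner, pos)
        if match is None:
            raise ValueError(f"malformed entry at offset {pos}")
        key, value = match.group(1), match.group(2)
        if key in result:
            raise ValueError(f"duplicate key {key!r}")
        result[key] = value
        pos = match.end()
        if pos == len(inner):
            return result
        # Entries must be separated by exactly one comma; a trailing comma
        # fails on the next iteration because no entry follows it.
        if inner[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1
```

Files written in this subset can still be produced and read by any ordinary JSON library, which addresses the concern about consuming the file from other languages.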
Actually, "honest" parsing of restricted JSON (no regexp) is quite trivial: 60 lines including all error processing. |
@iposva-google might. |
This is not a question of what the VM can or cannot parse, or for what it already has a partial implementation of a grammar. We need the embedders of the VM to implement this and so it needs to be a super-simple file format. The VM does not do any of the loading directly, instead it is set up to be run in a sandboxed environment relying on the embedder to deal with the resolution of URIs and the loading of the sources. In short: The proposal as it is currently written up is workable (commit 38378cb), single line with "=" separating key and value. |
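For a sense of how little a tool needs to emit this "one line per entry, `=` between key and value" shape, here is a minimal Python writer sketch. The UTF-8 encoding, the sorted output and the function name `write_package_lines` are assumptions made for illustration; they are not taken from the referenced commit.

```python
def write_package_lines(packages: dict[str, str]) -> bytes:
    """Emit one name=location line per package (sketch)."""
    lines = []
    # Assumption: sorted output, so regenerated files are stable.
    for name in sorted(packages):
        location = packages[name]
        # A name containing "=" would make the line ambiguous.
        if not name or "=" in name:
            raise ValueError(f"invalid package name {name!r}")
        # Each entry must fit on a single line.
        for part in (name, location):
            if "\n" in part or "\r" in part:
                raise ValueError(f"line break in entry for {name!r}")
        lines.append(f"{name}={location}\n")
    # Assumption: the file is encoded as UTF-8.
    return "".join(lines).encode("utf-8")
```

For example, `write_package_lines({"args": "/path/to/args"})` yields the bytes `args=/path/to/args` followed by a newline.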
Sticking with the simplest format for now. It's not Java properties - those are hugely complex in comparison. It's just single lines of |
How about removing |
Putting a note for when this file was generated could be helpful. A comment
SGTM! |
I'm not a big fan of the ".properties" syntax. I don't think anyone outside of the Java ecosystem is even aware of it, and, as far as I can tell, its official specification is just a doc comment in the Java API.
Most egregiously, properties files are specified to require ISO 8859-1 character encoding. That's gross. Everyone these days prefers UTF-8.
How about we just use JSON? That's the default config file format for the web these days.
If that's a pain for the VM to parse, another option would be to say that the file must contain a Dart map literal with string literal keys and values. That would presumably let the VM use the existing lexer and parser on it.
Thoughts?