Commit 3d56b2f

Add lexing details to spec. Fix google#98
1 parent 1e658e5 commit 3d56b2f

1 file changed (+62, -7 lines)

doc/language/spec.html

Lines changed: 62 additions & 7 deletions
@@ -68,10 +68,10 @@
 
 <h1>Specification</h1>
 
-<p> This page is the authority on what Jsonnet programs should do. It defines Jsonnet syntax and
-parsing. It describes which programs should be rejected statically (i.e. before execution).
+<p> This page is the authority on what Jsonnet programs should do. It defines Jsonnet lexing and
+syntax. It describes which programs should be rejected statically (i.e. before execution).
 Finally, it specifies the manner in which the program is executed, i.e. the JSON that is output, or
-the dynamic error if there is one.</p>
+the runtime error if there is one.</p>
 
 <p>The specification is intended to be terse but precise. The intention is to illuminate various
 subtleties and edge cases in order to allow fully-compatible reimplementations of the language, as
@@ -81,6 +81,65 @@ <h1>Specification</h1>
 semantics</a>. If that's not your cup of tea, then see the more discursive description of Jsonnet
 behavior in <a href="/docs/tutorial.html">tutorial</a>.</p>
 
+<h2>Lexing</h2>
+
+<p>A Jsonnet program is a UTF-8 encoded text file or string. The program text is a sequence of tokens,
+separated by optional whitespace and comments. Whitespace consists of space, tab, newline and
+carriage return. Tokens are lexed greedily. Comments are either single-line comments, beginning
+with a <code>#</code> or a <code>//</code> and running to the end of the line, or block comments
+beginning with <code>/*</code> and terminating at the first subsequent <code>*/</code>.</p>
+
+<ul>
+
+<li><i>id</i>: Matched by the regular expression <tt>[_a-zA-Z][_a-zA-Z0-9]*</tt>
+<p>
+Some identifiers are reserved as keywords and are therefore not in the set <i>id</i>:
+<code>assert</code> <code>else</code> <code>error</code> <code>false</code> <code>for</code>
+<code>function</code> <code>if</code> <code>import</code> <code>importstr</code> <code>in</code>
+<code>local</code> <code>null</code> <code>self</code> <code>super</code> <code>tailstrict</code>
+<code>then</code> <code>true</code>
+</p>
+</li>
+
+<li><i>number</i>: As defined by <a href="http://json.org/">JSON</a>, but without the leading minus.</li>
+
+<li><i>string</i>: Can take one of three forms:
+<ul>
+<li>Double-quoted, beginning with <code>"</code> and ending with the first subsequent unescaped <code>"</code></li>
+<li>Single-quoted, beginning with <code>'</code> and ending with the first subsequent unescaped <code>'</code></li>
+<li>Text block, beginning with <code>|||</code>, followed by optional whitespace and a newline.
+The next line must be prefixed with a non-empty whitespace string <i>W</i>. The block ends at the
+first subsequent line that does not begin with <i>W</i>, and it is an error if this line does not
+begin with optional whitespace followed by <code>|||</code>. The content of the string is the
+concatenation of all the lines that began with <i>W</i>, with that prefix stripped. The line
+ending style in the file is preserved in the string.</li>
+</ul>
+<p>Double- and single-quoted strings may span multiple lines, in which case whatever DOS or Unix
+end-of-line sequence appears in the file is included in the string. Both forms understand the
+following escape characters, each preceded by a backslash: <code>"'\bfnrt0</code>, which have their
+standard meanings, as well as <code>\uXXXX</code> for hexadecimal Unicode escapes.</p>
+</li>
+
+<li><i>symbol</i>:
+<ul>
+<li>The following single-character symbols:
+<p><code>{}[],.();</code></p>
+</li>
+<li>Sequences of one or more of the following characters:
+<code>!$:~+-&amp;|^=&lt;&gt;*/%</code>
+<p>The following caveats apply, and cause the sequence to end early:</p>
+<ul>
+<li>The sequence <code>//</code> is not allowed in an operator</li>
+<li>The sequence <code>/*</code> is not allowed in an operator</li>
+<li>The sequence <code>|||</code> is not allowed in an operator</li>
+<li>If the sequence has more than one character, it is not allowed to end in any of <code>+-~!</code></li>
+</ul>
+
+</li>
+</ul>
+</li>
+</ul>
+
 <h2>Abstract Syntax</h2>
 
 <p> In this notation, <i>x</i>★ defines a comma-separated possibly zero-length list of <i>x</i>
@@ -282,10 +341,6 @@ <h2>Abstract Syntax</h2>
 </td></tr>
 </table>
 
-<p>Additionally, <i>id</i> is defined by regular expression: <tt>[a-zA-Z_][a-zA-Z0-9_]*</tt>. The
-definition of <i>string</i> is equivalent to the JSON string, including escape characters. Finally,
-<i>number</i> is equivalent to the JSON number, but without the leading <code>-</code>.</p>
-
 <h2>Associativity and Operator Precedence</h2>
 
 <p> The parsing of the concrete syntax into abstract syntax can be controlled by adding parentheses
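The whitespace and comment rules from the new Lexing section can be illustrated with a minimal Python sketch. The helper name `skip_trivia` is hypothetical and not taken from any Jsonnet implementation; it assumes a lexer that calls it before reading each token. Note that block comments do not nest: the first `*/` closes the comment.

```python
# Hypothetical helper (not from any Jsonnet implementation) sketching the
# spec's whitespace and comment rules.

WHITESPACE = " \t\n\r"  # space, tab, newline, carriage return


def skip_trivia(src: str, pos: int) -> int:
    """Return the index of the first character at or after pos that is
    neither whitespace nor part of a comment."""
    n = len(src)
    while pos < n:
        c = src[pos]
        if c in WHITESPACE:
            pos += 1
        elif c == "#" or src.startswith("//", pos):
            # Single-line comment: skip up to (not including) the newline.
            end = src.find("\n", pos)
            pos = n if end == -1 else end
        elif src.startswith("/*", pos):
            # Block comment: ends at the first subsequent "*/" (no nesting).
            end = src.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError("unterminated block comment")
            pos = end + 2
        else:
            break
    return pos
```

For example, in `"  # hi\n// x\n/* a /* b */ 42"` everything before `42` is skipped, because the inner `/*` is just comment text.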
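The <i>id</i> regular expression and the keyword list combine as in the following Python sketch. The function `lex_word` and the `"id"`/`"keyword"` labels are illustrative names chosen here. Because tokens are lexed greedily, `locals` is a single identifier rather than the keyword `local` followed by `s`.

```python
import re

# Keywords listed in the spec; they are excluded from the set id.
KEYWORDS = frozenset({
    "assert", "else", "error", "false", "for", "function", "if",
    "import", "importstr", "in", "local", "null", "self", "super",
    "tailstrict", "then", "true",
})

IDENT_RE = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def lex_word(src: str, pos: int):
    """Greedily match an identifier-shaped word at pos and classify it.

    Returns (kind, text, new_pos), or None if no word starts at pos.
    lex_word and the kind labels are hypothetical names for this sketch.
    """
    m = IDENT_RE.match(src, pos)
    if m is None:
        return None
    text = m.group()
    kind = "keyword" if text in KEYWORDS else "id"
    return kind, text, m.end()
```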
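The <i>number</i> rule can be sketched with the json.org number grammar minus its optional leading `-`. In this sketch a `-` in front of a number is not part of the token; it is left to be lexed as a symbol. The name `lex_number` is hypothetical, and converting the token text to a Python `float` (Jsonnet numbers are double-precision) is a choice made here.

```python
import re

# json.org number grammar with the optional leading '-' removed.
NUMBER_RE = re.compile(r"(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def lex_number(src: str, pos: int):
    """Return (value, new_pos) for a number at pos, or None.

    Hypothetical helper; the float conversion is an assumption of this sketch.
    """
    m = NUMBER_RE.match(src, pos)
    if m is None:
        return None
    return float(m.group()), m.end()
```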
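The double- and single-quoted string rules, including the escape list, might be implemented as below. `lex_quoted` is a hypothetical name, and mapping `\0` to the NUL character is an assumption about what "standard meaning" refers to. Only an unescaped copy of the opening quote ends the string, so the other quote character may appear freely, and raw newlines are kept.

```python
# Escapes from the spec's list "'\bfnrt0; "\0 -> NUL" is an assumption.
SIMPLE_ESCAPES = {
    '"': '"', "'": "'", "\\": "\\",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "0": "\0",
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def lex_quoted(src: str, pos: int):
    """Lex a double- or single-quoted string at src[pos].

    Returns (value, new_pos). Hypothetical helper for illustration.
    """
    quote = src[pos]
    if quote not in "\"'":
        raise ValueError("not at a quoted string")
    out = []
    i = pos + 1
    while i < len(src):
        c = src[i]
        if c == quote:
            # The first unescaped matching quote ends the string.
            return "".join(out), i + 1
        if c == "\\":
            if i + 1 >= len(src):
                break
            e = src[i + 1]
            if e in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[e])
                i += 2
            elif e == "u":
                digits = src[i + 2:i + 6]
                if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                    raise SyntaxError("bad \\u escape")
                out.append(chr(int(digits, 16)))
                i += 6
            else:
                raise SyntaxError("unknown escape \\" + e)
        else:
            # Raw characters, including \n or \r\n, are copied unchanged.
            out.append(c)
            i += 1
    raise SyntaxError("unterminated string")
```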
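Text blocks are the least obvious string form, so here is a sketch that follows the rules exactly as written in the diff, assuming the prefix <i>W</i> consists only of spaces and tabs. It makes no allowance for cases the text does not mention, such as empty lines inside the block. `lex_text_block` is a hypothetical name.

```python
def lex_text_block(src: str, pos: int):
    """Lex a ||| text block at src[pos]; returns (value, new_pos).

    Hypothetical helper. Assumes the prefix W contains only spaces/tabs.
    """
    if not src.startswith("|||", pos):
        raise ValueError("not at a text block")
    i = pos + 3
    # Optional whitespace, then a newline (LF or CRLF).
    while i < len(src) and src[i] in " \t":
        i += 1
    if src.startswith("\r\n", i):
        i += 2
    elif i < len(src) and src[i] == "\n":
        i += 1
    else:
        raise SyntaxError("expected newline after |||")
    # The first content line fixes the non-empty whitespace prefix W.
    j = i
    while j < len(src) and src[j] in " \t":
        j += 1
    w = src[i:j]
    if not w:
        raise SyntaxError("first text block line must be indented")
    lines = []
    while src.startswith(w, i):
        end = src.find("\n", i)
        if end == -1:
            raise SyntaxError("missing closing |||")
        # Strip W but keep the original line ending (\n or \r\n).
        lines.append(src[i + len(w):end + 1])
        i = end + 1
    # The first line not starting with W must be whitespace then |||.
    while i < len(src) and src[i] in " \t":
        i += 1
    if not src.startswith("|||", i):
        raise SyntaxError("expected closing |||")
    return "".join(lines), i + 3
```

Stripping only <i>W</i> means deeper indentation survives: a line indented four spaces under a two-space <i>W</i> keeps two leading spaces.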
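The <i>symbol</i> rule and its caveats can be sketched as a greedy scan followed by a back-off step. This assumes comments and `|||` text blocks have already been recognized by the caller before it asks for a symbol; `lex_symbol` is a hypothetical name. Backing off trailing `+-~!` characters is what lets `a+-b` lex as `a`, `+`, `-`, `b`.

```python
SINGLE_CHAR_SYMBOLS = set("{}[],.();")
OPERATOR_CHARS = set("!$:~+-&|^=<>*/%")


def lex_symbol(src: str, pos: int):
    """Return (symbol_text, new_pos) for the symbol starting at pos.

    Hypothetical helper; assumes comments and text blocks were handled first.
    """
    c = src[pos]
    if c in SINGLE_CHAR_SYMBOLS:
        return c, pos + 1
    if c not in OPERATOR_CHARS:
        raise SyntaxError(f"unexpected character {c!r}")
    # Greedily take operator characters, stopping before //, /* or |||.
    end = pos
    while end < len(src) and src[end] in OPERATOR_CHARS:
        if end > pos and (src.startswith("//", end)
                          or src.startswith("/*", end)
                          or src.startswith("|||", end)):
            break
        end += 1
    # A multi-character operator may not end in + - ~ !, so back off.
    while end - pos > 1 and src[end - 1] in "+-~!":
        end -= 1
    return src[pos:end], end
```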
