Commit 719bc64

gkellogg, afs, yamdan, and TallTed authored
Update the use of ECHAR and UCHAR in canonical N-Quads. (#27)
* Update the use of ECHAR and UCHAR in canonical N-Quads. Fixes #16.
* Add paragraph saying that Canonical N-Quads extends Canonical N-Triples.

---------

Co-authored-by: Andy Seaborne <[email protected]>
Co-authored-by: Dan Yamamoto <[email protected]>
Co-authored-by: Ted Thibodeau Jr <[email protected]>
1 parent 3820e3f commit 719bc64

1 file changed: +30 -34 lines changed

spec/index.html

Lines changed: 30 additions & 34 deletions
@@ -46,8 +46,9 @@
 .separated thead tr th { border:1px solid black; padding: .2em; }
 .separated tbody tr td { border:1px solid black; text-align: center; }
 .separated tbody tr td.r { text-align: right; padding: .5em; }
-.grammar td { font-family: monospace;}
+.grammar td { font-family: monospace; vertical-align: top; }
 .grammar-literal { color: gray;}
+.grammar_comment { color: #A52A2A; font-style: italic; }
 code {color: #ff4500;} /* Old W3C Style */
 </style>
 </head>
@@ -241,17 +242,22 @@ <h3>RDF Blank Nodes</h3>
 <h2>A Canonical form of N-Quads</h2>
 
 <p>This section defines a canonical form of N-Quads which has
-less variability in layout.
-The grammar for the language is the same.</p>
+a completely specified layout.
+The grammar for the language is unchanged.</p>
 
-<p class="note">A canonical form of N-Quads can be used to ensure
-that variations in the syntactic representation of terms
-within that quad are determined; each code point
+<p>Canonical N-Quads extends
+<a data-cite="RDF12-N-TRIPLES#canonical-ntriples">Canonical N-Triples</a> in [[RDF12-N-TRIPLES]]
+to include <code><a href="#grammar-production-graphLabel">graphLabel</a></code>.</p>
+
+<p>While the N-Quads syntax allows choices for the representation and layout of RDF data,
+the canonical form of N-Quads provides a unique syntactic representation of any quad.
+Each code point
 can be represented by only one of
 <code><a href="#grammar-production-UCHAR">UCHAR</a></code>,
 <code><a href="#grammar-production-ECHAR">ECHAR</a></code>,
 or unencoded character,
-where the relevant production allows for a choice in representation.</p>
+where the relevant production allows for a choice in representation.
+Each quad is represented entirely on a single line with specified white space.</p>
 
 <p>Canonical N-Quads has the following additional constraints on layout:</p>
 <ul>
@@ -266,35 +272,25 @@ <h2>A Canonical form of N-Quads</h2>
 MUST NOT use the datatype IRI part of the <a href="#grammar-production-literal">literal</a>,
 and are represented using only <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>.
 </li>
-<!--li><code><a href="#grammar-production-HEX">HEX</a></code> MUST use only uppercase letters (<code>[A-F]</code>).</li-->
-<li>Characters MUST NOT be represented by <code><a href="#grammar-production-UCHAR">UCHAR</a></code>.</li>
+<li><code><a href="#grammar-production-HEX">HEX</a></code> MUST use only uppercase letters (<code>[A-F]</code>).</li>
 <li>Within <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>,
-the characters
-<code>U+0022</code>, <code>U+005C</code>, <code>U+000A</code>, <code>U+000D</code>
-MUST be encoded using <code><a href="#grammar-production-ECHAR">ECHAR</a></code>.
-<code><a href="#grammar-production-ECHAR">ECHAR</a></code> MUST NOT be used for characters that are
-allowed directly in
-<code><a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a></code>. </li>
-<li>The token <code><a href="#grammar-production-EOL">EOL</a></code> MUST be a single <code>U+000A</code>.</li>
-<li>The final <code><a href="#grammar-production-EOL">EOL</a></code> MUST be provided.</li>
+the characters
+<code>U+0008</code> (<code title="BACKSPACE"><sub>BS</sub></code>),
+<code>U+0009</code> (<code title="HORIZONTAL TAB"><sub>HT</sub></code>),
+<code>U+000A</code> (<code title="LINE FEED"><sub>LF</sub></code>),
+<code>U+000C</code> (<code title="FORM FEED"><sub>FF</sub></code>),
+<code>U+000D</code> (<code title="CARRIAGE RETURN"><sub>CR</sub></code>),
+<code>U+0022</code> (<code title="DOUBLE QUOTE">&quot;</code>), and
+<code>U+005C</code> (<code title="BACKSLASH">\</code>)
+MUST be encoded using <code><a href="#grammar-production-ECHAR">ECHAR</a></code>.
+Characters in the range from <code>U+0000</code> to <code>U+001F</code>
+and <code>U+007F</code> (<code title="delete"><sub>DEL</sub></code>)
+that are not represented using <code><a href="#grammar-production-ECHAR">ECHAR</a></code>
+MUST be represented by <code><a href="#grammar-production-UCHAR">UCHAR</a></code>.
+All other characters MUST be represented by their native [[UNICODE]] representation.</li>
+<li>The token <code><a href="#grammar-production-EOL">EOL</a></code> MUST be a single <code>U+000A</code>.</li>
+<li>The final <code><a href="#grammar-production-EOL">EOL</a></code> MUST be provided.</li>
 </ul>
-
-<div class="issue" data-number="16">
-<p>Re-consider the use of `UCHAR` and `ECHAR` escapes in N-Triples/N-Quads canonicalization.
-The 1.1-based recommendation prohibits the use of `UCHAR` (`U+XXXX`)
-and allows `ECHAR` only for `U+0022` (quote `\"`),
-`U+005C` (backslash `\\`),
-`U+000A` (<code title="LINE FEED"><sub>LF</sub></code> `\n`),
-and `U+000D` (<code title="CARRIAGE RETURN"><sub>CR</sub></code> `\r`).
-However, the use of control characters can obfuscate text when presented,
-creating a potential security concern.</p>
-
-<p>A future version may consider requiring all characters between
-`U+0000` and `U+001F` (other than `U+000A` (<code title="LINE FEED"><sub>LF</sub></code>)
-and `U+000D` (<code title="CARRIAGE RETURN"><sub>CR</sub></code>))
-along with `U+007F` (<code title="delete"><sub>DEL</sub></code>)
-to be represented using `UCHAR`.</p>
-</div>
 </section>
 
 <section id="conformance">
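
Reviewer note: the new escaping rule for STRING_LITERAL_QUOTE in this commit can be read as a three-way choice per code point (ECHAR, UCHAR, or the character itself). The Python sketch below is one possible reading of that rule, not code from the spec or from any RDF library. The names `ECHAR_MAP` and `canonical_escape` are hypothetical, and emitting UCHAR in the four-digit `\uXXXX` form is an assumption: the diff only requires UCHAR with uppercase HEX and does not say which UCHAR form to use.

```python
# Hypothetical helper illustrating the escaping rule added in this commit.
# Characters that MUST use ECHAR inside STRING_LITERAL_QUOTE.
ECHAR_MAP = {
    "\b": "\\b",    # U+0008 BACKSPACE
    "\t": "\\t",    # U+0009 HORIZONTAL TAB
    "\n": "\\n",    # U+000A LINE FEED
    "\f": "\\f",    # U+000C FORM FEED
    "\r": "\\r",    # U+000D CARRIAGE RETURN
    '"': '\\"',     # U+0022 DOUBLE QUOTE
    "\\": "\\\\",   # U+005C BACKSLASH
}


def canonical_escape(lexical: str) -> str:
    """Escape a literal's lexical form for use between double quotes."""
    out = []
    for ch in lexical:
        cp = ord(ch)
        if ch in ECHAR_MAP:
            # Rule 1: the seven listed characters use ECHAR.
            out.append(ECHAR_MAP[ch])
        elif cp <= 0x1F or cp == 0x7F:
            # Rule 2: remaining control characters and DEL use UCHAR.
            # Assumption: four-digit \uXXXX form, uppercase HEX.
            out.append(f"\\u{cp:04X}")
        else:
            # Rule 3: everything else is written as the native character.
            out.append(ch)
    return "".join(out)


print(canonical_escape('say "hi"\n'))  # prints: say \"hi\"\n
print(canonical_escape("\x7f"))        # prints: \u007F
```

Checking the ECHAR table first matters because five of the seven ECHAR characters also fall in the U+0000 to U+001F range; testing the range first would wrongly emit UCHAR for them.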
