Skip to content

[Breaking Change Request] Fix HeaderValue parsing, toString(), and support null values #40709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sortie opened this issue Feb 20, 2020 · 13 comments
Assignees
Labels
area-sdk Use area-sdk for general purpose SDK issues (packaging, distribution, …). breaking-change-request This tracks requests for feedback on breaking changes NNBD Issues related to NNBD Release

Comments

@sortie
Copy link
Contributor

sortie commented Feb 20, 2020

Note: This proposed behavior has been amended, see below for the current proposal

Intended change

The HeaderValue class will now parse parameters without a value as having the empty string as value instead of a null value. This is the class used to parse semicolon delimited parameters used in the Accept, Authorization, Content-Type, and other such HTTP headers. RFC 7231 section 3.1.1.1 and other locations require such parameters to have a value, but HeaderValue was parsing empty parameters as null value which was wrong according to the specification. The Sec-WebSocket-Extensions websocket header is a special case that allows absent values. The HeaderValues class now parses absent values and empty values as the empty string. For instance

HeaderValue.parse('a; b=B; c; d=; e=E').parameters

used to give {b: "B", c: null, d: null, e: "E"} but now gives {b: "B", c: "", d: "", e: "E"}.

Additionally HeaderValue.toString() was incorrectly not quoting the empty parameter values, which wasn't allowed by the specification. I.e.

new HeaderValue("a", {"b": ""})

used to give text/plain; q= but now gives text/plain; q="".

Rationale

The non-nullable-by-default migration required making subtle changes to some dart:io semantics in order to provide a better API. The HeaderValue.parameters map never contains null values except if parameters not allowed by the standard have been parsed, or in the special case of Sec-WebSocket-Extensions. To avoid code using these parameters from needing to null check the parsed parameter values, these invalid parameters are instead parsed as the empty string.

These absent values also did not survice a roundtrip through HeaderValue as HeaderValue.parse('a; b') used to give a; b=null where null has no special meaning, but will now give a; b="" which is closer to the original input. That means HeaderValue could not and still cannot be used and to construct the Sec-WebSocket-Extensions parameters that don't have values.

Expected impact

HTTP headers such as Accept, Authorization, Content-Type that abide by the standard will be unaffected.

Code handling the Sec-WebSocket-Extensions using HeaderValue may be affected if it doesn't simply check if the parameter is set in the map, but instead handles an empty string differently from null. The relevant code and tests in the SDK has been updated.

It's hard to rule out if someone has built other non-standard extensions, where the class has been used for valueless parameters, such as foo; bar where the header's value is foo with the bar parameter, which is handled distinctly from if the bar parameter had a value (foo; bar=qux). Though the HeaderValue can only receive such invalid parameters, as it was not able to construct them, instead producing foo; bar=null.

Steps for mitigation

If code relies on HeaderValue returns has a null value for valueless parameters, it'll need to be updated to allow the empty string as well. However, if code relies on being able to tell a parameter without a value apart from a parameter with the empty string, then head then the HeaderValue class can't solve that problem anymore and one will instead need to build a custom parser.

Alternatively if this sort of non-standard valueless parameters turns out to exist in practice or there's non-SDK code parsing Sec-WebSocket-Extensions, we can decide not to make this breaking change, and embrace this non-standard behavior as a feature.

Implementation

This change was implemented in https://dart-review.googlesource.com/c/sdk/+/136620.

Note: This proposed behavior has been amended, see below for the current proposal

cc @franklinyow @mit-mit @lrhn @athomas

@sortie sortie added area-sdk Use area-sdk for general purpose SDK issues (packaging, distribution, …). breaking-change-request This tracks requests for feedback on breaking changes NNBD Issues related to NNBD Release labels Feb 20, 2020
@franklinyow
Copy link
Contributor

cc @Hixie @matanlurey @dgrove @vsmenon for review and approval.

@Hixie
Copy link
Contributor

Hixie commented Feb 24, 2020

It would also break Accept header value parsing.

@Hixie
Copy link
Contributor

Hixie commented Feb 24, 2020

I don't have an objection to this change. I think it's fine to say that HeaderValue is only useful for the subset of headers that have the syntax it expects. In practice each header has its own syntax and so it's perfectly reasonable for different headers to have different semantics here, and no API is going to be able to support all of them regardless.

That said, we should probably just deprecate this class entirely. As far as I can tell, it doesn't provide an interface that works for any header value. As noted above, Accept and Sec-WebSocket-Extensions don't work with this because it can't represent missing values in parameter extensions (not to mention that an Accept API should probably expose q weights as an int, should expose the default weight as 1000, etc). But there are lots of other differences. For example, TE headers allow whitespace around the = character of all their parameters except the q= value. The API only allows one parameter value per name, but there's nothing in the specs that says that parameter values in general can't be listed multiple times. The API doesn't reflect the order of the parameters, but in principle the order can be meaningful (Accept does this, for instance, with the "q" parameter separating the media range parameters from the accept extension parameters). There's probably other theoretical cases where this API doesn't quite work.

@sortie
Copy link
Contributor Author

sortie commented Feb 24, 2020

@Hixie Note that the Accept header and others are supposed to be parsed by first splitting on the commas, and then parsing each of those using HeaderValue. The HeaderValue class was never able to parse a full Accept header on its own, despite that the documentation seems to imply so. With that in mind, I don't see how this change breaks parsing of the Accept header compared to the current situation? Can you give an example? I do understand the current situation is suboptimal, the question is whether this change breaks anything additional.

@Hixie
Copy link
Contributor

Hixie commented Feb 24, 2020

Accept allows the =value part to be omitted, it's the same problem as Sec-WebSocket-Extensions.

@matanlurey
Copy link
Contributor

No input

@vsmenon
Copy link
Member

vsmenon commented Feb 27, 2020

lgtm

@leafpetersen
Copy link
Member

I'm reading the status of this as approved for the breaking change, yes?

@franklinyow
Copy link
Contributor

Approved

@franklinyow franklinyow assigned franklinyow and sortie and unassigned franklinyow Mar 2, 2020
@sortie
Copy link
Contributor Author

sortie commented Mar 6, 2020

Thank you for the discussion.

I've discussed the Accept header problem with @lrhn and we're currently intending to amend the proposed behavior. We'll instead embrace valueless parameters as a feature, so it can be used for Accept and Sec-WebSocket-Extensions as well. Our thinking is that this is more compatible and more useful in the long term. I'm currently implementing this updated proposed behavior and will determine if it is breaking and amend this breaking change request as required.

@sortie
Copy link
Contributor Author

sortie commented Mar 13, 2020

I've amended the proposed behavior per the discussions and implemented it. Please see below:

Intended change

The HeaderValue class now parses more strictly in two invalid edge cases. This is the class used to parse semicolon delimited parameters used in the Accept, Authorization, Content-Type, and other such HTTP headers.

The empty parameter value without double quotes (which is not allowed by the standards) is now parsed as the empty string rather than null. E.g. HeaderValue.parse("v;a=").parameters now gives {"a": ""} rather than {"a": null}.

Invalid inputs with unbalanced double quotes are now rejected. E.g. HeaderValue.parse('v;a="b').parameters will now throw a HttpException instead of giving {"a": "b"}.

Additionally HeaderValue.toString() has been fixed to produce valid output in a lot of cases rather than grossly wrong input that would not survive another roundtrip through the parser.

The HeaderValue.toString() method now supports parameters with null values by omitting the value. HeaderValue("v", {"a": null, "b": "c"}).toString() now gives v; a; b=c. This behavior can be used to implement some features in the Accept and Sec-WebSocket-Extensions headers.

Likewise the empty value and values using characters outside of RFC 7230 tokens are now correctly implemented by double quoting such values with escape sequences. E.g:

HeaderValue("v", 
    {"a": "A", "b": "(B)", "c": "", "d": "ø", "e": "\\\""}).toString() 

now gives v;a=A;b="(B)";c="";d="ø";e="\\\"".

Rationale

The HeaderValue class was inconsistent about what features it supported. Parameters with empty values were treated equally to parameters without values. Invalid inputs such as unbalanced double quotes were not rejected. The toString() method was particularly bad as it was not able to produce a valid representation for all cases that could be parsed again. This change embraces the features that happened to work as actual supported features, which lets us decide on what the API will look like under Dart's upcoming null safety feature.

Expected impact

HTTP headers such as Accept, Authorization, Content-Type, and Sec-WebSocket-Extensions that abide by the standards will be unaffected. Only two cases change:

  1. Invalid headers with unbalanced quotes will now be rejected.
  2. Invalid headers that have a parameter of the empty value without double quotes will now be parsed as the empty value instead of null.

Steps for mitigation

Clients that sent unbalanced quotes in parameters will need to be fixed. It's not expected that any clients would be doing this as such headers would be severely malformed.

Clients that send parameters with empty values without double quotes, and expecting the Dart server to consider that different from no value, will need to be updated to wrap the value in double quotes. Such clients are unlikely as this is not allowed by the standards, although the HeaderValue.toString() method before this change could potentially produce such an invalid header. If such clients are written in Dart, they can be updated to a version of Dart with this change, or the server could be updated to not distinguish between empty parameter values and absent parameter values.

Implementation

This change was implemented in https://dart-review.googlesource.com/c/sdk/+/136620.

@sortie sortie changed the title [Breaking Change Request] HeaderValue parses invalid parameters as empty string instead of null [Breaking Change Request] Fix HeaderValue parsing, toString(), and support null values Mar 13, 2020
@sortie
Copy link
Contributor Author

sortie commented Mar 17, 2020

The new implementation has passed a G3 global presubmit and has passed review. The remaining task is to decide on the exact behavior if the HeaderValue constructor is called with a null parameters map. Once that's resolved, I will proceed with landing this change.

dart-bot pushed a commit that referenced this issue Mar 18, 2020
This is a breaking change. #40709

This change makes the HeaderValue parsing more strict in two invalid
edge cases, supports parameters with null values as a feature, and fixes
toString() so it always produces tokens or quoted-strings valid per RFC
7230 3.2.6.

The empty parameter value without double quotes (which is not allowed by
the standards) is now parsed as the empty string rather than null. E.g.
HeaderValue.parse("v;a=").parameters now gives {"a": ""} rather than
{"a": null}.

Invalid inputs with unbalanced double quotes are now rejected. E.g.
HeaderValue.parse('v;a="b').parameters will now throw a HttpException
instead of giving {"a": "b"}.

The HeaderValue.toString() method now supports parameters with null
values by omitting the value. E.g.:

  HeaderValue("v", {"a": null, "b": "c"}).toString()

now gives

  v; a; b=c

This behavior can be used to implement some features in the Accept and
Sec-WebSocket-Extensions headers.

Likewise the empty value and values using characters outside of RFC 7230
3.2.6 tokens are now correctly implemented by double quoting such values
with escape sequences. E.g.:

  HeaderValue("v",
      {"a": "A", "b": "(B)", "c": "", "d": "ø", "e": "\\\""}).toString()

now gives

   v;a=A;b="(B)";c="";d="ø";e="\\\""

The NNBD migration required making subtle changes to some dart:io
semantics in order to provide a better API. This change backports one of
these semantic changes to the unmigrated SDK so any issues can be
discovered now instead of blocking the future SDK unfork.

Change-Id: Iafc790e03b6290232cac71fe14f995ce0f0b036b
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/136620
Commit-Queue: Jonas Termansen <[email protected]>
Reviewed-by: Lasse R.H. Nielsen <[email protected]>
@sortie
Copy link
Contributor Author

sortie commented Mar 18, 2020

I've landed the breaking change.

@sortie sortie closed this as completed Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-sdk Use area-sdk for general purpose SDK issues (packaging, distribution, …). breaking-change-request This tracks requests for feedback on breaking changes NNBD Issues related to NNBD Release
Projects
None yet
Development

No branches or pull requests

6 participants