Split escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser #1599
What about unquoted attribute values, though? Or is there no way to generate those using React? |
There's no way to write arbitrary HTML (well, you can with the dangerouslySetInnerHTML property but then you're on your own); React builds all the HTML itself. Source code looks something like
which is statically transformed into
which React then produces markup for using this code, so there's never an opportunity for an unquoted attribute value to be made. |
Okay, just wanted to make sure. |
We can conditionally generate unquoted attribute values when it's guaranteed to be safe (i.e., for numbers and ASCII char sequences). The only reason I can think of would be to save bytes, it's imaginable that it would (im)measurably improve performance as we could skip the overhead of the escaping function and generate measurably smaller HTML (which would improve The one thing that speaks to me here is that for really well-written HTML you're not as likely to find attributes with whitespace or unsafe chars (mostly single classes and simple values). When It feels kind of dirty, but HTML5 makes no mention of deprecation of unquoted attribute values (http://www.w3.org/TR/html-markup/syntax.html#syntax-attr-unquoted) and as usual provides an exact implementation to follow. Personally I'm torn between it feeling "dirty" and it being "the most efficient implementation". What's your take @yungsters ? |
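For illustration only, here is a minimal sketch of what "omit the quotes when it's guaranteed to be safe" could look like. The name `formatAttributeValue` and the `SAFE_UNQUOTED` pattern are hypothetical and are not part of React or of this PR; the pattern is deliberately conservative (ASCII letters, digits, `_` and `-` only), and anything else falls back to a quoted value with `&` and `"` escaped.

```javascript
// Hypothetical sketch, not React's implementation.
// Values made up solely of these characters cannot terminate an unquoted
// attribute value; the empty string never matches, so it stays quoted.
var SAFE_UNQUOTED = /^[A-Za-z0-9_-]+$/;

function formatAttributeValue(value) {
  var text = String(value);
  if (SAFE_UNQUOTED.test(text)) {
    // Safe to emit as-is: no quotes, no escaping pass.
    return text;
  }
  // Fallback: quote the value and escape the two characters that matter
  // inside a double-quoted attribute value.
  return '"' + text.replace(/&/g, '&amp;').replace(/"/g, '&quot;') + '"';
}

var markup = '<div class=' + formatAttributeValue('btn-primary') +
  ' title=' + formatAttributeValue('a "quoted" title') + '></div>';
```

Whether the extra pattern match per attribute pays for itself is exactly the kind of thing that would need profiling before adopting something like this.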
I think stripping quotes can be something we look into after we nail text escaping first. I'm not too against "dirty" stuff if it's handled by a framework and done correctly. (I would also want to profile the impact of an additional check or pattern match for every attribute.) A lot of the discussion here has been ensuring that we generate correct HTML (where correct is the specifications listed above plus existing browser behavior). I think as a framework that makes it easy to set attribute values or text content using user input, we have an obligation to also ensure that the generated HTML is safe and secure — not vulnerable to XSS. |
@yungsters Interesting and I agree with everything you said. It should be a separate PR regardless and I wouldn't mind doing the necessary work for compiling a rough best/worst-case performance/size benchmark. Also, I fully agree and understand your point about XSS and was honestly expecting more opposition to this PR (for that reason). |
@zpao @yungsters If you agree with the refactoring/corrections done by this PR, a stopgap solution is that I reintroduce all the current rules and you can merge it as-is. If/when your security team clears the reduced set of rules, then we just reduce the rules in the escapers. Thoughts? |
@zpao @yungsters As I "discussed" in my previous comment, I have reverted this PR to use the current escaping rules instead. It now only removes the invalid escaping of attribute names and introduces a new So it seems to me that there should be nothing controversial about this PR. It enables us to easily enable narrower escaping (or omitting quotes for simple values) in the future and improves the code in general. Attaching the narrower, now reverted, escaping functions (for posterity):
|
@yungsters for review (though he's out so I might need to reping him in 2 weeks) |
ping @yungsters, this PR is currently limited to only "splitting escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser" for making the code arguably neater and also fixing some related "mistakes" (like escaping attribute names). Any objections? |
invariant(
  'Can only set one of `children` or `props.dangerouslySetInnerHTML`.'
);
invariant(
Can you revert the indentation changes here and below?
Sorry for missing this. The changes look straightforward and reasonable to me. There are some extraneous changes included in the pull request. Can you remove those? Otherwise, this looks good to me. |
@yungsters Sorry about the indentation, was caused by the rebase and I missed it, fixed. Which extraneous changes are you referring to, removal of EDIT: Ah, perhaps you're referring to the removal of (incorrectly) escaped attribute names too? It's kind of weird to do |
@yungsters I removed all "extraneous changes" that I could find, give me thumbs up and I'll merge this in and put up a separate PR for those changes/fixes. |
"use strict";

var escapeTextContentForBrowser = require('./escapeTextContentForBrowser');
@zpao, should this just be require('escapeTextContentForBrowser')?
Yes
Aaaah shit, ofc it should be.
Note to self, need to rebase and update another added |
My bad, I've rebased, fixed the require and license. |
Looks good to me. |
Split escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser
IMHO the preferable solution to #1461, my comment from that PR:

Chrome only escapes `<`, `>` and `&` when setting `textContent`, it only escapes `&` and `"` when setting an attribute. Which I would say makes my suggestion above quite a lot less alien (to me at least).

`"` can be used to break out of `"` for quoted attribute values. `'` and `"` is unnecessary because we're dealing with plain text and `/` is unnecessary because it's a precaution against there being an unescaped `<` before the injected content.

Attribute names: discard invalid
Attribute values: `&` + `"`
Text content: `&` + `<` + `>`
With these rules we generate the same HTML that browsers do, no extra clutter.
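As a rough sketch of the minimal rules listed above (this is not the code from the PR; `escapeTextContent` and `quoteAttributeValue` are hypothetical stand-ins for the two new modules), text content escapes only `&`, `<` and `>`, while attribute values are wrapped in double quotes and escape only `&` and `"`:

```javascript
// Hypothetical helpers illustrating the minimal escaping rules.

// Text content: escape & < > only; quotes are harmless in plain text.
function escapeTextContent(text) {
  return String(text).replace(/[&<>]/g, function (ch) {
    return ch === '&' ? '&amp;' : ch === '<' ? '&lt;' : '&gt;';
  });
}

// Attribute values: always double-quoted, so only & and " need escaping.
function quoteAttributeValue(value) {
  var escaped = String(value).replace(/[&"]/g, function (ch) {
    return ch === '&' ? '&amp;' : '&quot;';
  });
  return '"' + escaped + '"';
}

var markup = '<a title=' + quoteAttributeValue('say "hi" & <wave>') + '>' +
  escapeTextContent('1 < 2 && "ok"') + '</a>';
// markup is: <a title="say &quot;hi&quot; &amp; <wave>">1 &lt; 2 &amp;&amp; "ok"</a>
```

Note how `<` and `>` pass through untouched inside the attribute value and `"` passes through untouched in the text content, which matches the browser behavior described in the quoted comment.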
The only danger I see is if `dangerouslySetInnerHTML` is used with invalid HTML, if there's an unclosed quoted attribute then anyone can now add as many attributes as they want (an unclosed tag is not an issue though), whereas if we quote `"` they can only add more data to that attribute. But really, if what you're sending to `dangerouslySetInnerHTML` is not rigorously vetted (or at the very least valid HTML) you're knee deep in trouble regardless. The safest solution would be to not include `dangerouslySetInnerHTML` in the initial markup at all, but to always set it with `innerHTML`.

Note that `escapeTextForBrowser` was renamed to `escapeTextContentForBrowser`, so any external uses of it will now be greeted with an error (instead of a potentially dangerous situation).

I also added a test that explicitly verifies the output of ReactDOMComponent against a manually and correctly escaped string.

PS. Even if you don't like these "minimal rules", the separation between `escapeTextContentForBrowser` and `quoteAttributeValueForBrowser` makes a lot of sense to me, this PR also does away with all the incorrect escaping of attribute names.