Skip to content

Split escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser #1599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 5, 2015

Conversation

syranide
Copy link
Contributor

IMHO the preferable solution to #1461, my comment from that PR:

Chrome only escapes <, > and & when setting textContent, it only escapes & and " when setting an attribute. Which I would say makes my suggestion above quite a lot less alien (to me at least).

  1. It aligns with OWASPs stance that that only " can be used to break out of " for quoted attribute values.
  2. It also aligns with OWASPs recommendation for text/html, ' and " is unnecessary because we're dealing with plain text and / is unnecessary because it's a precaution against there being an unescaped < before the injected content.
  3. Attribute names cannot be escaped, invalid chars invalidate the entire attribute name. Since we already very strictly filter against invalid attribute names we don't need to do anything further for attribute names.

Attribute names: discard invalid
Attribute values: & + "
Text content: & + < + >


With these rules we generate the same HTML that browsers do, no extra clutter.

The only danger I see is if dangerouslySetInnerHTML is used with invalid HTML, if there's an unclosed quoted attribute then anyone can now add as many attributes as they want (an unclosed tag is not an issue though), whereas if we quote " they can only add more data to that attribute. But really, if what you're sending to dangerouslySetInnerHTML is not rigorously vetted (or at the very least valid HTML) you're knee deep in trouble regardless. The safest solution would be to not include dangerouslySetInnerHMTL in the initial markup at all, but to always set it with innerHTML.


Note that escapeTextForBrowser was renamed to escapeTextContentForBrowser, so any external uses of it will now be greeted with an error (instead of a potentially dangerous situation).

I also added a test that explicitly verifies the output of ReactDOMComponent against a manually and correctly escaped string.

PS. Even if you don't like these "minimal rules", the separation between escapeTextContentForBrowser and quoteAttributeValueForBrowser makes a lot of sense to me, this PR also does away with all the incorrect escaping of attribute names.

@mathiasbynens
Copy link

What about unquoted attribute values, though? Or is there no way to generate those using React?

@sophiebits
Copy link
Collaborator

There's no way to write arbitrary HTML (well, you can with the dangerouslySetInnerHTML property but then you're on your own); React builds all the HTML itself. Source code looks something like

render: function() {
  return <a href="hello">click me</a>;
}

which is statically transformed into

render: function() {
  return React.DOM.a({href: "hello"}, "click me");
}

which React then produces markup for using this code, so there's never an opportunity for an unquoted attribute value to be made.

@mathiasbynens
Copy link

Okay, just wanted to make sure. dangerouslySetInnerHTML sounds dangerous indeed :)

@syranide
Copy link
Contributor Author

We can conditionally generate unquoted attribute values when it's guaranteed to be safe (i.e, for numbers and ASCII char sequences).

The only reason I can think of would be to save bytes, it's imaginable that it would (im)measurably improve performance as we could skip the overhead of the escaping function and generate measurably smaller HTML (which would improve innerHTML performance).

The one thing that speaks to me here is that for really well-written HTML you're not as likely to find attributes with whitespace or unsafe chars (mostly single classes and simple values). When data-reactid is gone, it means that removing unnecessary quotes might realistically yield in the vicinty of 5-10% less HTML in the best cases. However, as virtually everyone serves compressed HTML, the real world difference would probably be no more than 1-5% (in the best case).

It feels kind of dirty, but HTML5 makes no mention of depreciation of unquoted attribute values (http://www.w3.org/TR/html-markup/syntax.html#syntax-attr-unquoted) and as usual provide an exact implementation to follow.

Personally I'm torn between it feeling "dirty" and it being "the most efficient implementation". What's your take @yungsters ?

@yungsters
Copy link
Contributor

I think stripping quotes can be something we look into after we nail text escaping first. I'm not too against "dirty" stuff if it's handled by a framework and done correctly. (I would also want to profile the impact of an additional check or pattern match for every attribute.)

A lot of the discussion here has been ensuring that we generate correct HTML (where correct is the specifications listed above plus existing browser behavior). I think as a framework that makes it easy to set attribute values or text content using user input, we have an obligation to also ensure that the generated HTML is safe and secure — not vulnerable to XSS.

@syranide
Copy link
Contributor Author

@yungsters Interesting and I agree with everything you said. It should be a separate PR regardless and I wouldn't mind doing the necessary work for compiling a rough best/worst-case performance/size benchmark.

Also, I fully agree and understand your point about XSS and was honestly expecting more an opposition for this PR (for that reason).

@syranide
Copy link
Contributor Author

@zpao @yungsters If you agree with the refactoring/corrections done by this PR, a stopgap solution is that I reintroduce all the current rules and you can merge it as-is. If/when your security team clears the reduced set of rules, then we just reduce the rules in the escapers. Thoughts?

@syranide
Copy link
Contributor Author

@zpao @yungsters As I "discussed" in my previous comment, I have reverted this PR to use the current escaping rules instead. It now only removes the invalid escaping of attribute names and introduces a new quoteAttributeValueForBrowser (also still renames escapeTextForBrowser).

So it seems to me that there should be nothing controversial about this PR. It enables us to easily enable narrower escaping (or omitting quotes for simple values) in the future and improves the code in general.

Attaching the narrower, now reverted, escaping functions (for posterity):

var ESCAPE_LOOKUP = {
  '&': '&amp;',
  '>': '&gt;',
  '<': '&lt;'
};

var ESCAPE_REGEX = /[&><]/g;

function escaper(match) {
  return ESCAPE_LOOKUP[match];
}

/**
 * Escapes text content to prevent scripting attacks.
 *
 * @param {*} text Text value to escape.
 * @return {string} An escaped string.
 */
function escapeTextContentForBrowser(text) {
  return ('' + text).replace(ESCAPE_REGEX, escaper);
}
var ESCAPE_LOOKUP = {
  '&': '&amp;',
  '"': '&quot;'
};

var ESCAPE_REGEX = /[&"]/g;

function escaper(match) {
  return ESCAPE_LOOKUP[match];
}

/**
 * Escapes attribute value to prevent scripting attacks.
 *
 * @param {*} text Text value to escape.
 * @return {string} An escaped string.
 */
function quoteAttributeValueForBrowser(text) {
  return '"' + ('' + text).replace(ESCAPE_REGEX, escaper) + '"';
}

@zpao
Copy link
Member

zpao commented Sep 29, 2014

@yungsters for review (though he's out so I might need to reping him in 2 weeks)

@syranide
Copy link
Contributor Author

syranide commented Feb 2, 2015

ping @yungsters, this PR is currently limited to only "splitting escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser" for making the code arguably neater and also fixing some related "mistakes" (like escaping attribute names). Any objections?

invariant(
'Can only set one of `children` or `props.dangerouslySetInnerHTML`.'
);
invariant(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert the indentation changes here and below?

@yungsters
Copy link
Contributor

Sorry for missing this. The changes look straightforward and reasonable to me. There are some extraneous changes included in the pull request. Can you remove those?

Otherwise, this looks good to me.

@syranide
Copy link
Contributor Author

syranide commented Feb 3, 2015

@yungsters Sorry about the indentation, was caused by the rebase and I missed it, fixed. Which extraneous changes are you referring to, removal of processAttributeNameAndPrefix? I agree that it might make sense to put it in its own PR (but I kind of also think it makes sense to include it as it's affected, w/e :)), but are there any others?

EDIT: Ah, perhaps you're referring to the removal of (incorrectly) escaped attribute names too? It's kind of weird to do escapeTextContentForBrowser(name) + '=' + quoteAttributeValueForBrowser(value) as it's obviously incorrect from the naming of the functions... do you prefer to do it in a separate PR anyway? (It makes sense from a commit/history perspective)

@syranide
Copy link
Contributor Author

syranide commented Feb 3, 2015

@yungsters I removed all "extraneous changes" that I could find, give me thumbs up and I'll merge this in and put up a separate PR for those changes/fixes.


"use strict";

var escapeTextContentForBrowser = require('./escapeTextContentForBrowser');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zpao, should this just be require('escapeTextContentForBrowser')?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaah shit, ofc it should be.

@syranide
Copy link
Contributor Author

syranide commented Feb 3, 2015

Note to self, need to rebase and update another added escapeTextForBrowser.

@syranide
Copy link
Contributor Author

syranide commented Feb 4, 2015

My bad, I've rebased, fixed the require and license.

@yungsters
Copy link
Contributor

Looks good to me.

syranide added a commit that referenced this pull request Feb 5, 2015
Split escapeTextForBrowser into escapeTextContentForBrowser and quoteAttributeValueForBrowser
@syranide syranide merged commit 04e6d02 into facebook:master Feb 5, 2015
@syranide syranide deleted the escbrow branch February 5, 2015 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants