-
Notifications
You must be signed in to change notification settings - Fork 6.1k
Scanner: Generate error on inbalanced RLO/LRO/PDF override markers. #10326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
db2814c
to
b43c136
Compare
This comment has been minimized.
This comment has been minimized.
b43c136
to
66c8a1d
Compare
This comment has been minimized.
This comment has been minimized.
66c8a1d
to
9d68f25
Compare
This comment has been minimized.
This comment has been minimized.
9d68f25
to
4b113d1
Compare
ee33371
to
fe19966
Compare
fe19966
to
b666df5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not finished yet but the call is coming up so here's the first batch.
Also, some general remarks:
- Isn't this a breaking change?
- What about PDI and LRI/RLI? Do we want to handle these too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of my review.
test/libsolidity/syntaxTests/comments/multiline_unicode_direction_override_1.sol
Outdated
Show resolved
Hide resolved
test/libsolidity/syntaxTests/comments/multiline_unicode_direction_override_1.sol
Outdated
Show resolved
Hide resolved
test/libsolidity/syntaxTests/comments/singleline_unicode_direction_override_6.sol
Outdated
Show resolved
Hide resolved
400f27a
to
8f428cf
Compare
There was an error when running
Please check that your changes are working as intended. |
cce652e
to
54e215f
Compare
There was an error when running
Please check that your changes are working as intended. |
54e215f
to
cfee4a3
Compare
There was an error when running
Please check that your changes are working as intended. |
cfee4a3
to
89d09cd
Compare
There was an error when running
Please check that your changes are working as intended. |
8ba0d9f
to
01a465c
Compare
let s := unicode"underflow "; | ||
} | ||
// ---- | ||
// ParserError 1856: (35-47): Literal or identifier expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like AsmParser
doesn't accept unicode directional sequences.
It fails as currentToken()
returns 'Token::Illegal'. So, I would say it is fine for now, so it is not even possible to get this problem in yul, but we would need to decide how to proceed on this in yul...
I created issue so we can discuss this topic #10622
From #10326 (comment)
If this is true, why do we bother? Why don't we reject them for 0.8.0 and consult more with relevant people? (could have asked about language use in comments/literals on the user survey -- cc @franzihei) |
6c844e0
to
4b912aa
Compare
I added few more tests, hope it is good coverage now |
The biggest problem with this PR is basically that it's hard to know for sure so we try to be careful not to accidentally forbid something that turns out to be common. Asking people is definitely a good idea but it will not be the ultimate answer. Even if typical keyboards do not have this, some IMEs could. And AFAIK these features in Unicode are (relatively) new, especially LRI/RLI/PDI (that we don't even support here) so they could just not be in use yet. At this point I'm more or less convinced that it's rarely used in practice but this is still a conclusion supported mostly by lack of information :) Also, to be honest, just attempting to push this through forced us (at least me) to explore the topic and form an opinion. Now I think that @ekpyron's proposal from #10607 in some form is the way to go but it's a bit too late to switch to that and still get it included in 0.8.0. I think that this PR is good enough if we want to have at least something to protect against the trick from the underhanded contest. We can switch to something better thought out in 0.9.0. |
I was always on the opinion to just outright reject most of this complexity: #10254 (comment) But as discussed in #10607 unicode might be introducing new feature we can not prepare for, so there is always a risk. |
a743337
to
460ed18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems mostly fine to me. The only remarks I have are minor text/naming tweaks (sorry for the one the changelog; seems like it's the tenth time or so you're tweaking it :P).
I wonder if a test for using that in imported file name or multisource-test file name would be useful. I guess it's still just a string literal so probably not...
liblangutil/Scanner.cpp
Outdated
pair<string_view, int>{"\xE2\x80\xAE", 1}, // U+202E (RLO - Right-to-Left Override) | ||
pair<string_view, int>{"\xE2\x80\xAA", 1}, // U+202A (LRE - Left-to-Right Embedding) | ||
pair<string_view, int>{"\xE2\x80\xAB", 1}, // U+202B (RLE - Right-to-Left Embedding) | ||
pair<string_view, int>{"\xE2\x80\xAC", -1} // PDF - Pop Directional Formatting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pair<string_view, int>{"\xE2\x80\xAC", -1} // PDF - Pop Directional Formatting | |
pair<string_view, int>{"\xE2\x80\xAC", -1} // U+202C (PDF - Pop Directional Formatting) |
WHITESPACE=$(git grep -n -I -E "^.*[[:space:]]+$" | grep -v "test/libsolidity/ASTJSON\|test/libsolidity/ASTRecoveryTests\|test/compilationTests/zeppelin/LICENSE") | ||
WHITESPACE=$(git grep -n -I -E "^.*[[:space:]]+$" | | ||
grep -v "test/libsolidity/ASTJSON\|test/libsolidity/ASTRecoveryTests\|test/compilationTests/zeppelin/LICENSE" | | ||
grep -v -E "test/libsolidity/syntaxTests/comments/unicode_direction_override_1.sol" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment saying why stuff like this is disabled always helps when you have to refactor it later.
liblangutil/Scanner.cpp
Outdated
case ScannerError::DirectionalOverrideUnderflowInComment: return "Unicode direction override underflow in comment or string literal."; | ||
case ScannerError::DirectionalOverrideMismatchInComment: return "Mismatching directional override markers in comment or string literal."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that it's not just the error message that was mentioning only comments:
case ScannerError::DirectionalOverrideUnderflowInComment: return "Unicode direction override underflow in comment or string literal."; | |
case ScannerError::DirectionalOverrideMismatchInComment: return "Mismatching directional override markers in comment or string literal."; | |
case ScannerError::DirectionalOverrideUnderflow: return "Unicode direction override underflow in comment or string literal."; | |
case ScannerError::DirectionalOverrideMismatch: return "Mismatching directional override markers in comment or string literal."; |
@@ -0,0 +1,7 @@ | |||
contract C { | |||
function f() public pure { | |||
/* LRO LRE RLE PDF RLO PDF PDF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to point out the FDP and only then realized what's going on. It works :) Lost opportunity for O RLY though :)
@axic Personally I'm fine either way. Not including this in 0.8.0 and doing it properly for the next breaking release would be best but I'm also assuming that we really want something against that attack. This PR on its own does not get that deep into Unicode yet and is small so should be easy to revert if we want to change the direction. Most of it are tests actually. |
033ff61
to
381c63e
Compare
Implements nested counting on RLO/LRO/PDF
Refs #10254
TODO: