[SpecialCaseList] Use glob by default #74809
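@llvm/pr-subscribers-llvm-support @llvm/pr-subscribers-clang
Author: Fangrui Song (MaskRay)
Changes
https://reviews.llvm.org/D154014 adds glob support and enables it when `#!special-case-list-v2` is the first line. This patch makes the glob support the default (faster than regex after https://reviews.llvm.org/D156046) and switches to the deprecated regex support if `#!special-case-list-v1` is the first line.
I have surveyed many ignore lists. All ignore lists I find only use basic `*` `.` and don't use regex metacharacters such as `(` and `)` (as neither `src:` nor `fun:` benefits from using regex). They are unaffected by the transition (with a caution that regex `src:x/a.pb.*` matches `x/axpbx` but glob `src:x/a.pb.*` doesn't).
There is no deprecation warning. If a user finds `#!special-case-list-v1`, they shall read that the old syntax is deprecated.
Link: https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
Full diff: https://github.com/llvm/llvm-project/pull/74809.diff
3 Files Affected: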
diff --git a/clang/docs/SanitizerSpecialCaseList.rst b/clang/docs/SanitizerSpecialCaseList.rst
index ab39276b04395..c7fb0fa3f8a82 100644
--- a/clang/docs/SanitizerSpecialCaseList.rst
+++ b/clang/docs/SanitizerSpecialCaseList.rst
@@ -56,13 +56,18 @@ and lines starting with "#" are ignored.
.. note::
- In `D154014 <https://reviews.llvm.org/D154014>`_ we transitioned to using globs instead
- of regexes to match patterns in special case lists. Since this was a
- breaking change, we will temporarily support the original behavior using
- regexes. If ``#!special-case-list-v2`` is the first line of the file, then
- we will use the new behavior using globs. For more details, see
- `this discourse post <https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666>`_.
+ Prior to Clang 18, section names and entries described below use a variant of
+ regex where ``*`` is translated to ``.*``. Clang 18 (`D154014
+ <https://reviews.llvm.org/D154014>`_) switches to glob and plans to remove
+ regex support in Clang 19.
+ For Clang 18, regex is supported if ``#!special-case-list-v1`` is the first
+ line of the file.
+
+ Many special case lists use ``.`` to indicate the literal character and do
+ not use regex metacharacters such as ``(``, ``)``. They are unaffected by the
+ regex to glob transition. For more details, see `this discourse post
+ <https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666>`_.
Section names are globs written in square brackets that denote
which sanitizer the following entries apply to. For example, ``[address]``
@@ -80,7 +85,6 @@ tool-specific docs.
.. code-block:: bash
- #!special-case-list-v2
# The line above is explained in the note above
# Lines starting with # are ignored.
# Turn off checks for the source file
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index ac8877cca8bc6..7a23421eaeb89 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -150,13 +150,12 @@ bool SpecialCaseList::parse(const MemoryBuffer *MB, std::string &Error) {
return false;
}
- // In https://reviews.llvm.org/D154014 we transitioned to using globs instead
- // of regexes to match patterns in special case lists. Since this was a
- // breaking change, we will temporarily support the original behavior using
- // regexes. If "#!special-case-list-v2" is the first line of the file, then
- // we will use the new behavior using globs. For more details, see
+ // In https://reviews.llvm.org/D154014 we added glob support and planned to
+ // remove regex support in patterns. We temporarily support the original
+ // behavior using regexes if "#!special-case-list-v1" is the first line of the
+ // file. For more details, see
// https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
- bool UseGlobs = MB->getBuffer().starts_with("#!special-case-list-v2\n");
+ bool UseGlobs = !MB->getBuffer().starts_with("#!special-case-list-v1\n");
for (line_iterator LineIt(*MB, /*SkipBlanks=*/true, /*CommentMarker=*/'#');
!LineIt.is_at_eof(); LineIt++) {
diff --git a/llvm/unittests/Support/SpecialCaseListTest.cpp b/llvm/unittests/Support/SpecialCaseListTest.cpp
index 81faeca5d6357..725d20a9b4def 100644
--- a/llvm/unittests/Support/SpecialCaseListTest.cpp
+++ b/llvm/unittests/Support/SpecialCaseListTest.cpp
@@ -25,8 +25,8 @@ class SpecialCaseListTest : public ::testing::Test {
std::string &Error,
bool UseGlobs = true) {
auto S = List.str();
- if (UseGlobs)
- S = (Twine("#!special-case-list-v2\n") + S).str();
+ if (!UseGlobs)
+ S = (Twine("#!special-case-list-v1\n") + S).str();
std::unique_ptr<MemoryBuffer> MB = MemoryBuffer::getMemBuffer(S);
return SpecialCaseList::create(MB.get(), Error);
}
@@ -46,8 +46,8 @@ class SpecialCaseListTest : public ::testing::Test {
SmallString<64> Path;
sys::fs::createTemporaryFile("SpecialCaseListTest", "temp", FD, Path);
raw_fd_ostream OF(FD, true, true);
- if (UseGlobs)
- OF << "#!special-case-list-v2\n";
+ if (!UseGlobs)
+ OF << "#!special-case-list-v1\n";
OF << Contents;
OF.close();
return std::string(Path.str());
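As a rough standalone illustration of the header check in the `SpecialCaseList.cpp` hunk above, the sketch below mirrors the new default: glob matching unless the buffer begins with exactly `#!special-case-list-v1` followed by a newline. `MatchMode` and `detectMatchMode` are hypothetical names introduced here; the patch itself calls `starts_with` on `MB->getBuffer()` and stores the result in `UseGlobs`.

```cpp
#include <string_view>

// Hypothetical names for illustration only; not part of the LLVM API.
enum class MatchMode { Glob, LegacyRegex };

// Mirrors `!Buffer.starts_with("#!special-case-list-v1\n")` from the patch:
// only an exact v1 header on the very first line selects the old regex mode.
MatchMode detectMatchMode(std::string_view Buffer) {
  constexpr std::string_view V1Header = "#!special-case-list-v1\n";
  // substr clamps to the buffer size, so short buffers are handled safely.
  const bool IsV1 = Buffer.substr(0, V1Header.size()) == V1Header;
  return IsV1 ? MatchMode::LegacyRegex : MatchMode::Glob;
}
```

Under this check, a file that still starts with `#!special-case-list-v2` is treated like any other file: the line is an ordinary comment and glob mode applies. A v1 header that is not on the first line, or that lacks the trailing newline, also leaves the file in glob mode.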
Looks good to me. Thanks for following up!
Probably would be good to introduce the |
Since |
Right, but they wouldn't know to do that in advance of the behavior change, would they? If they do nothing/don't read any release notes, they'd get a silent behavior change, I think/if I'm understanding this correctly? |
It will be silent, but the main idea is that almost all special case lists are unaffected by this transition. I have checked CodeSearch (and ignorelist.txt uses in projects such as blue/v8/chromiumos) and actually haven't found any uses of non-glob metacharacters ( This is understandable: Going forward, glob is the only supported format and we do not want users to annotate their ignore list with |
Still seems like an unfortunate and subtle silent change in behavior to me. But shrug if folks who own these features think it's fine, so be it. |
https://reviews.llvm.org/D154014 adds glob support and enables it when `#!special-case-list-v2` is the first line. This patch makes the glob support the default (faster than regex after https://reviews.llvm.org/D156046) and switches to the deprecated regex support if `#!special-case-list-v1` is the first line.

I have surveyed many ignore lists. All ignore lists I find only use basic `*` `.` and don't use regex metacharacters such as `(` and `)` (as neither `src:` nor `fun:` benefits from using regex). They are unaffected by the transition (with a caution that regex `src:x/a.pb.*` matches `x/axpbx` but glob `src:x/a.pb.*` doesn't).

There is no deprecation warning. If a user finds `#!special-case-list-v1`, they shall read that the old syntax is deprecated.

Link: https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
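The caution about `x/a.pb.*` in the commit message can be reproduced with a small sketch. It assumes the old behavior is "rewrite every `*` to `.*`, then match the whole string as a regex", as the documentation hunk describes, and it uses `std::regex` plus a hand-written matcher that supports only `*` and `?`. Both function names are hypothetical, and neither is LLVM's `Regex` or `GlobPattern`, which handle more syntax.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>

// Assumed v1 behavior: translate `*` to `.*`, then require a full-string
// regex match. Note that `.` stays a regex wildcard here.
bool matchesLegacyRegex(std::string_view Pattern, std::string_view Text) {
  std::string Translated;
  for (char C : Pattern) {
    if (C == '*')
      Translated += ".*";
    else
      Translated += C;
  }
  const std::regex Re(Translated, std::regex::extended);
  const std::string Subject(Text);
  return std::regex_match(Subject, Re);
}

// Simplified glob: `*` matches any run of characters, `?` matches exactly
// one character, and everything else (including `.`) is literal.
bool matchesSimpleGlob(std::string_view Pattern, std::string_view Text) {
  std::size_t P = 0, T = 0;
  std::size_t StarP = std::string_view::npos, StarT = 0;
  while (T < Text.size()) {
    if (P < Pattern.size() && Pattern[P] == '*') {
      StarP = P++;   // remember the star so we can backtrack to it
      StarT = T;
    } else if (P < Pattern.size() &&
               (Pattern[P] == '?' || Pattern[P] == Text[T])) {
      ++P;
      ++T;
    } else if (StarP != std::string_view::npos) {
      P = StarP + 1; // let the last star absorb one more character
      T = ++StarT;
    } else {
      return false;
    }
  }
  while (P < Pattern.size() && Pattern[P] == '*')
    ++P;
  return P == Pattern.size();
}
```

With these helpers, `x/a.pb.*` matches `x/a.pb.h` in both modes, but only the regex interpretation also matches `x/axpbx`, because each `.` in the pattern accepts any character there. Lists that relied on that looseness, or on metacharacters such as `(`, `)` and `|`, are the ones whose behavior changes silently.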
e6d1325 to a297e23
This caused some ignorelist changes, e.g.
didn't work anymore and the opt-out made it work again. Still investigating why. |
Not sure if it's the reason, but the |
the file name is |
ah it's because we have something like
it seems like the new system doesn't match |
The glob mode can use the section name |
(this sort of example reinforces my concerns expressed earlier that this kind of silent change in behavior is problematic - moreso in the wild, rather than in Google's fairly constrained environment (frequent updates, good test coverage, and good bisection infrastructure, etc - other folks wouldn't have such a good time figuring this out, I suspect)) |
A report from the field: we had an ignorelist that contained I see the following issues with the transition:
I announced this change on discourse last year. I think others have seen this specific bug before, so maybe I should call out this case in that post. |
CC @llvm/clang-vendors
"Use glob instead of regex for SpecialCaseLists" doesn't mean anything for anyone not actively working on the relevant code. An announcement would be titled something like "Syntax change for -fsanitize-ignorelist"... and it would be posted when the change actually happened. |
Thank you for that! I don't know whether our sanitizer guys were aware of this, I've filed an internal ticket to find out. +1 to the complaint about no Release Note. |
Apologies. This could have been better handled. I've also seen a report of |
FTR, got an internal report about this. Luckily it was my turn to catch new bugs and I recognized the issue. @MaskRay Is it too late to add a Release Note for LLVM 18? |