-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS #96282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS #96282
Conversation
Created using spr 1.3.5-bogner [skip ci]
Created using spr 1.3.5-bogner
@llvm/pr-subscribers-llvm-support Author: Fangrui Song (MaskRay) ChangesHashing.h provides hash_value/hash_combine/hash_combine_range, which are
Users shouldn't rely on specific hash values due to potential algorithm Take the the address of a static storage duration variable as the seed A few users relying on the iteration order of Full diff: https://github.com/llvm/llvm-project/pull/96282.diff 5 Files Affected:
diff --git a/llvm/include/llvm/ADT/Hashing.h b/llvm/include/llvm/ADT/Hashing.h
index a5477362a5079..3c87894635787 100644
--- a/llvm/include/llvm/ADT/Hashing.h
+++ b/llvm/include/llvm/ADT/Hashing.h
@@ -126,23 +126,6 @@ hash_code hash_value(const std::basic_string<T> &arg);
/// Compute a hash_code for a standard string.
template <typename T> hash_code hash_value(const std::optional<T> &arg);
-/// Override the execution seed with a fixed value.
-///
-/// This hashing library uses a per-execution seed designed to change on each
-/// run with high probability in order to ensure that the hash codes are not
-/// attackable and to ensure that output which is intended to be stable does
-/// not rely on the particulars of the hash codes produced.
-///
-/// That said, there are use cases where it is important to be able to
-/// reproduce *exactly* a specific behavior. To that end, we provide a function
-/// which will forcibly set the seed to a fixed value. This must be done at the
-/// start of the program, before any hashes are computed. Also, it cannot be
-/// undone. This makes it thread-hostile and very hard to use outside of
-/// immediately on start of a simple program designed for reproducible
-/// behavior.
-void set_fixed_execution_hash_seed(uint64_t fixed_value);
-
-
// All of the implementation details of actually computing the various hash
// code values are held within this namespace. These routines are included in
// the header file mainly to allow inlining and constant propagation.
@@ -322,24 +305,18 @@ struct hash_state {
}
};
-
-/// A global, fixed seed-override variable.
-///
-/// This variable can be set using the \see llvm::set_fixed_execution_seed
-/// function. See that function for details. Do not, under any circumstances,
-/// set or read this variable.
-extern uint64_t fixed_seed_override;
-
+/// The seed is non-deterministic (address of a variable) to prevent having
+/// users depend on the particular hash values. On platforms without ASLR, this
+/// is still likely non-deterministic per build.
inline uint64_t get_execution_seed() {
- // FIXME: This needs to be a per-execution seed. This is just a placeholder
- // implementation. Switching to a per-execution seed is likely to flush out
- // instability bugs and so will happen as its own commit.
- //
- // However, if there is a fixed seed override set the first time this is
- // called, return that instead of the per-execution seed.
- const uint64_t seed_prime = 0xff51afd7ed558ccdULL;
- static uint64_t seed = fixed_seed_override ? fixed_seed_override : seed_prime;
- return seed;
+ static const char seed = 0;
+ // Work around x86-64 negative offset folding for old Clang -fno-pic
+ // https://reviews.llvm.org/D93931
+#if !defined(__clang__) || __clang_major__ > 11
+ return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&seed));
+#else
+ return 0xff51afd7ed558ccdULL;
+#endif
}
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 0c69ac99f5bc6..31343df4d8b8b 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -187,7 +187,6 @@ add_llvm_component_library(LLVMSupport
FormatVariadic.cpp
GlobPattern.cpp
GraphWriter.cpp
- Hashing.cpp
HexagonAttributeParser.cpp
HexagonAttributes.cpp
InitLLVM.cpp
diff --git a/llvm/lib/Support/Hashing.cpp b/llvm/lib/Support/Hashing.cpp
deleted file mode 100644
index 1b20a670434f1..0000000000000
--- a/llvm/lib/Support/Hashing.cpp
+++ /dev/null
@@ -1,28 +0,0 @@
-//===-------------- lib/Support/Hashing.cpp -------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file provides implementation bits for the LLVM common hashing
-// infrastructure. Documentation and most of the other information is in the
-// header file.
-//
-//===----------------------------------------------------------------------===//
-
-#include "llvm/ADT/Hashing.h"
-
-using namespace llvm;
-
-// Provide a definition and static initializer for the fixed seed. This
-// initializer should always be zero to ensure its value can never appear to be
-// non-zero, even during dynamic initialization.
-uint64_t llvm::hashing::detail::fixed_seed_override = 0;
-
-// Implement the function for forced setting of the fixed seed.
-// FIXME: Use atomic operations here so that there is no data race.
-void llvm::set_fixed_execution_hash_seed(uint64_t fixed_value) {
- hashing::detail::fixed_seed_override = fixed_value;
-}
diff --git a/llvm/unittests/ADT/HashingTest.cpp b/llvm/unittests/ADT/HashingTest.cpp
index ab13ee833ce55..c28356e229e66 100644
--- a/llvm/unittests/ADT/HashingTest.cpp
+++ b/llvm/unittests/ADT/HashingTest.cpp
@@ -235,78 +235,6 @@ TEST(HashingTest, HashCombineRangeLengthDiff) {
}
}
-TEST(HashingTest, HashCombineRangeGoldenTest) {
- struct { const char *s; uint64_t hash; } golden_data[] = {
-#if SIZE_MAX == UINT64_MAX || SIZE_MAX == UINT32_MAX
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "abc", 0xe38e60bf19c71a3fULL },
- { "abcde", 0xd24461a66de97f6eULL },
- { "abcdefgh", 0x4ef872ec411dec9dULL },
- { "abcdefghijklm", 0xe8a865539f4eadfeULL },
- { "abcdefghijklmnopqrstu", 0x261cdf85faaf4e79ULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef", 0x43ba70e4198e3b2aULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef"
- "abcdefghijklmnopqrstuvwxyzghijkl"
- "abcdefghijklmnopqrstuvwxyzmnopqr"
- "abcdefghijklmnopqrstuvwxyzstuvwx"
- "abcdefghijklmnopqrstuvwxyzyzabcd", 0xdcd57fb2afdf72beULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "aa", 0xf2b3b69a9736a1ebULL },
- { "aaa", 0xf752eb6f07b1cafeULL },
- { "aaaaa", 0x812bd21e1236954cULL },
- { "aaaaaaaa", 0xff07a2cff08ac587ULL },
- { "aaaaaaaaaaaaa", 0x84ac949d54d704ecULL },
- { "aaaaaaaaaaaaaaaaaaaaa", 0xcb2c8fb6be8f5648ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xcc40ab7f164091b6ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xc58e174c1e78ffe9ULL },
- { "z", 0x1ba160d7e8f8785cULL },
- { "zz", 0x2c5c03172f1285d7ULL },
- { "zzz", 0x9d2c4f4b507a2ac3ULL },
- { "zzzzz", 0x0f03b9031735693aULL },
- { "zzzzzzzz", 0xe674147c8582c08eULL },
- { "zzzzzzzzzzzzz", 0x3162d9fa6938db83ULL },
- { "zzzzzzzzzzzzzzzzzzzzz", 0x37b9a549e013620cULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0x8921470aff885016ULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0xf60fdcd9beb08441ULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "aba", 0x3edb049950884d0aULL },
- { "ababa", 0x8f2de9e73a97714bULL },
- { "abababab", 0xee14a29ddf0ce54cULL },
- { "ababababababa", 0x38b3ddaada2d52b4ULL },
- { "ababababababababababa", 0xd3665364219f2b85ULL },
- { "abababababababababababababababab", 0xa75cd6afbf1bc972ULL },
- { "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab", 0x840192d129f7a22bULL }
-#else
-#error This test only supports 64-bit and 32-bit systems.
-#endif
- };
- for (unsigned i = 0; i < sizeof(golden_data)/sizeof(*golden_data); ++i) {
- StringRef str = golden_data[i].s;
- hash_code hash = hash_combine_range(str.begin(), str.end());
-#if 0 // Enable this to generate paste-able text for the above structure.
- std::string member_str = "\"" + str.str() + "\",";
- fprintf(stderr, " { %-35s 0x%016llxULL },\n",
- member_str.c_str(), static_cast<uint64_t>(hash));
-#endif
- EXPECT_EQ(static_cast<size_t>(golden_data[i].hash),
- static_cast<size_t>(hash));
- }
-}
-
TEST(HashingTest, HashCombineBasicTest) {
// Hashing a sequence of homogenous types matches range hashing.
const int i1 = 42, i2 = 43, i3 = 123, i4 = 999, i5 = 0, i6 = 79;
diff --git a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
index af4a6993cbc79..d6b6b71a1e59b 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
@@ -90,7 +90,6 @@ static_library("Support") {
"FormattedStream.cpp",
"GlobPattern.cpp",
"GraphWriter.cpp",
- "Hashing.cpp",
"HexagonAttributeParser.cpp",
"HexagonAttributes.cpp",
"InitLLVM.cpp",
|
@llvm/pr-subscribers-llvm-adt Author: Fangrui Song (MaskRay) ChangesHashing.h provides hash_value/hash_combine/hash_combine_range, which are
Users shouldn't rely on specific hash values due to potential algorithm Take the the address of a static storage duration variable as the seed A few users relying on the iteration order of Full diff: https://github.com/llvm/llvm-project/pull/96282.diff 5 Files Affected:
diff --git a/llvm/include/llvm/ADT/Hashing.h b/llvm/include/llvm/ADT/Hashing.h
index a5477362a5079..3c87894635787 100644
--- a/llvm/include/llvm/ADT/Hashing.h
+++ b/llvm/include/llvm/ADT/Hashing.h
@@ -126,23 +126,6 @@ hash_code hash_value(const std::basic_string<T> &arg);
/// Compute a hash_code for a standard string.
template <typename T> hash_code hash_value(const std::optional<T> &arg);
-/// Override the execution seed with a fixed value.
-///
-/// This hashing library uses a per-execution seed designed to change on each
-/// run with high probability in order to ensure that the hash codes are not
-/// attackable and to ensure that output which is intended to be stable does
-/// not rely on the particulars of the hash codes produced.
-///
-/// That said, there are use cases where it is important to be able to
-/// reproduce *exactly* a specific behavior. To that end, we provide a function
-/// which will forcibly set the seed to a fixed value. This must be done at the
-/// start of the program, before any hashes are computed. Also, it cannot be
-/// undone. This makes it thread-hostile and very hard to use outside of
-/// immediately on start of a simple program designed for reproducible
-/// behavior.
-void set_fixed_execution_hash_seed(uint64_t fixed_value);
-
-
// All of the implementation details of actually computing the various hash
// code values are held within this namespace. These routines are included in
// the header file mainly to allow inlining and constant propagation.
@@ -322,24 +305,18 @@ struct hash_state {
}
};
-
-/// A global, fixed seed-override variable.
-///
-/// This variable can be set using the \see llvm::set_fixed_execution_seed
-/// function. See that function for details. Do not, under any circumstances,
-/// set or read this variable.
-extern uint64_t fixed_seed_override;
-
+/// The seed is non-deterministic (address of a variable) to prevent having
+/// users depend on the particular hash values. On platforms without ASLR, this
+/// is still likely non-deterministic per build.
inline uint64_t get_execution_seed() {
- // FIXME: This needs to be a per-execution seed. This is just a placeholder
- // implementation. Switching to a per-execution seed is likely to flush out
- // instability bugs and so will happen as its own commit.
- //
- // However, if there is a fixed seed override set the first time this is
- // called, return that instead of the per-execution seed.
- const uint64_t seed_prime = 0xff51afd7ed558ccdULL;
- static uint64_t seed = fixed_seed_override ? fixed_seed_override : seed_prime;
- return seed;
+ static const char seed = 0;
+ // Work around x86-64 negative offset folding for old Clang -fno-pic
+ // https://reviews.llvm.org/D93931
+#if !defined(__clang__) || __clang_major__ > 11
+ return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(&seed));
+#else
+ return 0xff51afd7ed558ccdULL;
+#endif
}
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 0c69ac99f5bc6..31343df4d8b8b 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -187,7 +187,6 @@ add_llvm_component_library(LLVMSupport
FormatVariadic.cpp
GlobPattern.cpp
GraphWriter.cpp
- Hashing.cpp
HexagonAttributeParser.cpp
HexagonAttributes.cpp
InitLLVM.cpp
diff --git a/llvm/lib/Support/Hashing.cpp b/llvm/lib/Support/Hashing.cpp
deleted file mode 100644
index 1b20a670434f1..0000000000000
--- a/llvm/lib/Support/Hashing.cpp
+++ /dev/null
@@ -1,28 +0,0 @@
-//===-------------- lib/Support/Hashing.cpp -------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file provides implementation bits for the LLVM common hashing
-// infrastructure. Documentation and most of the other information is in the
-// header file.
-//
-//===----------------------------------------------------------------------===//
-
-#include "llvm/ADT/Hashing.h"
-
-using namespace llvm;
-
-// Provide a definition and static initializer for the fixed seed. This
-// initializer should always be zero to ensure its value can never appear to be
-// non-zero, even during dynamic initialization.
-uint64_t llvm::hashing::detail::fixed_seed_override = 0;
-
-// Implement the function for forced setting of the fixed seed.
-// FIXME: Use atomic operations here so that there is no data race.
-void llvm::set_fixed_execution_hash_seed(uint64_t fixed_value) {
- hashing::detail::fixed_seed_override = fixed_value;
-}
diff --git a/llvm/unittests/ADT/HashingTest.cpp b/llvm/unittests/ADT/HashingTest.cpp
index ab13ee833ce55..c28356e229e66 100644
--- a/llvm/unittests/ADT/HashingTest.cpp
+++ b/llvm/unittests/ADT/HashingTest.cpp
@@ -235,78 +235,6 @@ TEST(HashingTest, HashCombineRangeLengthDiff) {
}
}
-TEST(HashingTest, HashCombineRangeGoldenTest) {
- struct { const char *s; uint64_t hash; } golden_data[] = {
-#if SIZE_MAX == UINT64_MAX || SIZE_MAX == UINT32_MAX
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "abc", 0xe38e60bf19c71a3fULL },
- { "abcde", 0xd24461a66de97f6eULL },
- { "abcdefgh", 0x4ef872ec411dec9dULL },
- { "abcdefghijklm", 0xe8a865539f4eadfeULL },
- { "abcdefghijklmnopqrstu", 0x261cdf85faaf4e79ULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef", 0x43ba70e4198e3b2aULL },
- { "abcdefghijklmnopqrstuvwxyzabcdef"
- "abcdefghijklmnopqrstuvwxyzghijkl"
- "abcdefghijklmnopqrstuvwxyzmnopqr"
- "abcdefghijklmnopqrstuvwxyzstuvwx"
- "abcdefghijklmnopqrstuvwxyzyzabcd", 0xdcd57fb2afdf72beULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "aa", 0xf2b3b69a9736a1ebULL },
- { "aaa", 0xf752eb6f07b1cafeULL },
- { "aaaaa", 0x812bd21e1236954cULL },
- { "aaaaaaaa", 0xff07a2cff08ac587ULL },
- { "aaaaaaaaaaaaa", 0x84ac949d54d704ecULL },
- { "aaaaaaaaaaaaaaaaaaaaa", 0xcb2c8fb6be8f5648ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xcc40ab7f164091b6ULL },
- { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
- "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", 0xc58e174c1e78ffe9ULL },
- { "z", 0x1ba160d7e8f8785cULL },
- { "zz", 0x2c5c03172f1285d7ULL },
- { "zzz", 0x9d2c4f4b507a2ac3ULL },
- { "zzzzz", 0x0f03b9031735693aULL },
- { "zzzzzzzz", 0xe674147c8582c08eULL },
- { "zzzzzzzzzzzzz", 0x3162d9fa6938db83ULL },
- { "zzzzzzzzzzzzzzzzzzzzz", 0x37b9a549e013620cULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0x8921470aff885016ULL },
- { "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
- "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", 0xf60fdcd9beb08441ULL },
- { "a", 0xaeb6f9d5517c61f8ULL },
- { "ab", 0x7ab1edb96be496b4ULL },
- { "aba", 0x3edb049950884d0aULL },
- { "ababa", 0x8f2de9e73a97714bULL },
- { "abababab", 0xee14a29ddf0ce54cULL },
- { "ababababababa", 0x38b3ddaada2d52b4ULL },
- { "ababababababababababa", 0xd3665364219f2b85ULL },
- { "abababababababababababababababab", 0xa75cd6afbf1bc972ULL },
- { "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab"
- "abababababababababababababababab", 0x840192d129f7a22bULL }
-#else
-#error This test only supports 64-bit and 32-bit systems.
-#endif
- };
- for (unsigned i = 0; i < sizeof(golden_data)/sizeof(*golden_data); ++i) {
- StringRef str = golden_data[i].s;
- hash_code hash = hash_combine_range(str.begin(), str.end());
-#if 0 // Enable this to generate paste-able text for the above structure.
- std::string member_str = "\"" + str.str() + "\",";
- fprintf(stderr, " { %-35s 0x%016llxULL },\n",
- member_str.c_str(), static_cast<uint64_t>(hash));
-#endif
- EXPECT_EQ(static_cast<size_t>(golden_data[i].hash),
- static_cast<size_t>(hash));
- }
-}
-
TEST(HashingTest, HashCombineBasicTest) {
// Hashing a sequence of homogenous types matches range hashing.
const int i1 = 42, i2 = 43, i3 = 123, i4 = 999, i5 = 0, i6 = 79;
diff --git a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
index af4a6993cbc79..d6b6b71a1e59b 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn
@@ -90,7 +90,6 @@ static_library("Support") {
"FormattedStream.cpp",
"GlobPattern.cpp",
"GraphWriter.cpp",
- "Hashing.cpp",
"HexagonAttributeParser.cpp",
"HexagonAttributes.cpp",
"InitLLVM.cpp",
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but I'm not sure this should be always enabled. This sounds like a lot of debugging fun when a seed-dependent case is not caught in testing.
Can we have this part of LLVM_REVERSE_ITERATION instead, which is used to detect similar hash iteration stability issues? We have a build bot for it.
I have fixed a lot of non-determinism bugs in the past few years. Potential non-determinism concerns are outweighed by the benefits.
While -DLLVM_ENABLE_REVERSE_ITERATION=on helps, it adds complexity for bot maintainers and configurations. BTW, LLVM_ENABLE_REVERSE_ITERATION currently only applies to pointer-like types.
|
We restricted reverse-iteration when we added it just to save time when we were enabling it: we wanted to prioritize issues that were actually likely to cause non-determinism (as opposed to relying on the hash algorithm, which is annoying but not actually non-deterministic). If you're willing to fix all the resulting breakage, it should be fine to apply it more widely. I'm a little concerned that doing this in release builds is going to lead to weird bug reports. Especially given the current approach for getting randomness: ASLR isn't really that random, particularly on Windows, so the probability of getting a particular seed isn't uniform. |
Due to ASLR and malloc non-determinism,
I understand the concern. If we receive 10 weird bug reports root caused to The next step is to make integer type keys (e.g. As my previous comment says:
but I haven't found it necessary. Due to Avalanche effects, even a few ASLR bits are sufficient to cover many different scenarios and expose latent bugs. |
Created using spr 1.3.5-bogner [skip ci]
Created using spr 1.3.5-bogner
Created using spr 1.3.5-bogner [skip ci]
Created using spr 1.3.5-bogner
The non-determinism is now restricted to LLVM_ENABLE_ABI_BREAKING_CHECKS defaults to Note: llvm/include/llvm/ADT/STLExtras.h and llvm/include/llvm/Support/Error.h also use |
llvm/include/llvm/ADT/Hashing.h
Outdated
const uint64_t seed_prime = 0xff51afd7ed558ccdULL; | ||
static uint64_t seed = fixed_seed_override ? fixed_seed_override : seed_prime; | ||
return seed; | ||
[[maybe_unused]] static const char seed = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this inside the #if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please wait a bit in case there is more feedback.
Created using spr 1.3.5-bogner
There is significant improvement on compile time through the stage2-O3:
This is great, but it also suggests that replacing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm happier restricting the non-determinism to +Asserts for now, at least as an incremental step.
Due to Avalanche effects, even a few ASLR bits are sufficient to cover many different scenarios and expose latent bugs.
On Windows specifically, I'm less concerned about the total number of bits, and more concerned that ASLR isn't randomized for each run of an executable.
// Work around x86-64 negative offset folding for old Clang -fno-pic | ||
// https://reviews.llvm.org/D93931 | ||
#if LLVM_ENABLE_ABI_BREAKING_CHECKS && \ | ||
(!defined(__clang__) || __clang_major__ > 11) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an ABI problem that this ifdef exists? I mean, LLVM libraries built with clang<11 can't be used by programs built with clang>11. With LLVM_ENABLE_ABI_BREAKING_CHECKS, I guess it's unlikely to cause issues, though. (I guess you could use an empty inline asm as a workaround if you wanted to.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clang condition introduced a slight ABI problem when mixing clang<11
and clang>=11
. In practice it is rare that the llvm-project build and a downstream client exchange the hash values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this affect any access to a DenseMap across the boundary for example?
That is any API where LLVM would initialize a DenseMap and return a reference to it to the user which then access elements by computing key hashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is definitely a problem for LLVM built using GCC and an external project, based on LLVM, built using CLANG 11. Is there a reliable check that we can add (e.g. in llvm/include/llvm/Config/abi-breaking.h.cmake
) to detect this at the build time rather than failing during the execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM built with GCC are typically release builds and LLVM_ENABLE_ABI_BREAKING_CHECKS is 0. The else
branch is taken and there is no impact.
If we want to be over-cautious, this condition can be refined to:
LLVM_ENABLE_ABI_BREAKING_CHECKS && (!defined(__clang__) || __clang_major__ > 11 || defined(__PIC__))
.
We could code the condition to utils/bazel/llvm_configs/abi-breaking.h.cmake
, but I am hoping that the complexity isn't needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would make sense to move the function to the source file for the LLVM_ENABLE_ABI_BREAKING_CHECKS case? It will add cost, but it's okay for assertion-enabled builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my case it is a Release
build with GCC and -DLLVM_ENABLE_ASSERTIONS=ON
.
Could you please explain how __PIC__
check helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still dealing with the fallout from this change in Triton.
It seems that breaking the ABI across different compilers is acceptable, but it would be good to detect this and error out. We already detect mismatches of LLVM_ENABLE_ABI_BREAKING_CHECKS
. Would it be acceptable to just add the clang > 11
condition there?
Eli is happier now. I plan to land this in a few hours. |
Otherwise clang/test/CodeGenCXX/attr-annotate.cpp output could fail when llvm::hash_value(StringRef) changes (#96282). EmitGlobalAnnotations iterates over DeferredAnnotations.
clangd/test/trace.test might fail as llvm::hash_value(StringRef) is non-deterministic per process (llvm#96282).
Follow-up to llvm#96282. Since llvm/lib/Target files are compiled with -fvisibility=hidden, the seed variable in libLLVM.so is hidden in a -DLLVM_BUILD_LLVM_DYLIB=on build. This would cause `hash_value(std::string()), hash_value(StringRef())` to fail since the former (might be part of the main executable) and the latter (llvm/lib/Support/StringRef.cpp in libLLVM.so) use copies in different components.
This test from llvm#74253 relies on particular hash values from Hashing.h. The test fails in LLVM_ENABLE_ABI_BREAKING_CHECKS=on modes (llvm#96282) or whenever Hashing.h implementation changes.
This test from llvm#74253 relies on particular hash values from Hashing.h. The test fails in LLVM_ENABLE_ABI_BREAKING_CHECKS=on modes (llvm#96282) or whenever Hashing.h implementation changes.
Otherwise clang/test/CodeGenCXX/attr-annotate.cpp output could fail when llvm::hash_value(StringRef) changes (llvm#96282). EmitGlobalAnnotations iterates over DeferredAnnotations. (cherry picked from commit 6644975)
Aren't hashes supposed to be deterministic ? Can we add a override for builds with asserts turned ON to favor determinism ? Could have an API to set the determinism flag ? This also has an effect on Build ID computation ? |
Some hashes are supposed to be deterministic, but If you want to deploy assertions builds and have concern about non-deterministism, set LLVM_ABI_BREAKING_CHECKS to The unstable property is important to ensure that no test relies on the particular implementation and allow us to replace the implementation, which would be infeasible if people keep adding brittle tests. |
After llvm/llvm-project#96282, the order of the attr changed, so update the test .
…of attr llvm/llvm-project#96282 changed the hashing, we may get non-detrministic iteration order if using DenseMap. This is causing tests to beome flaky. Fixing it similar to 6644975
llvm/llvm-project#96282 changed `get_execution_seed` to be non-deterministic. Use stable hashes instead.
llvm/llvm-project#96282 changed `get_execution_seed` to be non-deterministic. Use stable hashes instead.
llvm/llvm-project#96282 changed `get_execution_seed` to be non-deterministic. Use stable hashes instead.
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in triton-lang#4212 (comment). The issue was temporary fixed in triton-lang#4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in triton-lang#4212 (comment). The issue was temporary fixed in triton-lang#4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
This commit is fixing the root cause of the issue that appeared during an llvm update, and was raised in #4212 (comment). The issue was temporary fixed in #4512, where we set LLVM_ABI_BREAKING_CHECKS to FORCE_OFF constant in order to not use a non-deterministic seed inside the hashing function (included in this llvm commit: llvm/llvm-project@ce80c80). A further investigation (with chsigg@) found that the issue is that LLVM is built with clang11, while Triton use a newer version. The ABI issue is brought up here: llvm/llvm-project#96282 (comment), but the consensus seemed to be that this setup is rare. Updated the clang version to 12 in the ubuntu build fixed the issue and therefore we can revert setting LLVM_ABI_BREAKING_CHECKS constant. I am additionaly erasing LLVM_ABI_BREAKING_CHECKS in the setup of the other hardwares (it seems it was not needed) and I am splitting ubuntu and macOS configurations because it seems cleaner than having a variable that sets the compiler version for each of them.
FNV, used by stable_hash_combine_string is extremely slow. For string hashing with good avalanche effects, we prefer xxh3_64bits. StableHashing.h might still be useful as it provides a stable hash_combine while Hashing.h's might be non-deterministic (llvm#96282). Pull Request: llvm#100668 (cherry picked from commit c538434)
Hashing.h provides hash_value/hash_combine/hash_combine_range, which are
primarily used by
DenseMap<StringRef, X>
Users shouldn't rely on specific hash values due to size_t differences
on 32-bit/64-bit platforms and potential algorithm changes.
set_fixed_execution_hash_seed
is provided but it has never been used.In LLVM_ENABLE_ABI_BREAKING_CHECKS builds, take the the address of a
static storage duration variable as the seed like
absl/hash/internal/hash.h
kSeed
. (See https://reviews.llvm.org/D93931for workaround for older Clang. Mach-O x86-64 forces PIC, so absl's
__apple_build_version__
check is unnecessary.)LLVM_ENABLE_ABI_BREAKING_CHECKS defaults to
WITH_ASSERTS
and isenabled in an assertion build.
In a non-assertion build,
get_execution_seed
returns the fixed valueregardless of
NDEBUG
. Removing a variable load yields noticeablesize/performance improvement.
A few users relying on the iteration order of
DenseMap<StringRef, X>
have been fixed (e.g., f8f4235 c025bd1 89e8e63
86eb6bf eb8d036 0ea6b8e 58d7a6e 8ea31db
592abf2 6644975).
From my experience fixing
StringMap
iteration order issues, the scale of issues is similar.