-
Notifications
You must be signed in to change notification settings - Fork 0
Use ATTACH maps for array-sections/subscripts on pointers. #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: tgt-capture-mapped-ptrs-by-ref
Are you sure you want to change the base?
Use ATTACH maps for array-sections/subscripts on pointers. #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The libomptarget code will disappear from this PR once llvm#149036 is merged.
const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr, | ||
ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef> | ||
OverlappedElements = {}, | ||
bool AreBothBasePtrAndPteeMapped = false) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AreBothBaseptrAndPteeMapped was used to decide to use PTR_AND_OBJ maps for something like map(p, p[0])
. We don't do that now, since we map them independently, and attach them separately.
…lvm#161666) This flag is really convinient in most cases. It's easy to figure out what value to pass for most cases. However, it can sometimes match too many times, like for template functions that has non-decuded (aka. explicitly specified) template parameters - because they don't appear in the parameter list, thus they are not accounted for in the current logic. It would be nice to improve `getFunctionName` but I'd say to just settle on using USRs. So this PR enables passing USRs to the flag, while keeping previous behavior.
Also remove the gratuitous spaces after "," that break strict CSV compliance. As the debug function name does not uniquely identify an entry point, add the main-TU name and the USR values for each entry point snapshot to reduce the likelihood of collisions between declarations across large projects. While adding a filename to each row increases the file size substantially, the difference in size for the compressed is acceptable. I evaluated it on our set of 200+ open source C and C++ projects with 3M entry points, and got the following results when adding these two columns: - Raw CSV file increased from 530MB to 1.1GB - Compressed file (XZ) increased from 54 MB to 78 MB -- CPP-7098 --------- Co-authored-by: Balazs Benics <[email protected]>
…checkers (llvm#161664) Previously, when using `-analyze-function` to target a specific function, the analyzer would incorrectly report "Every top-level function was skipped" even when the function was successfully analyzed by syntax-only checkers. This happened because `NumFunctionsAnalyzed` only counted path-sensitive analysis, not syntax-only analysis. The misuse detection logic would see 0 functions analyzed and incorrectly conclude the function wasn't found.
revert this patch due to failure in unittests/Support, e.g. https://lab.llvm.org/buildbot/#/builders/33/builds/24178/steps/6/logs/FAIL__LLVM-Unit__SupportTests_61 This reverts commit ffc503e.
…atch legalized types (llvm#161802) When running with `-debug`, print a note when the replacement types (during a `ConversionPatternRewriter::replaceOp`) do not match the legalized types of the current type converter. That's not an API violation, but it could indicate a bug in user code. Example output: ``` [dialect-conversion:1] ** Replace : 'test.multiple_1_to_n_replacement'(0x56b745f99470) [dialect-conversion:1] Note: Replacing op result of type f16 with value(s) of type (f16, f16), but the legalized type(s) is/are (f16) ```
… msvc (llvm#161811) Clang 20 (and early 21 versions; let's hope it can be fixed before the later versions before such versions become relevant for libcxx CI) have got an issue with its intrinsics headers, where they use unreserved names, that users are allowed to override. See llvm#161808 for the issue report. This only crops up in the MSVC build configurations, as recent versions of some MSVC/UCRT headers include `<intrin.h>`, which ends up pulling in most intrinsics headers, exposing this issue in the Clang headers. This should unblock llvm#161736 from being merged.
The i16/i32 shuffle intrinsics (`pshufw`, `pshuflw`, `pshufhw`, `pshufd`) currently cannot be used in constant expressions. This patch adds support in both bytecode interpreter (InterpBuiltin.cpp) and constant evaluator (ExprConstant.cpp) for pshuf intrinsics, enabling their use in constant expressions. ## Intrinsics covered - `_mm_shuffle_pi16` (MMX `pshufw`) - `_mm_shufflelo_epi16` / `_mm_shufflehi_epi16` - `_mm_shuffle_epi32` - Their AVX2/AVX512 vector-width variants - Masked and maskz forms (handled indirectly via `__builtin_ia32_select*`) Fixes llvm#156611
Replace mul and mul_u ops with a neg operation if their second operand is a splat value -1. Apply the optimization also for mul_u ops if their first operand is a splat value -1 due to their commutativity.
``` /home/david.spickett/llvm-project/llvm/lib/Analysis/HashRecognize.cpp:100:54: warning: comparison of integers of different signs: 'typename iterator_traits<ilist_iterator_w_bits<node_options<Instruction, true, false, void, true, BasicBlock>, false, false>>::difference_type' (aka 'int') and 'size_type' (aka 'unsigned int') [-Wsign-compare] 100 | return std::distance(Latch->begin(), Latch->end()) != Visited.size(); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ ``` By using Latch->size() instead.
1. Fixed Android setjmp issue. The root cause is that TSan initializes before longjmp_xor_key is set up. During __libc_init_vdso, a call to strcmp triggers TSan initialization, which occurs before __libc_init_setjmp_cookie. The solution is to call InitializeLongjmpXorKey on the first use of longjmp_xor_key. Additionally, correct LONG_JMP_SP_ENV_SLOT by following the bionic source code. 2. Skip thr object range check on Android. On Android, thr is allocated on the heap, causing the check to fail. 3. Disable intercepting clone on Android. pthread_create internally calls clone. Disabling the interception of clone resolves the issue in most scenarios. 4. Use a workaround to recover the thr pointer stored in TLS_SLOT_SANITIZER slot, whose value was modified by Skia. This PR solved the issue from NDK android/ndk#1041. Test project: https://github.com/bytedance/android_tsan_sample/
…pr (llvm#162114) This was missed in the earlier f16c/fp16 constexpr patches
…lvm#162042) We don't have FPR isel patterns for G_LOAD/STORE so force to the GPR register bank.
Fix the build configuration that has OnDiskCAS disabled.
…compilations. (llvm#161486) The driver-generated -cc1 command-lines for C++ named module inputs introduce some command-line options which affect the canonical module build command (and therefore the context hash). This resets those options.
…ind (llvm#161230) Inspired by this comment llvm#157930 (comment) (and long-standing issues related to finding nanobind/pybind in the right place), this PR moves to using `FetchContent_Declare` to get the nanobind dependency. This is pretty standard (see e.g., [IREE](https://github.com/iree-org/iree/blob/cf60359b7443b0e92e15fb6ffc011525dc40e772/CMakeLists.txt#L842-L848)). This PR also removes pybind which has been deprecated for almost a year (llvm#117922) and which isn't compatible (for whatever reason) with `FetchContent_Declare`. --------- Co-authored-by: Jacques Pienaar <[email protected]>
…untime_vers (llvm#162062) `DW_AT_APPLE_major_runtime_version` doesn't exist. This should be `DW_AT_APPLE_major_runtime_vers`.
…vm#154656) As discussed before, this PR adds the basic infrastructure/boiler plate for loop ordering strategies to be implemented. If this looks ok, I wanted to also mention some of the heuristics that I would implement next, if they sound reasonable to you guys: - Parallel first : prioritize parallel loops over reduction loops - Dense outer : prioritize the most dense loops first - Sparse outer : the opposite, potentially useful in some cases? There is another that I am considering, stride/memory aware, which would prioritize loops with better stride patterns (like sequential or linear). Not sure how well this carries over to Sparse Tensor though. Are there any ideas/heuristics that I should definitely try to implement? As we discussed, I will try to incrementally add heuristics. Sorry for the delay on my end, and thank you so much for the feedback! --------- Co-authored-by: Aart Bik <[email protected]>
…62119) I don't think we need them after AST is destroyed. If I am wrong we should see crashes immediately, as pointers de-referenced unchecked.
This simplifies and cleans up the vector list initialization behavior. This simplifies the work we do in SemaInit by just relying on SemaHLSL's initialization list flattening. This change fixes some outstanding limitations by supporting structure to vector initialization, but re-introduces HLSL's limitations around overload resolution in initializers. --------- Co-authored-by: Helena Kotas <[email protected]>
…59667) The frontend currently opens the path provided via `-fprofile-instrument-use-path=` to learn the kind of the instrumentation data and set the `CodeGenOptions::ProfileUse` value. This happens during command-line parsing, where we don't have a correctly configured VFS yet, so the behavior is quite different from all other frontend inputs. We need to move this logic out of the frontend command line parsing logic somewhere where we do have the configured VFS. The complication is that the `ProfileUse` flag is being used to set preprocessor macros, and there isn't a great place between command line parsing and preprocessor initialization to perform this logic. This PR solves the issue by deducing the kind of instrumentation data right in the driver and then passing it via a new flag to the frontend. This shouldn't change observable behavior of Clang on the driver level, and only affects the frontend command line interface, which is an implementation detail anyway.
Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.
… Android (llvm#161596) In Bionic, pthread_detach calls pthread_join, so the thread has already been consumed by the pthread_detach interceptor. https://android.googlesource.com/platform/bionic/+/refs/heads/android16-release/libc/bionic/pthread_detach.cpp
AMDGPU documentation states op_sel[3] can be used in v_interp_p2_f16 in GFX9.
llvm#162100) … type. Our modeling of cast expressions now results in the potential presence of fields from other types derived from the SourceLocation's type. Also add some additional debug logging and new tests, and update an existing test to reflect real usage of synthetic field callbacks.
…oleans (llvm#161293) Logical booleans in LLVM are represented by select statements - e.g. the statement ``` A && B ``` is represented as ``` select i1 %A, i1 %B, i1 false ``` When LLVM folds two of the same logical booleans into a logical boolean and a bitwise boolean (e.g. `A && B && C` -> `A && (B & C)`), the first logical boolean is a select statement that retains the original condition from the first logical boolean of the original statement. This means that the new select statement has the branch weights as the original select statement. Tracking issue: llvm#147390
Originally, DXContainer `isValid...` functions were defined din the header. That cause a problem, that made this specific file huge, since every function also includes a `def` file. To fix this problem, this PR moves the function to the cpp equivalent file. Closes: [158162](llvm#158162)
Implement support for `subcommands` in OptTable to attain feature parity with `cl`. Design overview: https://discourse.llvm.org/t/subcommand-feature-support-in-llvm-opttable/88098 Issue: llvm#108307
…61747) Handle branch weights for `indirectbr` simplification: if we drop branches that aren't taken, we just need to remove the corresponding branch weight (which is presumably 0). Issue llvm#147390
This allows to define multiple interface methods with the same name but different arguments.
…rom pointer to reference. (llvm#161785) Also adds an instance of `AttachPtrExprComparator` to the `MappableExprHandler` class, so that it can be reused for multiple comparisons. This was extracted out of llvm#153683 to make that PR more focused on the functional changes.
…et.h. (llvm#161791) The clang changes that use this map-type are in llvm#153683.
…on-using-attach-maptype
This is the initial clang change to support using
ATTACH
map-type for pointer-attachment.This builds upon the following:
target
by reference. llvm/llvm-project#145454For example, for the following:
The following maps are now emitted by clang:
Previously, the two possible maps emitted by clang were:
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
With this change, we are using ATTACH-style maps, like
(A)
, for cases where the expression has a base-pointer. For example:We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:
We first map
spp
, thenspp[0]
then spp[0][0] and spp[0][0].a.This allows us to also group "struct" allocation based on their attach pointers.
Cases that need handling:
p
is a base-pointer in a map from a member function within the same class, p is not beingprivatized
, instead, we still try to create an implicit map ofthis[0:1]
, and accessp
through that, which is incorrect.use_device_addr
clause does not work properly, because we don't have a proper component-list set-up for it, just one component, so we cannot find the proper attach-ptr. Foruse_device_addr
, we should match existing maps whose attach-ptr matches the attach-ptr of theuse_device_addr
operand.use_device_ptr
handling has some issues too. Need debugging.Some tests still haven't been updated. These include: