Skip to content

Conversation

abhinavgaba
Copy link
Owner

@abhinavgaba abhinavgaba commented Jul 16, 2025

This is the initial clang change to support using ATTACH map-type for pointer-attachment.

This builds upon the following:

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.


With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers.

Cases that need handling:

  • When a class member like p is a base-pointer in a map from a member function within the same class, p is not being privatized, instead, we still try to create an implicit map of this[0:1], and access p through that, which is incorrect.
 struct S { int *p;
 void f1() {
   #pragma omp target data map(p[0:1])
      printf("%p %p\n", &p, p);
 }
  • Attach-style maps for declare mappers. That should be a separate PR.
  • use_device_addr clause does not work properly, because we don't have a proper component-list set-up for it, just one component, so we cannot find the proper attach-ptr. For use_device_addr, we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
  • use_device_ptr handling has some issues too. Need debugging.
  • Other issues that haven't been found yet.

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The libomptarget code will disappear from this PR once llvm#149036 is merged.

const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
OverlappedElements = {},
bool AreBothBasePtrAndPteeMapped = false) const {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AreBothBaseptrAndPteeMapped was used to decide to use PTR_AND_OBJ maps for something like map(p, p[0]). We don't do that now, since we map them independently, and attach them separately.

steakhal and others added 10 commits October 3, 2025 16:24
…lvm#161666)

This flag is really convinient in most cases.
It's easy to figure out what value to pass for most cases. However, it
can sometimes match too many times, like for template functions that has
non-decuded (aka. explicitly specified) template parameters - because
they don't appear in the parameter list, thus they are not accounted for
in the current logic.

It would be nice to improve `getFunctionName` but I'd say to just settle
on using USRs. So this PR enables passing USRs to the flag, while
keeping previous behavior.
Also remove the gratuitous spaces after "," that break strict CSV
compliance.

As the debug function name does not uniquely identify an entry point,
add the main-TU name and the USR values for each entry point snapshot to
reduce the likelihood of collisions between declarations across large
projects.

While adding a filename to each row increases the file size
substantially, the difference in size for the compressed is acceptable.

I evaluated it on our set of 200+ open source C and C++ projects with 3M
entry points, and got the following results when adding these two
columns:
- Raw CSV file increased from 530MB to 1.1GB
- Compressed file (XZ) increased from 54 MB to 78 MB

--

CPP-7098

---------

Co-authored-by: Balazs Benics <[email protected]>
…checkers (llvm#161664)

Previously, when using `-analyze-function` to target a specific
function, the analyzer would incorrectly report "Every top-level
function was skipped" even when the function was successfully analyzed
by syntax-only checkers.

This happened because `NumFunctionsAnalyzed` only counted path-sensitive
analysis, not syntax-only analysis. The misuse detection logic would see
0 functions analyzed and incorrectly conclude the function wasn't found.
…atch legalized types (llvm#161802)

When running with `-debug`, print a note when the replacement types
(during a `ConversionPatternRewriter::replaceOp`) do not match the
legalized types of the current type converter. That's not an API
violation, but it could indicate a bug in user code.

Example output:
```
[dialect-conversion:1]     ** Replace : 'test.multiple_1_to_n_replacement'(0x56b745f99470)
[dialect-conversion:1]        Note: Replacing op result of type f16 with value(s) of type (f16, f16), but the legalized type(s) is/are (f16)
```
… msvc (llvm#161811)

Clang 20 (and early 21 versions; let's hope it can be fixed before the
later versions before such versions become relevant for libcxx CI) have
got an issue with its intrinsics headers, where they use unreserved
names, that users are allowed to override.

See llvm#161808 for the issue
report.

This only crops up in the MSVC build configurations, as recent versions
of some MSVC/UCRT headers include `<intrin.h>`, which ends up pulling in
most intrinsics headers, exposing this issue in the Clang headers.

This should unblock llvm#161736
from being merged.
The i16/i32 shuffle intrinsics (`pshufw`, `pshuflw`, `pshufhw`,
`pshufd`) currently cannot be used in constant expressions. This patch
adds support in both bytecode interpreter (InterpBuiltin.cpp) and
constant evaluator
(ExprConstant.cpp) for pshuf intrinsics, enabling their use in constant
expressions.
## Intrinsics covered
- `_mm_shuffle_pi16` (MMX `pshufw`)
- `_mm_shufflelo_epi16` / `_mm_shufflehi_epi16`
- `_mm_shuffle_epi32`
- Their AVX2/AVX512 vector-width variants
- Masked and maskz forms (handled indirectly via
`__builtin_ia32_select*`)

Fixes llvm#156611
Replace mul and mul_u ops with a neg operation if their second operand
is a splat value -1.

Apply the optimization also for mul_u ops if their first operand is a
splat value -1 due to their commutativity.
```
/home/david.spickett/llvm-project/llvm/lib/Analysis/HashRecognize.cpp:100:54: warning: comparison of integers of different signs:
'typename iterator_traits<ilist_iterator_w_bits<node_options<Instruction, true, false, void, true, BasicBlock>, false, false>>::difference_type' (aka 'int') and 'size_type' (aka 'unsigned int') [-Wsign-compare]
  100 |   return std::distance(Latch->begin(), Latch->end()) != Visited.size();
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
```

By using Latch->size() instead.
airpfei and others added 29 commits October 6, 2025 12:51
1. Fixed Android setjmp issue. The root cause is that TSan initializes
before longjmp_xor_key is set up. During __libc_init_vdso, a call to
strcmp triggers TSan initialization, which occurs before
__libc_init_setjmp_cookie. The solution is to call
InitializeLongjmpXorKey on the first use of longjmp_xor_key.
Additionally, correct LONG_JMP_SP_ENV_SLOT by following the bionic
source code.
2. Skip thr object range check on Android. On Android, thr is allocated
on the heap, causing the check to fail.
3. Disable intercepting clone on Android. pthread_create internally
calls clone. Disabling the interception of clone resolves the issue in
most scenarios.
4. Use a workaround to recover the thr pointer stored in
TLS_SLOT_SANITIZER slot, whose value was modified by Skia.

This PR solved the issue from NDK
android/ndk#1041.

Test project: https://github.com/bytedance/android_tsan_sample/
…pr (llvm#162114)

This was missed in the earlier f16c/fp16 constexpr patches
…lvm#162042)

We don't have FPR isel patterns for G_LOAD/STORE so force to the GPR
register bank.
Fix the build configuration that has OnDiskCAS disabled.
…compilations. (llvm#161486)

The driver-generated -cc1 command-lines for C++ named module inputs
introduce some command-line options which affect the canonical module
build command (and therefore the context hash).
This resets those options.
…ind (llvm#161230)

Inspired by this comment
llvm#157930 (comment)
(and long-standing issues related to finding nanobind/pybind in the
right place), this PR moves to using `FetchContent_Declare` to get the
nanobind dependency. This is pretty standard (see e.g.,
[IREE](https://github.com/iree-org/iree/blob/cf60359b7443b0e92e15fb6ffc011525dc40e772/CMakeLists.txt#L842-L848)).
This PR also removes pybind which has been deprecated for almost a year
(llvm#117922) and which isn't
compatible (for whatever reason) with `FetchContent_Declare`.

---------

Co-authored-by: Jacques Pienaar <[email protected]>
…untime_vers (llvm#162062)

`DW_AT_APPLE_major_runtime_version` doesn't exist. This should be
`DW_AT_APPLE_major_runtime_vers`.
…vm#154656)

As discussed before, this PR adds the basic infrastructure/boiler plate
for loop ordering strategies to be implemented.

If this looks ok, I wanted to also mention some of the heuristics that I
would implement next, if they sound reasonable to you guys:
- Parallel first : prioritize parallel loops over reduction loops
- Dense outer : prioritize the most dense loops first
- Sparse outer : the opposite, potentially useful in some cases?

There is another that I am considering, stride/memory aware, which would
prioritize loops with better stride patterns (like sequential or
linear). Not sure how well this carries over to Sparse Tensor though.
Are there any ideas/heuristics that I should definitely try to
implement?

As we discussed, I will try to incrementally add heuristics. Sorry for
the delay on my end, and thank you so much for the feedback!

---------

Co-authored-by: Aart Bik <[email protected]>
…62119)

I don't think we need them after AST is destroyed.
If I am wrong we should see crashes immediately, as pointers
de-referenced unchecked.
This simplifies and cleans up the vector list initialization behavior.
This simplifies the work we do in SemaInit by just relying on SemaHLSL's
initialization list flattening. This change fixes some outstanding
limitations by supporting structure to vector initialization, but
re-introduces HLSL's limitations around overload resolution in
initializers.

---------

Co-authored-by: Helena Kotas <[email protected]>
…59667)

The frontend currently opens the path provided via
`-fprofile-instrument-use-path=` to learn the kind of the
instrumentation data and set the `CodeGenOptions::ProfileUse` value.
This happens during command-line parsing, where we don't have a
correctly configured VFS yet, so the behavior is quite different from
all other frontend inputs. We need to move this logic out of the
frontend command line parsing logic somewhere where we do have the
configured VFS.

The complication is that the `ProfileUse` flag is being used to set
preprocessor macros, and there isn't a great place between command line
parsing and preprocessor initialization to perform this logic.

This PR solves the issue by deducing the kind of instrumentation data
right in the driver and then passing it via a new flag to the frontend.
This shouldn't change observable behavior of Clang on the driver level,
and only affects the frontend command line interface, which is an
implementation detail anyway.
Summary:
This check is unnecessarily restrictive and currently incorrectly fires
for any size less than eight bytes. Just remove it, we do sanity checks
elsewhere and at some point need to trust the ABI.
… Android (llvm#161596)

In Bionic, pthread_detach calls pthread_join, so the thread has already
been consumed by the pthread_detach interceptor.


https://android.googlesource.com/platform/bionic/+/refs/heads/android16-release/libc/bionic/pthread_detach.cpp
AMDGPU documentation states op_sel[3] can be used in v_interp_p2_f16 in
GFX9.
llvm#162100)

… type.

Our modeling of cast expressions now results in the potential presence
of fields from other types derived from the SourceLocation's type.

Also add some additional debug logging and new tests, and update an
existing test to reflect real usage of synthetic field callbacks.
…oleans (llvm#161293)

Logical booleans in LLVM are represented by select statements - e.g. the
statement

```
A && B
```

is represented as

```
select i1 %A, i1 %B, i1 false
```

When LLVM folds two of the same logical booleans into a logical boolean
and a bitwise boolean (e.g. `A && B && C` -> `A && (B & C)`), the first
logical boolean is a select statement that retains the original
condition from the first logical boolean of the original statement. This
means that the new select statement has the branch weights as the
original select statement.

Tracking issue: llvm#147390
Originally, DXContainer `isValid...` functions were defined din the
header. That cause a problem, that made this specific file huge, since
every function also includes a `def` file. To fix this problem, this PR
moves the function to the cpp equivalent file.

Closes: [158162](llvm#158162)
Implement support for `subcommands` in OptTable to attain feature parity
with `cl`.

Design overview:
https://discourse.llvm.org/t/subcommand-feature-support-in-llvm-opttable/88098

Issue: llvm#108307
…61747)

Handle branch weights for `indirectbr`​ simplification: if we drop branches that aren't taken, we just need to remove the corresponding branch weight (which is presumably 0).

Issue llvm#147390
This allows to define multiple interface methods with the same name but
different arguments.
…rom pointer to reference. (llvm#161785)

Also adds an instance of `AttachPtrExprComparator` to the
`MappableExprHandler` class, so that it can be reused for multiple
comparisons.

This was extracted out of llvm#153683 to make that PR more focused on the
functional changes.
…et.h. (llvm#161791)

The clang changes that use this map-type are in llvm#153683.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.