[LV] Change loops' interleave count computation #70141

Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Nilanjana Basu (nilanjana87)

Changes

[LV] Change loops' interleave count computation

A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#26), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases:
1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice
2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run
where TC = trip count, VW = vectorization width, IC = interleaving count

We change the interleave count computation based on this information, but we leave it unchanged when the flag InterleaveSmallLoopScalarReduction is set to true, since that flag handles a special case (https://reviews.llvm.org/D81416).

Full diff: https://github.com/llvm/llvm-project/pull/70141.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cc17d91d4f43727..5786e8e38cb4974 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5745,8 +5745,12 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
}
// If trip count is known or estimated compile time constant, limit the
- // interleave count to be less than the trip count divided by VF, provided it
- // is at least 1.
+ // interleave count to be less than the trip count divided by VF * 2,
+ // provided VF is at least 1 and the trip count is not an exact multiple of
+ // VF, such that the vector loop runs at least twice to make interleaving seem
+ // profitable when there is an epilogue loop present. When
+ // InterleaveSmallLoopScalarReduction is true or trip count is an exact
+ // multiple of VF, we allow interleaving even when the vector loop runs once.
//
// For scalable vectors we can't know if interleaving is beneficial. It may
// not be beneficial for small loops if none of the lanes in the second vector
@@ -5755,10 +5759,15 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
// the InterleaveCount as if vscale is '1', although if some information about
// the vector is known (e.g. min vector size), we can make a better decision.
if (BestKnownTC) {
- MaxInterleaveCount =
- std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
- // Make sure MaxInterleaveCount is greater than 0.
- MaxInterleaveCount = std::max(1u, MaxInterleaveCount);
+ if (InterleaveSmallLoopScalarReduction ||
+ (*BestKnownTC % VF.getKnownMinValue() == 0))
+ MaxInterleaveCount =
+ std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
+ else
+ MaxInterleaveCount = std::min(*BestKnownTC / (VF.getKnownMinValue() * 2),
+ MaxInterleaveCount);
+ // Make sure MaxInterleaveCount is greater than 0 & a power of 2.
+ MaxInterleaveCount = llvm::bit_floor(std::max(1u, MaxInterleaveCount));
}
assert(MaxInterleaveCount > 0 &&
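
To make the new clamping rule concrete, here is a minimal standalone sketch (C++20, for std::bit_floor) of the logic above. clampInterleaveCount is an illustrative name, the InterleaveSmallLoopScalarReduction path is omitted, and the parameters stand in for BestKnownTC, VF.getKnownMinValue(), and the incoming MaxInterleaveCount; this is a sketch of the patch's arithmetic, not the LLVM API:

```cpp
#include <algorithm>
#include <bit>
#include <cstdio>

// Clamp the candidate interleave count for a loop with a known or estimated
// trip count TC and vectorization factor VF, per the rule in this patch.
unsigned clampInterleaveCount(unsigned TC, unsigned VF, unsigned MaxIC) {
  if (TC % VF == 0)
    // No epilogue loop: allow interleaving even if the vector body runs once.
    MaxIC = std::min(TC / VF, MaxIC);
  else
    // Epilogue loop present: require the vector body to run at least twice.
    MaxIC = std::min(TC / (VF * 2), MaxIC);
  // Keep the result at least 1 and round down to a power of 2.
  return std::bit_floor(std::max(1u, MaxIC));
}

int main() {
  std::printf("%u\n", clampInterleaveCount(128, 64, 8)); // 2: 128 is a multiple of 64
  std::printf("%u\n", clampInterleaveCount(129, 64, 8)); // 1: 129 / (64 * 2) = 1
}
```

With VF 64 this reproduces the expectations in the tests below: a trip count of 128 keeps IC 2, while 129 is clamped to IC 1.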
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/interleave_count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/interleave_count.ll
new file mode 100644
index 000000000000000..501fc334bc8bf48
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/interleave_count.ll
@@ -0,0 +1,106 @@
+; RUN: opt < %s -force-vector-width=64 -O3 -S -pass-remarks=loop-vectorize 2>&1 | FileCheck %s
+
+target triple = "aarch64-linux-gnu"
+
+%pair = type { i8, i8 }
+
+; For a loop with known trip count of 128, when we force VF 64, it should use
+; IC 2, since there is no remainder loop needed when the vector loop runs.
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 2)
+define void @loop_with_tc_128(ptr %p, ptr %q) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
+ %tmp0 = getelementptr %pair, ptr %p, i64 %i, i32 0
+ %tmp1 = load i8, ptr %tmp0, align 1
+ %tmp2 = getelementptr %pair, ptr %p, i64 %i, i32 1
+ %tmp3 = load i8, ptr %tmp2, align 1
+ %add = add i8 %tmp1, %tmp3
+ %qi = getelementptr i8, ptr %q, i64 %i
+ store i8 %add, ptr %qi, align 1
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp eq i64 %i.next, 128
+ br i1 %cond, label %for.end, label %for.body
+
+for.end:
+ ret void
+}
+
+; For a loop with known trip count of 129, when we force VF 64, it should use
+; IC 1, since there may be a remainder loop that needs to run after the vector loop.
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 1)
+define void @loop_with_tc_129(ptr %p, ptr %q) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
+ %tmp0 = getelementptr %pair, ptr %p, i64 %i, i32 0
+ %tmp1 = load i8, ptr %tmp0, align 1
+ %tmp2 = getelementptr %pair, ptr %p, i64 %i, i32 1
+ %tmp3 = load i8, ptr %tmp2, align 1
+ %add = add i8 %tmp1, %tmp3
+ %qi = getelementptr i8, ptr %q, i64 %i
+ store i8 %add, ptr %qi, align 1
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp eq i64 %i.next, 129
+ br i1 %cond, label %for.end, label %for.body
+
+for.end:
+ ret void
+}
+
+; For a loop with unknown trip count but a profile showing an approx TC estimate of 128,
+; when we force VF 64, it should use IC 2, since chances are high that the remainder loop
+; won't need to run
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 2)
+define void @loop_with_profile_tc_128(ptr %p, ptr %q, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
+ %tmp0 = getelementptr %pair, ptr %p, i64 %i, i32 0
+ %tmp1 = load i8, ptr %tmp0, align 1
+ %tmp2 = getelementptr %pair, ptr %p, i64 %i, i32 1
+ %tmp3 = load i8, ptr %tmp2, align 1
+ %add = add i8 %tmp1, %tmp3
+ %qi = getelementptr i8, ptr %q, i64 %i
+ store i8 %add, ptr %qi, align 1
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp eq i64 %i.next, %n
+ br i1 %cond, label %for.end, label %for.body, !prof !0
+
+for.end:
+ ret void
+}
+
+; For a loop with unknown trip count but a profile showing an approx TC estimate of 129,
+; when we force VF 64, it should use IC 1, since chances are high that the remainder loop
+; will need to run
+; CHECK: remark: <unknown>:0:0: vectorized loop (vectorization width: 64, interleaved count: 1)
+define void @loop_with_profile_tc_129(ptr %p, ptr %q, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
+ %tmp0 = getelementptr %pair, ptr %p, i64 %i, i32 0
+ %tmp1 = load i8, ptr %tmp0, align 1
+ %tmp2 = getelementptr %pair, ptr %p, i64 %i, i32 1
+ %tmp3 = load i8, ptr %tmp2, align 1
+ %add = add i8 %tmp1, %tmp3
+ %qi = getelementptr i8, ptr %q, i64 %i
+ store i8 %add, ptr %qi, align 1
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp eq i64 %i.next, %n
+ br i1 %cond, label %for.end, label %for.body, !prof !1
+
+for.end:
+ ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 127}
+!1 = !{!"branch_weights", i32 1, i32 128}
The initial tests are added in a separate PR, #70272.
You can test this locally with the following command:

git-clang-format --diff 2f4328e6979004fbf531d69a40c2e06d43d3128c 3104a425e31aad4defdeda4e0bb8a8c2d0549703 -- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 5786e8e38c..45f2c442cd 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3841,8 +3841,7 @@ void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,
TTI->preferPredicatedReductionSelect(
RdxDesc.getOpcode(), PhiTy,
TargetTransformInfo::ReductionFlags())) {
- auto *VecRdxPhi =
- cast<PHINode>(State.get(PhiR, Part));
+ auto *VecRdxPhi = cast<PHINode>(State.get(PhiR, Part));
VecRdxPhi->setIncomingValueForBlock(VectorLoopLatch, Sel);
}
}
@@ -8309,18 +8308,17 @@ VPWidenCallRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
// Is it beneficial to perform intrinsic call compared to lib call?
bool ShouldUseVectorIntrinsic =
- ID && LoopVectorizationPlanner::getDecisionAndClampRange(
- [&](ElementCount VF) -> bool {
- Function *Variant;
- // Is it beneficial to perform intrinsic call compared to lib
- // call?
- InstructionCost CallCost =
- CM.getVectorCallCost(CI, VF, &Variant);
- InstructionCost IntrinsicCost =
- CM.getVectorIntrinsicCost(CI, VF);
- return IntrinsicCost <= CallCost;
- },
- Range);
+ ID &&
+ LoopVectorizationPlanner::getDecisionAndClampRange(
+ [&](ElementCount VF) -> bool {
+ Function *Variant;
+ // Is it beneficial to perform intrinsic call compared to lib
+ // call?
+ InstructionCost CallCost = CM.getVectorCallCost(CI, VF, &Variant);
+ InstructionCost IntrinsicCost = CM.getVectorIntrinsicCost(CI, VF);
+ return IntrinsicCost <= CallCost;
+ },
+ Range);
if (ShouldUseVectorIntrinsic)
return new VPWidenCallRecipe(*CI, make_range(Ops.begin(), Ops.end()), ID);
With #70272 landed, could you rebase this one again?
TargetLowering.h shouldn't include any passes and thus pull in the entire pass infrastructure. Replace the include with forward declarations.
…vm#72444) This patch removes the val field from the `MapInfoOp`. Previously when lowering `TargetOp`, the bounds information for the `BoxValues` were also being mapped. Instead these ops are now cloned inside the target region to prevent mapping of non reference typed values.
The following pattern fails on recent GCC versions with the -std=c++20 flag passed and succeeds with -std=c++17. Such behavior is not observed on Clang 16.0. ``` template <typename T> struct Foo { Foo<T>(int a) {} }; ``` This patch removes the template parameter from the constructor in two occurrences to make the following command complete successfully: bazel build -c fastbuild --cxxopt=-std=c++20 --host_cxxopt=-std=c++20 @llvm-project//mlir/... This patch is similar to https://reviews.llvm.org/D154782 Co-authored-by: Alexander Batashev <[email protected]>
Struct Module::Header is not a POD type. As such, qsort() and llvm::array_pod_sort() must not be used to sort it. This became an issue with the new implementation of qsort() in glibc 2.39 that is not guaranteed to be a stable sort, causing Headers to be re-ordered and corrupted. Replace the usage of llvm::array_pod_sort() with std::stable_sort() in order to fix this issue. The signature of compareModuleHeaders() has to be modified. Fixes llvm#73145.
Update regex to _explicitly_ show which exp versions are added. The previous regex used `exp[^e]` to avoid matching calls like: `@llvm.experimental.stepvector`. Note: ArmPL Mappings for scalable types are not yet utilized (eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib` pass needs improvements.
…lvm#72526) The code in the CloneInstructionsIntoPredec... function modified by this patch has a long history that dates back to 2011, see d715ec8. There, when folding branches, all dbg.value intrinsics seen when folding would be saved and then re-inserted at the end of whatever was folded. Over the last 12 years this behaviour has been preserved. However, IMO it's bad behaviour. If we have: inst1 dbg.value1 inst2 dbg.value2 And we fold that sequence into a different block, then we would want the instructions and variable assignments to appear in the same order. However because of this old behaviour, the dbg.values are sunk, and we get: inst1 inst2 dbg.value1 dbg.value2 This clustering of dbg.values can make assignments to the same variable invisible, as well as reducing the coverage of other assignments. This patch relaxes the CloneInstructions... function and allows it to clone and update dbg.values in-place, causing them to appear in the original order in the destination block. I've added some extra dbg.values to the updated test: without the changes to the pass, the dbg.values sink into a blob ahead of the select. The RemoveDIs code can't cope with this right now so I've removed the "--try..." flag, restored in a commit to land in a couple of hours. (Metadata changes to make the LLVM-IR parser not drop the debug-info for it being out of date. The RemoveDIs related RUN line has been removed because it was spuriously passing due to the debug-info being dropped).
…n type of function. (llvm#69724) Import of a function with `auto` return type that is expanded to a `SubstTemplateTypeParmType` could fail if the function itself is the template specialization where the parameter was replaced.
) This patch is rearranging code a bit to add WatchpointResources to Process. A WatchpointResource is meant to represent a hardware watchpoint register in the inferior process. It has an address, a size, a type, and a list of Watchpoints that are using this WatchpointResource. This current patch doesn't add any of the features of WatchpointResources that make them interesting -- a user asking to watch a 24 byte object could watch this with three 8 byte WatchpointResources. Or a Watchpoint on 1 byte at 0x1002 and a second watchpoint on 1 byte at 0x1003; these must both be served by a single WatchpointResource on that doubleword at 0x1000 on a 64-bit target, because if two hardware watchpoint registers were used to track these separately, one of them may not be hit. Or if you have one Watchpoint on a variable with a condition set, and another Watchpoint on that same variable with a command defined or different condition, or ignorecount, both of those Watchpoints need to evaluate their criteria/commands when their WatchpointResource has been hit. There's a bit of code movement to rearrange things in the direction I'll need for implementing this feature, so I want to start with reviewing & landing this mostly NFC patch and we can focus on the algorithmic choices about how WatchpointResources are shared and handled as they're triggered, separately. This patch also stops printing "Watchpoint <n> hit: old value: <x>, new value: <y>" for Read watchpoints. I could make an argument for printing "Watchpoint <n> hit: current value <x>" but the current output doesn't make any sense, and the user can print the value if they are particularly interested. Read watchpoints are used primarily to understand what code is reading a variable. This patch adds more fallbacks for how to print the objects being watched if we have types, instead of assuming they are all integral values, so a struct will print its elements. As large watchpoints are added, we'll be doing a lot more of those. To track the WatchpointSP in the WatchpointResources, I changed the internal API which took a WatchpointSP and devolved it to a Watchpoint*, which meant touching several different Process files. I removed the watchpoint code in ProcessKDP which only reported that watchpoints aren't supported; the base class does that already. I haven't yet changed how we receive a watchpoint to identify the WatchpointResource responsible for the trigger, and identify all Watchpoints that are using this Resource to evaluate their conditions etc. This is the same work that a BreakpointSite needs to do when it has been triggered, where multiple Breakpoints may be at the same address. There is not yet any printing of the Resources that a Watchpoint is implemented in terms of ("watchpoint list", or SBWatchpoint::GetDescription). "watchpoint set var" and "watchpoint set expression" take a size argument which was previously 1, 2, 4, or 8 (an enum). I've changed this to an unsigned int. Most hardware implementations can only watch 1, 2, 4, 8 byte ranges, but with Resources we'll allow a user to ask for different sized watchpoints and set them in hardware-expressible terms soon. I've annotated areas where I know there is work still needed with LWP_TODO that I'll be working on once this is landed. I've tested this on aarch64 macOS, aarch64 Linux, and Intel macOS. https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116
…lvm#68926) This PR adds support for the PPA2 fields. --------- Co-authored-by: Yusra Syeda <[email protected]>
The previous optional class would call the destructor on a non-trivially destructible object regardless of whether it had already been reset. This patch fixes that by moving the tracking of whether the object exists into optional's internal storage class.
On macOS <vector> was already included via another header file, but this failed on the CI bots. I'd tested on linux earlier in this patch's life when the headers were differently arranged.
…3439) In the FreeBSD base system, re-executing the main binary when ASLR is detected was implemented in the following commits: * freebsd/freebsd-src@7cafe89f9ce33 * freebsd/freebsd-src@96fe7c8ab0f65 * freebsd/freebsd-src@930a7c2ac67e1 * freebsd/freebsd-src@0a736f0a6aeb0 * freebsd/freebsd-src@4c9a0adad1826 Squash all these to bring them into upstream compiler-rt. When ASLR is detected to be enabled, this first force-disables ASLR for the current process, then calls ReExec(). The ReExec() function gets a FreeBSD specific implementation for finding the path of the executed program, via the ELF auxiliary vector. This is done without calling into the regular elf_aux_info(3) function, as that makes use of several already-intercepted functions.
Summary: These tests are currently failing, disable them so we can keep the bots green until we find a better solution. The x64 tests are not the core target so this is low priority.
This adds functionality to tbd files similar to what `lipo -extract` and `lipo -remove` does for binaries.
The "Dim" prefix is a legacy left-over that no longer makes sense, since we have a very strict "Dimension" vs. "Level" definition for sparse tensor types and their storage.
…lvm#72666) This is the first in a planned patch series to teach our vector lowering how to exploit register boundaries in LMUL>1 types when VLEN is known to be an exact constant. This corresponds to code compiled by clang with the -mrvv-vector-bits=zvl option. For extract_vector_elt, if we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can do a sub-register extract for that register, and then shift any remaining index. This results in all constant index extracts becoming m1 operations, and thus eliminates the complexity concern for explode-vector idioms at high lmul.
This sanitizes the string for printing.
std::hash and ADT/Hashing::hash_value are non-deterministic functions whose results might vary across implementation/process/execution. Using xxh3 instead for computing hashes of BinaryFunctions and BinaryBasicBlock for stale profile matching. (A possible alternative is to use ADT/StableHashing.h based on FNV hashing but xxh3 seems to be more popular in LLVM) This is to address llvm#65241.
I recently moved from using vim to lunarvim. I noticed that `gcc` (lol) wasn't able to comment out lines, because commentstring is not set. Thanks to Chase Colman for the suggestion in LunarVim/LunarVim#4394.
This pass isn't used anywhere upstream and thus has no test coverage. For these reasons, remove it.
Since this patch is the first part of a two-part change (first changing the loops' interleave count computation, then removing the loop interleaving threshold), these tests remain unchanged for the current (first) patch but will be modified for the second patch. The original PR for these tests is llvm#70272.
I think something has gone wrong with the rebase - your patch now has 250 commits.
llvm#70209) …to also take into account the op itself
…vm#70907) Unless specified as a "raw" string, Python will try to interpret backslashes, which means they don't get passed on correctly to the underlying regex-engine, unless escaped manually (`\\`). Use raw strings in `llvm/test/lit.cfg.py` to avoid a `SyntaxWarning`: ``` llvm/test/lit.cfg.py:275: SyntaxWarning: invalid escape sequence '\d' match = re.search("release (\d+)\.(\d+)", ptxas_out) ``` Other such warning present in 17.0.x were already fixed in 7ed0f5b.
Not sure if this was because I had multiple remotes configured, but the rebasing got completely messed up! I have tried several ways to fix it, but it's not going back to the previous state. I'll close this PR & create a new one with the final version.
[LV] Change loops' interleave count computation
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#26), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial in two cases:
1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice
2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run
where TC = trip count, VW = vectorization width, IC = interleaving count
We change the interleave count computation based on this information, but we leave it unchanged when the flag InterleaveSmallLoopScalarReduction is set to true, since that flag handles a special case (https://reviews.llvm.org/D81416).
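
To make the two cases concrete, here is a small predicate capturing the profitability condition described above; the function is illustrative, not part of the patch:

```cpp
#include <cstdint>

// Case 1: the interleaved vector body runs at least twice (TC > 2 * VW * IC).
// Case 2: TC is an exact multiple of VW * IC, so no epilogue loop is needed.
bool interleavingLikelyProfitable(uint64_t TC, uint64_t VW, uint64_t IC) {
  return TC > 2 * VW * IC || TC % (VW * IC) == 0;
}
```

For example, with VW = 64 and IC = 2, TC = 128 satisfies case 2, while TC = 129 satisfies neither, which is why the patch selects IC 1 there.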