Missed optimization: per-byte comparisons are not always combined #117853
Comments
define range(i32 0, 2) i32 @is_all_ones(ptr nocapture noundef readonly %p) {
entry:
%0 = load i8, ptr %p, align 1
%arrayidx1 = getelementptr inbounds i8, ptr %p, i64 1
%1 = load i8, ptr %arrayidx1, align 1
%cmp = icmp eq i8 %0, -1
%cmp3 = icmp eq i8 %1, -1
%2 = select i1 %cmp, i1 %cmp3, i1 false
%conv4 = zext i1 %2 to i32
ret i32 %conv4
}
@purplesyringa so you mean only requiring a single word-read if both elements are next to each other like this?
Yes (obviously, with arbitrary constants, not just all-ones bytes). Ideally, a comparison of two buffers such as
int is_all_ones(unsigned char *p, unsigned char *q) {
    return (p[0] == q[0]) & (p[1] == q[1]);
}
would compile to a single comparison, too.
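As a rough illustration of what "a single read plus a single comparison" means here, the following sketch writes the merged forms by hand. The byte-wise `is_all_ones` is reconstructed from the IR in this thread rather than copied from the reproducer, and the `*_merged` and `pair_equal` names are hypothetical helpers introduced only for this example. It assumes a target where unaligned 16-bit loads are acceptable; `memcpy` is used so the C itself stays well-defined regardless of alignment.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form, reconstructed from the IR above (assumption):
   two 1-byte loads and two comparisons against 0xFF. */
int is_all_ones(const unsigned char *p) {
    return p[0] == 0xFF && p[1] == 0xFF;
}

/* Hand-merged form (hypothetical helper): one 16-bit load, one comparison.
   0xFFFF has the same byte pattern on little- and big-endian targets;
   an arbitrary pair of constants would need an endian-aware value. */
int is_all_ones_merged(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v); /* alignment-safe 2-byte read */
    return v == 0xFFFF;
}

/* Byte-wise comparison of two buffers, as in the comment above. */
int pair_equal(const unsigned char *p, const unsigned char *q) {
    return (p[0] == q[0]) & (p[1] == q[1]);
}

/* Hand-merged form (hypothetical helper): two 16-bit loads, one comparison.
   Equality of the two words does not depend on byte order. */
int pair_equal_merged(const unsigned char *p, const unsigned char *q) {
    uint16_t a, b;
    memcpy(&a, p, sizeof a);
    memcpy(&b, q, sizeof b);
    return a == b;
}
```

Both byte-wise forms load every byte unconditionally (in the IR, both loads sit in the entry block), so the wider read touches exactly the same memory as the original code.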
@RKSimon Do you have a suggestion for where this should be implemented (if it should be at all)? I saw that there was once a dedicated According to this discussion, merging adjacent loads is still being performed (as of 2019); however, I couldn't pinpoint where exactly (InstCombine, MemCpyOptimizer, SelectionDAG?).
In the middle-end, a possible place to do it would be AggressiveInstCombine, which already has a transform for merging loads (but not targeting comparisons).
We could also view this in terms of forming a memcmp (which later gets reexpanded) via MergeICmps. I think that pass may be more convenient in terms of existing infrastructure to match such patterns (including for the non-trivial cases spread across many blocks).
FTR, if this had been 4 or more (pow2) comparisons, SLP would have caught it, but it avoids creating reductions of 2 x vectors.
int is_all_ones(unsigned char *p, unsigned char *q) {
    return (p[0] == q[0]) & (p[1] == q[1]) & (p[2] == q[2]) & (p[3] == q[3]);
}
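To make the memcmp view mentioned for MergeICmps concrete, here is a minimal source-level sketch of the same four-byte comparison written as a fixed-size `memcmp`. This only illustrates the equivalence; it is not how the pass is implemented, and `quad_equal` and `quad_equal_memcmp` are hypothetical names chosen for this example.

```c
#include <string.h>

/* Four per-byte comparisons combined with non-short-circuiting '&',
   so all eight bytes are always read. */
int quad_equal(const unsigned char *p, const unsigned char *q) {
    return (p[0] == q[0]) & (p[1] == q[1]) & (p[2] == q[2]) & (p[3] == q[3]);
}

/* The same predicate expressed as a constant-size memcmp. With a small
   constant length, a compiler is free to expand this back into wide
   loads and a single comparison. */
int quad_equal_memcmp(const unsigned char *p, const unsigned char *q) {
    return memcmp(p, q, 4) == 0;
}
```

The two functions return the same result for any pair of readable 4-byte buffers, which is what makes the chain-of-comparisons to memcmp rewrite valid in this case.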
@nikic I looked at the existing code in MergeICmps which uses these
The emitted IR for the example from above looks like this:
The emitted assembly looks like this:
The first I'll look into combining this merge-type with the existing code next (I have to refine the implementation a lot more to take care of edge cases before that, though). This might take a while since I'm busy next week, but I'll submit a draft once I have a basic working pass.
Reproducer: https://godbolt.org/z/6caGGoo1e
I expected this to compile to a single read + comparison on platforms that support unaligned accesses. Instead, this happened: