-
Notifications
You must be signed in to change notification settings - Fork 5.2k
missing opportunities in constant folding around bitwise ops #99136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsFixes #95747
|
3223e8b
to
47543c0
Compare
We don't have time to work on this for .NET 9, so we will review it in .NET 10. |
@amanasifkhalid, please review this community PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pedrobsaila sorry to keep you waiting -- when you're ready, could you please rebase this on top of main so we can kick off a new CI run? Bool opts have undergone some churn lately, so resolving the merge conflicts might take some work.
src/coreclr/jit/optimizebools.cpp
Outdated
bool isBool; // If the compTree is boolean expression | ||
}; | ||
|
||
struct IntBoolOpDsc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the struct-and-initializer pattern, could you please rewrite this as a class so that all the relevant transformations are scoped to it? OptBoolsDsc
might be a useful model.
The results are the same as before, no noticeable diffs. Either my transformation fails to recognize some gen tree format I'm unaware of or this pattern doesn't exist in libraries/tests/aspnet/benchmark code which seems surprising |
class IntBoolOpDsc | ||
{ | ||
private: | ||
IntBoolOpDsc(Compiler* comp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we use an initializer list here instead? Ex:
IntBoolOpDsc(Compiler* comp)
: ctsArray()
, ctsArrayLength()
// etc.
src/coreclr/jit/optimizebools.cpp
Outdated
} | ||
|
||
//----------------------------------------------------------------------------- | ||
// Reinit: Procedure that reinitialize IntBoolOpDsc reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Reinit: Procedure that reinitialize IntBoolOpDsc reference | |
// Reinit: Procedure that reinitializes IntBoolOpDsc reference |
src/coreclr/jit/optimizebools.cpp
Outdated
// Arguments: | ||
// tree lcl var tree | ||
// | ||
void IntBoolOpDsc::AppendToLclVarArray(GenTree* tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need growable arrays, try using ArrayStack
instead (you might need to #include "arraystack.h"
first). For 8 or fewer elements, it allocates on the stack, and switches to the heap if needed. I imagine for most cases, these arrays won't exceed 8 elements, so ArrayStack
makes sense here.
src/coreclr/jit/optimizebools.cpp
Outdated
//----------------------------------------------------------------------------- | ||
// Free: Procedure that frees IntBoolOpDsc reference | ||
// | ||
void IntBoolOpDsc::Free() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you switch the arrays over to using ArrayStack
, you can get rid of IntBoolOpDsc::Free
entirely.
src/coreclr/jit/optimizebools.cpp
Outdated
} | ||
|
||
//----------------------------------------------------------------------------- | ||
// TryOptimize: Function that fold constant INT OR operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TryOptimize: Function that fold constant INT OR operations | |
// TryOptimize: Function that folds constant INT OR operations |
[MethodImpl(MethodImplOptions.NoInlining)] | ||
private static int Or10Or5(int x, int y) | ||
{ | ||
return (x | 10) | (y | 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please try running this test with your changes and the environment variable DOTNET_JitDisasm="Or10Or5"
set so we can verify if the optimization kicked in? Let me know if you need help running the test locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; Assembly listing for method CBoolTest:Or10Or5(int,int):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) int -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
;
; Lcl frame size = 0
G_M36715_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M36715_IG02: ;; offset=0x0000
mov eax, edx
or eax, ecx
or eax, 15
;; size=7 bbWeight=1 PerfScore 0.75
G_M36715_IG03: ;; offset=0x0007
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 8, prolog size 0, PerfScore 1.75, instruction count 4, allocated bytes for code 8 (MethodHash=f92e7094) for method CBoolTest:Or10Or5(int,int):int (FullOpts)
; ============================================================
; Assembly listing for method CBoolTest:LongOr10Or5(long,long):long (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) long -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) long -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
;
; Lcl frame size = 0
G_M33752_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M33752_IG02: ;; offset=0x0000
mov rax, rdx
or rax, rcx
or rax, 15
;; size=10 bbWeight=1 PerfScore 0.75
G_M33752_IG03: ;; offset=0x000A
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 11, prolog size 0, PerfScore 1.75, instruction count 4, allocated bytes for code 11 (MethodHash=4c277c27) for method CBoolTest:LongOr10Or5(long,long):long (FullOpts)
; ============================================================
; Assembly listing for method CBoolTest:ByteOr10Or5(ubyte,ubyte):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) ubyte -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) ubyte -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
;
; Lcl frame size = 0
G_M39681_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M39681_IG02: ;; offset=0x0000
movzx rax, dl
movzx rcx, cl
or eax, ecx
or eax, 15
;; size=11 bbWeight=1 PerfScore 1.00
G_M39681_IG03: ;; offset=0x000B
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 12, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 12 (MethodHash=65d464fe) for method CBoolTest:ByteOr10Or5(ubyte,ubyte):int (FullOpts)
; ============================================================
; Assembly listing for method CBoolTest:ShortOr10Or5(short,short):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 3 ) short -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 3 ) short -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
;
; Lcl frame size = 0
G_M54649_IG01: ;; offset=0x0000
;; size=0 bbWeight=1 PerfScore 0.00
G_M54649_IG02: ;; offset=0x0000
movsx rax, dx
movsx rcx, cx
or eax, ecx
or eax, 15
;; size=13 bbWeight=1 PerfScore 1.00
G_M54649_IG03: ;; offset=0x000D
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 14, prolog size 0, PerfScore 2.00, instruction count 5, allocated bytes for code 14 (MethodHash=ad252a86) for method CBoolTest:ShortOr10Or5(short,short):int (FullOpts)
; ============================================================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transformation kicks in if I set explicitly DOTNET_TieredCompilation to 0 (working with debug configuration)
@pedrobsaila, is this PR ready for another round of review? |
yes it is ready, sorry for keeping you waiting. Since my previous code was not making any asm diffs. I tried extending the support for long, byte, shorts to see if I can make any. Unfortunately, I was not lucky. If you see that the new code adds unnecessary complexity, let me know so I roll it back |
Just fixing here conflict with main branch |
Fixes #95747