If you move the view so that it occurs at the end, you do get a single fused kernel and a 2.5x speedup. I am going to look into making this fusion happen automatically in Inductor, but I still need to scope out what changes are involved. In the meantime, maybe moving the view manually is workable as a workaround?
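To make the "view at the end" suggestion concrete, here is a minimal sketch. It is not the cast from this issue: the function names, the block size of 32, and the bfloat16 target dtype are all hypothetical stand-ins. It only shows that moving the reshape past the final elementwise op leaves the eager results unchanged; whether Inductor fuses this particular toy into one kernel has not been checked here.

```python
import torch

BLOCK = 32  # hypothetical block size, not taken from the issue


def cast_view_in_middle(x: torch.Tensor):
    """Hypothetical blockwise cast that reshapes back before the final cast."""
    orig_shape = x.shape
    blocks = x.reshape(-1, BLOCK)
    # One scale per block, clamped to avoid dividing by zero.
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scaled = (blocks / scale).reshape(orig_shape)  # view happens mid-chain
    return scaled.to(torch.bfloat16), scale


def cast_view_at_end(x: torch.Tensor):
    """Same math, but the reshape is the very last op."""
    orig_shape = x.shape
    blocks = x.reshape(-1, BLOCK)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    out = (blocks / scale).to(torch.bfloat16)  # elementwise work first
    return out.reshape(orig_shape), scale      # view happens at the end
```

To count kernels for either variant, one option is to wrap it in `torch.compile` and run the script with `TORCH_LOGS="output_code"` set, then inspect the generated code.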
What this cast is doing
We really should do this all in one kernel, but today we see two kernels
How to reproduce (requires latest main branch)
Output logs: https://gist.github.com/vkuzo/ce205fde5ae6b0fc223892c8a46560d4 - we currently see two kernels