Regression in code quality for horizontal add after 70a54bca6f #94546

Closed
@dyung

Description

We have an internal test that checks whether the compiler generates horizontal add instructions for certain cases. We recently noticed that for one of these cases, the generated code has gotten worse after commit 70a54bc.

Consider the following code:

__attribute__((noinline))
__m256d add_pd_004(__m256d a, __m256d b) {
  __m256d r = (__m256d){ a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  return __builtin_shufflevector(r, a, 0, -1, -1, 3);
}

If compiled with optimizations targeting btver2 (-S -O2 -march=btver2), the compiler previously generated the following code:

        vhaddpd ymm0, ymm0, ymm1
        ret

After 70a54bc, however, the compiler generates worse code:

        vextractf128    xmm1, ymm1, 1
        vhaddpd xmm0, xmm0, xmm1
        vinsertf128     ymm0, ymm0, xmm0, 1
        ret
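
For reference, a scalar model of the two sequences may help show why the single vhaddpd is sufficient: the shuffle mask 0, -1, -1, 3 leaves elements 1 and 2 undefined, so only elements 0 and 3 have to match. This is only a sketch on plain arrays, not intrinsics, and the helper names (hadd256, old_seq, new_seq, defined_lanes_match, check_example) are hypothetical and not part of our test.

```c
#include <assert.h>

/* Scalar model of 256-bit VHADDPD dst, a, b. The instruction works per
   128-bit lane, so the result is { a0+a1, b0+b1, a2+a3, b2+b3 }. */
void hadd256(const double a[4], const double b[4], double out[4]) {
    out[0] = a[0] + a[1];
    out[1] = b[0] + b[1];
    out[2] = a[2] + a[3];
    out[3] = b[2] + b[3];
}

/* Old codegen: vhaddpd ymm0, ymm0, ymm1 */
void old_seq(const double a[4], const double b[4], double out[4]) {
    hadd256(a, b, out);
}

/* New codegen, modeled one instruction at a time. */
void new_seq(const double a[4], const double b[4], double out[4]) {
    /* vextractf128 xmm1, ymm1, 1  -> xmm1 = { b2, b3 } */
    double b_hi[2] = { b[2], b[3] };
    /* vhaddpd xmm0, xmm0, xmm1    -> xmm0 = { a0+a1, b2+b3 } */
    double lo[2] = { a[0] + a[1], b_hi[0] + b_hi[1] };
    /* vinsertf128 ymm0, ymm0, xmm0, 1 copies xmm0 into the upper half */
    out[0] = lo[0];
    out[1] = lo[1];
    out[2] = lo[0];
    out[3] = lo[1];
}

/* Only elements 0 and 3 are defined by the shuffle mask 0, -1, -1, 3. */
int defined_lanes_match(const double a[4], const double b[4]) {
    double o[4], n[4];
    old_seq(a, b, o);
    new_seq(a, b, n);
    return o[0] == n[0] && o[3] == n[3];
}

/* Small self-check with values that are exact in double precision. */
void check_example(void) {
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 10.0, 20.0, 30.0, 40.0 };
    assert(defined_lanes_match(a, b));
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the old sequence models {3, 30, 7, 70} and the new one {3, 70, 3, 70}. Elements 0 and 3 agree in both, which is all the shuffle requires, so the extra vextractf128 and vinsertf128 only add instructions.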

@alexey-bataev, this was your change. Could you take a look and see whether there is a way to avoid the code-quality regression in this case?
