Commit bf48163

agarciamontoro authored and josharian committed
cmd/compile: add rule to coalesce writes
The code generated when storing eight bytes loaded from memory created a series of small writes instead of a single, large one. The specific pattern of instructions generated stored 1 byte, then 2 bytes, then 4 bytes, and finally 1 byte.

The new rules match this specific pattern for both amd64 and s390x, and convert it into a single instruction that stores all 8 bytes. arm64 and ppc64le already generated the right code, but the new codegen test also covers those architectures.

Fixes #41663

Change-Id: Ifb9b464be2d59c2ed5034acf7b9c3e473f344030
Reviewed-on: https://go-review.googlesource.com/c/go/+/280456
Reviewed-by: Josh Bleecher Snyder <[email protected]>
Trust: Josh Bleecher Snyder <[email protected]>
Trust: Jason A. Donenfeld <[email protected]>
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Go Bot <[email protected]>
1 parent b7f62da commit bf48163

5 files changed: 125 additions, 0 deletions


src/cmd/compile/internal/ssa/gen/AMD64.rules

Lines changed: 10 additions & 0 deletions
@@ -1969,6 +1969,16 @@
   && clobber(x)
   => (MOVQstore [i] {s} p0 w0 mem)
 
+(MOVBstore [7] p1 (SHRQconst [56] w)
+  x1:(MOVWstore [5] p1 (SHRQconst [40] w)
+  x2:(MOVLstore [1] p1 (SHRQconst [8] w)
+  x3:(MOVBstore p1 w mem))))
+  && x1.Uses == 1
+  && x2.Uses == 1
+  && x3.Uses == 1
+  && clobber(x1, x2, x3)
+  => (MOVQstore p1 w mem)
+
 (MOVBstore [i] {s} p
   x1:(MOVBload [j] {s2} p2 mem)
   mem2:(MOVBstore [i-1] {s} p
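
The AMD64 rule above replaces four partial stores of `w` (one byte at offset 0, four bytes at offset 1, two bytes at offset 5, one byte at offset 7) with a single 8-byte store. As a rough illustration of why that is safe, the sketch below models both forms in plain Go and checks that they leave the same bytes in memory. The names `splitStore` and `singleStore` are hypothetical and are not part of the commit.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// splitStore models the four stores matched by the rule, using the same
// offsets and shift amounts: byte(w) at 0, uint32(w>>8) at 1,
// uint16(w>>40) at 5, and byte(w>>56) at 7.
func splitStore(b []byte, w uint64) {
	_ = b[7] // one bounds check up front
	b[0] = byte(w)
	binary.LittleEndian.PutUint32(b[1:], uint32(w>>8))
	binary.LittleEndian.PutUint16(b[5:], uint16(w>>40))
	b[7] = byte(w >> 56)
}

// singleStore models the replacement: one little-endian 8-byte store.
func singleStore(b []byte, w uint64) {
	binary.LittleEndian.PutUint64(b, w)
}

func main() {
	const w = 0x0807060504030201
	var split, single [8]byte
	splitStore(split[:], w)
	singleStore(single[:], w)
	fmt.Println(bytes.Equal(split[:], single[:]))
}
```

The `x1.Uses == 1` style conditions in the rule have no counterpart in this model; they ensure the intermediate stores are not needed by anything else before they are clobbered.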

src/cmd/compile/internal/ssa/gen/S390X.rules

Lines changed: 10 additions & 0 deletions
@@ -1420,6 +1420,16 @@
   && clobber(x)
   => (MOVDBRstore [i-4] {s} p w0 mem)
 
+(MOVBstore [7] p1 (SRDconst w)
+  x1:(MOVHBRstore [5] p1 (SRDconst w)
+  x2:(MOVWBRstore [1] p1 (SRDconst w)
+  x3:(MOVBstore p1 w mem))))
+  && x1.Uses == 1
+  && x2.Uses == 1
+  && x3.Uses == 1
+  && clobber(x1, x2, x3)
+  => (MOVDBRstore p1 w mem)
+
 // Combining byte loads into larger (unaligned) loads.
 
 // Big-endian loads

src/cmd/compile/internal/ssa/rewriteAMD64.go

Lines changed: 48 additions & 0 deletions

src/cmd/compile/internal/ssa/rewriteS390X.go

Lines changed: 48 additions & 0 deletions

test/codegen/memcombine.go

Lines changed: 9 additions & 0 deletions
@@ -367,6 +367,15 @@ func store_le64_idx(b []byte, idx int) {
 	binary.LittleEndian.PutUint64(b[idx:], sink64)
 }
 
+func store_le64_load(b []byte, x *[8]byte) {
+	_ = b[8]
+	// amd64:-`MOV[BWL]`
+	// arm64:-`MOV[BWH]`
+	// ppc64le:-`MOV[BWH]`
+	// s390x:-`MOVB`,-`MOV[WH]BR`
+	binary.LittleEndian.PutUint64(b, binary.LittleEndian.Uint64(x[:]))
+}
+
 func store_le32(b []byte) {
 	// amd64:`MOVL\s`
 	// arm64:`MOVW`,-`MOV[BH]`
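
In the codegen comments above, a leading `-` before a backquoted pattern marks a regular expression that must not match the assembly generated for that line. The sketch below approximates such a negative check with the standard `regexp` package. The function `violatesNegativeCheck` and the assembly listings are hypothetical and are not taken from the real test harness or from compiler output.

```go
package main

import (
	"fmt"
	"regexp"
)

// violatesNegativeCheck reports whether any assembly line matches a
// pattern that the test comment says must be absent.
func violatesNegativeCheck(pattern string, asm []string) bool {
	re := regexp.MustCompile(pattern)
	for _, line := range asm {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical listings, written by hand for illustration only.
	coalesced := []string{"MOVQ (AX), CX", "MOVQ CX, (BX)"}
	split := []string{"MOVB CX, (BX)", "MOVL DX, 1(BX)", "MOVW SI, 5(BX)", "MOVB DI, 7(BX)"}

	fmt.Println(violatesNegativeCheck(`MOV[BWL]`, coalesced)) // false
	fmt.Println(violatesNegativeCheck(`MOV[BWL]`, split))     // true
}
```

With the amd64 pattern `MOV[BWL]`, only a listing free of byte, word, and long moves passes, which is the case once the stores are coalesced into `MOVQ`.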
