-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: Use wide integer load/store instructions if possible #11819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This would be great, you do need to worry about alignment on some architectures though and I don't know how that should be handled. |
If I remember correctly, this should be okay for x86, even though there may be a performance penalty for unaligned reads, it is (I'm guessing) probably faster than 8 calls to MOVB. I can't say the same for other architectures like ARM and PowerPC. Also, when I think about it, I think fixing this issue implies that #5364 is also fixed too. The only way that a wide integer load/store can be used is if the compiler can reason that the load/store will not go out of bounds. Furthermore, it also means that some changes would need to be made in regular Go code to take advantage of this as well. The code presented above would need to be rewritten as: func (littleEndian) PutUint64(b []byte, v int64) {
b[7] = byte(v >> 56)
b[6] = byte(v >> 48)
b[5] = byte(v >> 40)
b[4] = byte(v >> 32)
b[3] = byte(v >> 24)
b[2] = byte(v >> 16)
b[1] = byte(v >> 8)
b[0] = byte(v)
} Since the highest index is accessed first, a single bounds checked can be used. If it passes, it can proceed to use the wide integer store. Do note that this code is not functionally equivalent to the original. If a panic occurs, the input slice will not hold a partial result. |
Note that you cannot in general apply the suggested optimization to PutUint64. You can tell the difference between the standard execution of that code and the "optimization" when len(b) < 8, because the panic in the code as written must happen with the earlier writes having been done. |
And, looks like #14267 already implements desired behavior, no? |
Yes, this is now fixed (for amd64). |
Using go1.5beta2
Compile the following code from encoding/binary:
This produces the following:
Putting aside the superfluous amount of bounds checking (captured in #5364), it seems like this entire function could be accomplished using some type of store64 instruction instead of 8 repetitive copies of something like:
The text was updated successfully, but these errors were encountered: