
proposal: unsafe: inline assembly with unsafe.Asm function #26891

Closed

Proposal: inline assembly

Author: Iskander Sharipov

With input from Ilya Tocar.

Last updated: 9 August, 2018

Abstract

This proposal describes how an inline assembly feature can be integrated into
the Go language in a backwards-compatible way and without any syntax extensions.

Users who do not write or maintain assembly, or who are not interested in raw
performance, would not see any difference.

Background

Right now, the only way to get high performance for CPU-bound operations is to
write an assembly implementation that uses the latest available instructions, with
a run-time switch on CPU feature flags that falls back to something more conservative.
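A minimal sketch of that run-time dispatch pattern, shown with a portable fallback. The flag `hasFastPath` is a hypothetical, hard-coded stand-in for a real CPU feature check (for example `golang.org/x/sys/cpu.X86.HasAVX2`), and `bytes.IndexByte` plays the role of the fast assembly path.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasFastPath is a hypothetical stand-in for a real CPU feature check
// (for example golang.org/x/sys/cpu.X86.HasAVX2); it is hard-coded here.
var hasFastPath = true

// indexByteGeneric is the portable pure-Go fallback.
func indexByteGeneric(s []byte, c byte) int {
	for i, b := range s {
		if b == c {
			return i
		}
	}
	return -1
}

// indexByte is selected once, at package initialization, from the flag.
var indexByte = selectIndexByte()

func selectIndexByte() func([]byte, byte) int {
	if hasFastPath {
		// bytes.IndexByte is implemented in assembly on several platforms.
		return bytes.IndexByte
	}
	return indexByteGeneric
}

func main() {
	fmt.Println(indexByte([]byte("gopher"), 'h')) // prints 3
}
```

The selection happens once, so the per-call cost is an indirect call; that call overhead is part of what the proposal wants to avoid by making such code inlineable.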

Sometimes the performance advantage of the assembly version is astonishing.
For functions like bytes.IndexByte, the improvement is roughly 8x to 30x:

name            old time/op    new time/op     delta
IndexByte/32-8    32.2ns ± 0%      4.1ns ± 0%    -87.14%  (p=0.000 n=9+10)
IndexByte/4K-8    2.43µs ± 0%     0.08µs ± 2%    -96.55%  (p=0.000 n=10+10)

name            old speed      new speed       delta
IndexByte/32-8   993MB/s ± 0%   7724MB/s ± 0%   +677.74%  (p=0.000 n=9+9)
IndexByte/4K-8  1.68GB/s ± 0%  48.80GB/s ± 2%  +2801.13%  (p=0.000 n=10+10)

old is the portable pure-Go version; new is the assembly implementation using AVX2.

Other cases are addressed by an increasing number of intrinsified functions.
The downside is that they clutter the compiler and speed up only a finite
set of intrinsified functions. This is not a general enough solution.

Here, "intrinsics" refers to functions like math.Sqrt that the compiler replaces with machine instructions.

The advantage of Go intrinsics is that they can be inlined, unlike
manually written assembly functions. This raises a question: what if
there were a way to write an ordinary (hence inlineable) Go function that
uses machine instructions explicitly? That would address all the problems described above:

  • It's scalable. Users may define their own intrinsics if they really need to.
  • There is no need to clutter the compiler internals with intrinsic definitions; they
    can be defined as normal functions inside Go sources.
    This reduces the burden on the Go compiler maintainers.
  • Writing these functions is less error-prone than writing hundreds of lines of
    assembly code. They are also easier to maintain and test.
  • It fulfills feature requests for inlineable assembly, such as issue17373 and issue4978.

This proposal describes how to introduce that facility into the language without
breaking changes and as unobtrusively as possible.

Proposal

This document proposes a single new Go function, unsafe.Asm, defined as:

func Asm(opcode string, dst interface{}, args ...interface{})

This function is the low-level mechanism for Go programmers to inject
machine-dependent code directly into the function body at the unsafe.Asm call site.

For example, this line of code results in a single MOVQ $10, AX instruction:

unsafe.Asm("MOVQ", "AX", 10)

It can be used to build a higher-level, intrinsic-like API.
The best part is that such an API can be implemented as a third-party library.

Like other architecture-dependent code, unsafe.Asm calls should be guarded by a
build tag or an appropriate filename suffix, such as _amd64.

The unsafe package is the preferred home for this function, because:

  1. Inline assembly, just like normal assembly, is unsafe.
  2. unsafe.Pointer can be useful when dealing with memory operands.
  3. It explicitly signals that it may not be as backwards-compatible as
    other Go packages.

unsafe.Asm arguments

opcode is the instruction name for the target machine.
All opcodes use Go assembler syntax and require size suffixes.
Opcode suffixes may also be passed along with the instruction name;
they are separated by a period, just like in ordinary Go assembly.

dst accepts any assignable Go value, with the exception of compound expressions
such as index expressions and function calls that return a pointer. Temporary
variables and/or taking an address can be used to work around this limitation.

args are more permissive than dst: they also accept integer and floating-point
constants as immediates, as well as more complex Go expressions that yield
a value permitted for unsafe.Asm arguments.

The permitted values include all numeric types except complex numbers.
Each value must fit in a hardware register, so its size must not exceed unsafe.Sizeof(int(0)).
On 32-bit platforms, 64-bit types therefore can't be used.
All other values must be passed by pointer.

Pointer types (including unsafe.Pointer) force memory operand interpretation.
Non-pointer types follow default Go value semantics.

var x int64
unsafe.Asm("MOVQ", x, 10)  // MOVQ x(SP), AX; MOVQ $10, AX
unsafe.Asm("MOVQ", &x, 10) // LEAQ x(SP), AX; MOVQ $10, (AX)

Note that the operand order follows Go assignment order, not assembly language order:
the destination comes first, then the sources. This also makes the destination
parameter easier to distinguish in the unsafe.Asm signature.

As a special case, instructions that have no explicit operands use a nil destination:

unsafe.Asm("SFENCE", nil)

Comparison-like instructions, which are typically used only to update flags and
have no explicit destination, also use a nil destination argument:

// Compare `x` with 1; updates flags.
unsafe.Asm("CMPQ", nil, 1, x)

See Efficient control flow for more details.
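To make the ordering rule concrete, here is a small sketch that renders an unsafe.Asm-style call (destination first) as Go assembler text (sources first, destination last). `renderAsm` and `operand` are hypothetical helpers, not part of the proposal; integers are printed as `$`-prefixed immediates and strings are treated as register names. Comparison instructions such as CMPQ use a different operand order in the Go assembler and are deliberately not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// renderAsm appends the sources in order, then the destination (if any),
// mirroring how Go assembler syntax lists operands.
func renderAsm(opcode string, dst interface{}, args ...interface{}) string {
	var ops []string
	for _, a := range args {
		ops = append(ops, operand(a))
	}
	if dst != nil {
		ops = append(ops, operand(dst))
	}
	if len(ops) == 0 {
		return opcode // e.g. SFENCE with a nil destination
	}
	return opcode + " " + strings.Join(ops, ", ")
}

// operand formats a single operand: strings are register names,
// ints are immediates.
func operand(v interface{}) string {
	switch v := v.(type) {
	case string:
		return v
	case int:
		return fmt.Sprintf("$%d", v)
	default:
		return fmt.Sprint(v)
	}
}

func main() {
	fmt.Println(renderAsm("MOVQ", "AX", 10))         // MOVQ $10, AX
	fmt.Println(renderAsm("ROUNDSD", "X0", 3, "X0")) // ROUNDSD $3, X0, X0
	fmt.Println(renderAsm("SFENCE", nil))            // SFENCE
}
```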

Guarantees

It is important to clearly describe guarantees that programmer may rely on.

  • The order of unsafe.Asm calls is deterministic;
    these calls can't be scheduled elsewhere.
    A sequence of unsafe.Asm calls is executed in the order it
    appears in the source code.
  • CPU flags are preserved between unsafe.Asm calls, and unsafe.Asm itself
    is marked as a flag-clobbering operation.
  • Explicitly allocated registers are not clobbered by the Go compiler.

Efficient control flow

There is no JMP support, because inline assembly cannot see Go labels.

To make writing efficient programs possible, SSA backends can recognize
the following operation sequence and produce optimal code:

var found bool                        // 1. Some bool variable.
unsafe.Asm("VPTEST", nil, "Y3", "Y3") // 2. Some flag-generating operation.
unsafe.Asm("SETNE", found)            // 3. Flags assignment to bool variable.
if found {                            // 4. Branching using that bool variable.
	// Body to be executed (hint: can use goto to Go label here).
}

Both the SETNE instruction and the read of the found variable can be eliminated,
so the branch uses the flags directly. The generated machine code becomes close
to what hand-written assembly would produce.
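For reference, the value that `found` ends up holding in the sequence above can be modeled in portable Go. VPTEST a, b sets ZF when (a AND b) is all zeros, and SETNE produces 1 when ZF is clear, so with both operands equal to Y3 the result is "Y3 is non-zero". The `ymm` type (four 64-bit lanes) and `vptestNonZero` are assumptions made only for this sketch.

```go
package main

import "fmt"

// ymm models a 256-bit YMM register as four 64-bit lanes.
type ymm [4]uint64

// vptestNonZero emulates VPTEST a, b followed by SETNE:
// VPTEST sets ZF when (a AND b) == 0, and SETNE yields true when ZF is clear.
func vptestNonZero(a, b ymm) bool {
	var acc uint64
	for i := range a {
		acc |= a[i] & b[i]
	}
	return acc != 0
}

func main() {
	var y3 ymm
	fmt.Println(vptestNonZero(y3, y3)) // false
	y3[2] = 1 << 7
	fmt.Println(vptestNonZero(y3, y3)) // true
}
```

The optimization described above keeps exactly this meaning but branches on ZF without materializing the bool.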

Error reporting

There are different kinds of programming errors that may occur during
unsafe.Asm usage.

The Go compiler frontend, gc, can catch invalid opcodes and obviously
wrong operand types. For example, the JAVA opcode does not exist and will
result in a compile-time error reported by gc. Operands
are checked using generic rules that are shared by all instructions.

Most other errors are reported by the assembler backends.
For AMD64, that backend is cmd/internal/obj/x86.

This is a direct consequence of asm ops being opaque during compilation.
That property reduces the amount of code needed to implement inline assembly,
but it delays error reporting, which leads to somewhat more cryptic error messages.
In turn, this may be a good opportunity to improve assembler error reporting.
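The frontend part of this split can be sketched as a simple table lookup: strip any period-separated suffixes, reject unknown opcode names, and leave everything else (suffix validity, operand combinations) to the assembler backend. The opcode table `knownOpcodes` and the function `checkOpcode` are hypothetical; a real implementation would take its table from the assembler rather than hard-coding it.

```go
package main

import (
	"fmt"
	"strings"
)

// knownOpcodes is a tiny, hypothetical stand-in for an architecture's
// opcode table.
var knownOpcodes = map[string]bool{
	"MOVQ": true, "MOVSD": true, "ROUNDSD": true,
	"SFENCE": true, "CMPQ": true, "VPTEST": true, "SETNE": true,
}

// checkOpcode splits off period-separated suffixes and rejects unknown
// names. Suffixes are returned unchecked; in this sketch their validation
// is left to the assembler backend.
func checkOpcode(op string) (name string, suffixes []string, err error) {
	parts := strings.Split(op, ".")
	name = parts[0]
	if !knownOpcodes[name] {
		return "", nil, fmt.Errorf("unsafe.Asm: unknown opcode %q", name)
	}
	return name, parts[1:], nil
}

func main() {
	if _, _, err := checkOpcode("JAVA"); err != nil {
		fmt.Println(err) // unsafe.Asm: unknown opcode "JAVA"
	}
}
```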

Example

Taking the intrinsified math.Trunc function as an example, we can try to define
an AMD64 version without direct compiler support.

package example

import (
	"math"
	"unsafe"
)

func trunc1(x float64) float64 {
	return math.Trunc(x)
}

func trunc2(x float64) float64 {
	unsafe.Asm("ROUNDSD", x, 3, x)
	return x
}

trunc1 and trunc2 generate the same code sequence:

MOVSD	x(SP), X0
ROUNDSD	$3, X0, X0
MOVSD	X0, ret+(SP)

The only difference is that trunc1 performs a runtime.support_sse41 check,
which could be done inside trunc2 as well.
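Why the immediate 3 gives truncation: in the SSE4.1 ROUNDSD immediate, bits 1:0 select the rounding mode when bit 2 is clear (0 = nearest-even, 1 = toward negative infinity, 2 = toward positive infinity, 3 = toward zero). The sketch below maps those modes onto Go's math package; `roundsd` is a hypothetical name, and the MXCSR-mode and precision-exception bits are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// roundsd models only the rounding-control bits (1:0) of the ROUNDSD
// immediate, assuming bit 2 is clear so the immediate selects the mode.
func roundsd(imm uint8, x float64) float64 {
	switch imm & 3 {
	case 0:
		return math.RoundToEven(x)
	case 1:
		return math.Floor(x)
	case 2:
		return math.Ceil(x)
	default: // 3: round toward zero
		return math.Trunc(x)
	}
}

func main() {
	// Mode 3 agrees with math.Trunc; prints true three times.
	for _, x := range []float64{2.5, -2.5, 1.7} {
		fmt.Println(roundsd(3, x) == math.Trunc(x))
	}
}
```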

Compatibility

The API changes are fully backwards compatible.

Implementation

Most of the work would be done by the author of this proposal.

The initial implementation will include AMD64 support for unsafe.Asm code generation.

Other backends can adopt the same implementation ideas to add support for the remaining architectures.

Parts of Go that need modifications:

  • unsafe: new function, Asm
  • cmd/compile/internal/gc: unsafe.Asm typechecking and SSA generation
  • cmd/compile/internal/ssa: changes to regalloc plus new asm-related ops
  • cmd/compile/internal/amd64: code generation for unsafe.Asm-generated ops
  • cmd/asm/internal: parser is used to parse asm operand strings

Additional notes

The initial prototype reaches 85-100% of the performance of hand-written assembly.
There is some room for improvement, especially for memory operations, which
could raise the lower bound to roughly 90-95%. The remaining performance difference is mostly
due to advanced branching tricks used in some assembly code and to more efficient
code layout and register usage.

Open questions

How to express write-only destination operands to avoid extra zeroing?

Proposed solution: ?

What about gccgo and other Go implementations?

Proposed solution: we can probably start by not permitting unsafe.Asm in compilers that do not support it.

How to express multi-output instructions?

Proposed solution A: interpret a []interface{} argument as a multi-value destination.

var quo, rem uint8
// DIVB divides AX by its operand, so x is first zero-extended into AX.
unsafe.Asm("MOVBLZX", "AX", uint8(x))
unsafe.Asm("DIVB", []interface{}{quo, rem}, uint8(y))
// AL (quotient) is moved to quo.
// AH (remainder) is moved to rem.

Note that the []interface{} causes no allocations; it is consumed at compile time.

This is consistent with how unsafe.Sizeof works.

Proposed solution B: add an unsafe.Asm2 function that has two destination arguments.

func Asm2(opcode string, dst1, dst2 interface{}, args ...interface{})
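To show why a single instruction can need two destinations, here is a portable model of the AMD64 8-bit unsigned divide (DIVB): the 16-bit dividend is in AX, the quotient goes to AL and the remainder to AH. The function name `divb` is hypothetical, and returning an error is an assumption of this sketch; the hardware raises a divide error instead when the divisor is zero or the quotient does not fit in 8 bits.

```go
package main

import (
	"errors"
	"fmt"
)

// divb returns both outputs of an 8-bit unsigned divide of ax by y.
func divb(ax uint16, y uint8) (quo, rem uint8, err error) {
	if y == 0 {
		return 0, 0, errors.New("divide by zero")
	}
	q := ax / uint16(y)
	if q > 0xFF {
		return 0, 0, errors.New("quotient does not fit in 8 bits")
	}
	return uint8(q), uint8(ax % uint16(y)), nil
}

func main() {
	quo, rem, err := divb(200, 7)
	fmt.Println(quo, rem, err) // 28 4 <nil>
}
```

Either proposed solution has to carry both results (quo and rem) out of one instruction, which a single dst argument cannot express.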

Activity

added this to the Proposal milestone on Aug 9, 2018
cznic (Contributor) commented on Aug 9, 2018:

If this is accepted, I think the signature should be

func Asm(string) error
ghost commented on Aug 9, 2018:

@cznic Why should there be a return error value? In what cases would an error be deferred from compile time to run time?

cznic (Contributor) commented on Aug 9, 2018:

Scratch the return value in my post, IDK what I was thinking. What I really wanted to say is that the arguments and all variations of arguments (Asm2, Asm3, ...) should be replaced by just a string. There are more things that are needed in assembler code than just instructions. For example directives, declarations and even comments are sometimes a must have.

quasilyte (Contributor, Author) commented on Aug 9, 2018:

@cznic for a single string argument, I have these questions:

  1. How do we determine the dst arguments? There can be 0, 1, or more of them. Without this information, it's impossible to model data flow properly in SSA regalloc.
  2. How do we pass a Go value into that code? I mean something like this: unsafe.Asm("LEAQ", "AX", &a[0]).

Note that most of the time, one can use Go variables without having to specify registers.
The only notable exception is vector registers like X/Y/Z on AMD64; the programmer has to name them directly. For scalars and pointers, there is no need to spell out register names; regalloc will do that for you.

quasilyte (Contributor, Author) commented on Aug 9, 2018:

> There are more things that are needed in assembler code than just instructions. For example directives, declarations and even comments are sometimes a must have.

This is out of scope for this proposal.
At least, that was my initial goal: make it possible to use SIMD inside Go loops without having to write the whole function in asm.

Another important case is getting rid of the special treatment of intrinsified functions inside the compiler.

> and even comments are sometimes a must have.

Just use Go comments.
A single unsafe.Asm call encodes a single instruction.

as (Contributor) commented on Aug 9, 2018:

How will this proposal ensure that the assembly is correct at compile time rather than run time? Across architectures?

> make it possible to use SIMD inside Go loops without having to write the whole function in asm.

I think containment is extremely useful when dealing with platform-specific code. How does the feature benefit the maintainer of the codebase? It is easy to tell where an assembly function is called, whereas in this scenario it would be difficult to see where assembly is being used.

I'm confused about the end goal. We would use this inside loops so that we don't have to write pure assembly functions? I would rather have a function that implements the loop inside it than invoke the instructions within the loop. Are there any advantages to doing it this way other than convenience for the writer?

quasilyte (Contributor, Author) commented on Aug 9, 2018:

> How will this proposal ensure that the assembly is correct at compile time rather than run time?

What do you mean by "assembly is correct"?
If you mean correct in the sense that it assembles correctly, that is the asm backend's responsibility. unsafe.Asm produces an SSA value that is turned into a matching obj.Prog object after the optimization passes. These are handled by the asm backend as usual.

> Across architectures?

Could you clarify, please?
unsafe.Asm is as portable as normal asm (read: not portable at all). If one wants several implementations inside one loop, it's still possible to wrap SIMD instruction calls in a function (that function will be inlined, so there is no performance penalty).

It's possible to write a portable third-party library that provides primitives such as cross-platform SIMD operations. The advantage is that they can be inlined, which makes them more composable than pure asm alternatives (where the user always pays for a function call).

> Are there any advantages to doing it this way other than convenience for the writer?

It makes it possible to get rid of "intrinsics" in the compiler and to implement them without so much special casing.
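As an illustration of the portable-library idea, here is what the pure-Go body of one such primitive might look like. `AddUint8x16` is a hypothetical function, not part of any existing package; under the proposal, a build-tagged `_amd64.go` file could provide the same function using unsafe.Asm with PADDB, which also adds bytes with wrap-around.

```go
package main

import "fmt"

// AddUint8x16 adds two 16-byte vectors element-wise with wrap-around,
// matching the semantics of the x86 PADDB instruction.
func AddUint8x16(dst, a, b *[16]uint8) {
	for i := range dst {
		dst[i] = a[i] + b[i] // uint8 arithmetic wraps modulo 256
	}
}

func main() {
	var a, b, sum [16]uint8
	for i := range a {
		a[i], b[i] = uint8(i), 250
	}
	AddUint8x16(&sum, &a, &b)
	fmt.Println(sum[0], sum[5], sum[15]) // 250 255 9
}
```

Callers see one function name on every platform; only its body differs between the portable file and any architecture-specific one.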

as (Contributor) commented on Aug 9, 2018:

> What do you mean by "assembly is correct"?

For context, this is where it was unclear:

> This function is the low-level mechanism for Go programmers to inject machine-dependent code directly into the function body at the unsafe.Asm call site.

If I have an assembly function that contains an invalid or unsupported instruction and I run go build, I will get an error and no binary will be produced. If the same scenario occurs with this proposal, what will happen when the user runs go build?

billotosyr commented on Aug 9, 2018:

Inline asm is a bad idea in my opinion. In C/C++ it leads to run-on sections like:
#elif defined(__i386__)
asm ...
#elif defined(__x86_64__) || defined(__amd64__)
asm ...
#elif defined(__powerpc__) || defined(__ppc__)
asm ...
#elif defined(__s390x__)
asm ...
#elif defined(__sparc__)
asm ...
#elif defined(__ia64__)
asm ...

You indicated that you can protect the code with a build tag, but that only means users of other architectures won't have access to the code at all. In truth, most of the time the inline asm will only be written for amd64, which will create huge porting problems for other architectures.

The way things are now, asm is really only used (other than in the Go runtime itself) for accelerating code that has already been written in Go. Because it's written in Go, it's portable. Inline asm will destroy the admirable portability of the Go language.

It also destroys readability.

quasilyte (Contributor, Author) commented on Aug 9, 2018:

> If I have an assembly function that contains an invalid or unsupported instruction and I run go build, I will get an error and no binary will be produced. If the same scenario occurs with this proposal, what will happen when the user runs go build?

All errors are reported the same way: unsafe.Asm("FOO", nil) results in an invalid instruction error during go build. The same goes for invalid arguments.

Suppose this is the compilation pipeline:

compiler FE -> compiler BE -> assembler

unsafe.Asm is replaced with an OpAsm SSA value during the FE->BE transition (gc/ssa.go);
this catches invalid opcodes.

After the BE finishes optimizations and lowering, the BE->assembler transformation produces obj.Prog lists, which are then verified by the asm backends. This catches all other errors, such as invalid argument combinations.

as (Contributor) commented on Aug 9, 2018:

Does anything prevent a user from separating the opcode from the call string by using a constant, such as:

const myInstruction = "MOVQ"

TocarIP (Contributor) commented on Aug 9, 2018:

@billotosyr you can already write asm-only functions without any Go fallback, but I don't think people do that now.

quasilyte (Contributor, Author) commented on Aug 9, 2018:

> Does anything prevent a user from separating the opcode from the call string by using a constant, such as

In the prototype I've built, no. Any constant string will do.
I believe this property does not make things worse.

The intention is to provide a very minimalistic API that makes it possible to write a less error-prone, intrinsic-like library as a third-party package. For MOVQ, such a package could define a wrapper like this:

package x86

import "unsafe"

// Mov64 emits a single MOVQ that copies src into dst.
func Mov64(dst, src interface{}) {
	unsafe.Asm("MOVQ", dst, src)
}

Another way is to provide named constants in a github.com/foobar/x86 package:

package x86
const Mov64 = "MOVQ"

Other benefits that come to mind:

  • It's easier to do static analysis of Go code. unsafe.Asm has a straightforward signature, and its usage can be checked for semantic errors with tools like staticcheck.
  • We could implement automatic loop vectorization with this via code generation, without gc compiler support.
