Skip to content

LLVM memory builtins #995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 27, 2020
Merged

LLVM memory builtins #995

merged 2 commits into from
Mar 27, 2020

Conversation

aykevl
Copy link
Member

@aykevl aykevl commented Mar 26, 2020

Use LLVM intrinsics for memory operations (memset, memcpy, memmove). This gives more information to the optimizer and in the end lowers to libc functions instead of a custom TinyGo implementation. This is possible since #871 and #845.

Another motivation for this PR is to reduce the amount of code at the end of the compiler.Compile function, which should help with eventually breaking it down further into per-package compiles (#285).

aykevl added 2 commits March 26, 2020 16:09
This replaces the custom runtime.memcpy and runtime.memmove functions
with calls to LLVM builtins that should hopefully allow LLVM to better
optimize such calls. They will be lowered to regular libc memcpy/memmove
when they can't be optimized away.

When testing this change with some smoke tests, I found that many smoke
tests resulted in slightly larger binary sizes with this commit applied.
I looked into it and it appears that machine.sendUSBPacket was not
inlined before while it is with this commit applied. Additionally, when
I compared all driver smoke tests with -opt=1 I saw that many were
reduced slightly in binary size and none increased in size.
This gives the optimizer a bit more information about what the calls do.
This should result in slightly better generated code.

Code size sometimes goes up and sometimes goes down. I blame the code
size going up on the inliner which inlines more functions, because
compiling the smoke tests in the drivers repository with -opt=1 results
in a slight code size reduction in all cases.
@deadprogram
Copy link
Member

This absolutely seems like a much better idea to me.

However, I am noticing that all of the programs are now a little larger, but a couple of them are now much larger than I would have expected.

Before this PR:

tinygo build -size short -o test.hex -target=itsybitsy-m0        examples/blinky1
   code    data     bss |   flash     ram
   9172       8    4312 |    9180    4320

After this PR:

tinygo build -size short -o test.hex -target=itsybitsy-m0        examples/blinky1
   code    data     bss |   flash     ram
   9548       8    4312 |    9556    4320

What do you think?

@aykevl
Copy link
Member Author

aykevl commented Mar 27, 2020

It definitely is machine.sendUSBPacket.

Before (at c4fd19b):

   code    data     bss |   flash     ram
   9380       8    4312 |    9388    4320

With this PR:

   code    data     bss |   flash     ram
   9548       8    4312 |    9556    4320

With this PR and //go:noinline on machine.sendUSBPacket:

   code    data     bss |   flash     ram
   9308       8    4312 |    9316    4320

So it definitely can reduce code size, with a hint to the inliner. I think the inliner considers the LLVM builtin to be of a lower cost than a regular function call, although it is often translated to a function call in the end. Because of this lower cost, the inliner decided that machine.sendUSBPacket is just below the limit and decided to inline the function at all 11 places.
So, what do you think? I can add a commit to stop that function from being inlined.


Sidenote: Clang does this memory builtin translation too. When you call memcpy for example, it will emit a call to llvm.memcpy.<something> instead of a regular memcpy call.

@deadprogram
Copy link
Member

Very good investigative work to figure out the root cause. Sounds like a good solution to me.

So I assume that be a separate PR, in which case should I should merge this one when that one is ready?

@aykevl
Copy link
Member Author

aykevl commented Mar 27, 2020

Sounds good to me! It's not a functional regression and the fix is trivial.

@deadprogram
Copy link
Member

Also now merging this. Good work!

@deadprogram deadprogram merged commit 91d1a23 into dev Mar 27, 2020
@deadprogram deadprogram deleted the llvm-mem-calls branch March 27, 2020 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants