-
Notifications
You must be signed in to change notification settings - Fork 951
LLVM memory builtins #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM memory builtins #995
Conversation
This replaces the custom runtime.memcpy and runtime.memmove functions with calls to LLVM builtins that should hopefully allow LLVM to better optimize such calls. They will be lowered to regular libc memcpy/memmove when they can't be optimized away. When testing this change with some smoke tests, I found that many smoke tests resulted in slightly larger binary sizes with this commit applied. I looked into it and it appears that machine.sendUSBPacket was not inlined before while it is with this commit applied. Additionally, when I compared all driver smoke tests with -opt=1 I saw that many were reduced slightly in binary size and none increased in size.
This gives the optimizer a bit more information about what the calls do. This should result in slightly better generated code. Code size sometimes goes up and sometimes goes down. I blame the code size going up on the inliner which inlines more functions, because compiling the smoke tests in the drivers repository with -opt=1 results in a slight code size reduction in all cases.
This absolutely seems like a much better idea to me. However, I am noticing that all of the programs are now a little larger, but a couple of them are now much larger than I would have expected. Before this PR:
After this PR:
What do you think? |
It definitely is Before (at c4fd19b):
With this PR:
With this PR and
So it definitely can reduce code size, with a hint to the inliner. I think the inliner considers the LLVM builtin to be of a lower cost than a regular function call, although it is often translated to a function call in the end. Because of this lower cost, the inliner decided that Sidenote: Clang does this memory builtin translation too. When you call |
Very good investigative work to figure out the root cause. Sounds like a good solution to me. So I assume that be a separate PR, in which case should I should merge this one when that one is ready? |
Sounds good to me! It's not a functional regression and the fix is trivial. |
Also now merging this. Good work! |
Use LLVM intrinsics for memory operations (
memset
,memcpy
,memmove
). This gives more information to the optimizer and in the end lowers to libc functions instead of a custom TinyGo implementation. This is possible since #871 and #845.Another motivation for this PR is to reduce the amount of code at the end of the
compiler.Compile
function, which should help with eventually breaking it down further into per-package compiles (#285).