In #52182 (comment), @laboger reports that the fiat-crypto (#40171) code with @pmur's compiler improvements (https://go.dev/cl/393656) is within range of the assembly performance!
This is extremely impressive considering the fiat-crypto code also uses safer but slower complete formulas and a somewhat naive 4-bit scalar multiplication window.
name                 fiat-crypto time/op  assembly time/op  delta
ScalarBaseMult/P256  237µs ± 0%           52µs ± 0%         -78.22% (p=1.000 n=1+1)
ScalarMult/P256      239µs ± 0%           213µs ± 0%        -10.95% (p=1.000 n=1+1)
ScalarBaseMult is still significantly slower with the fiat-crypto code, because the assembly uses a large precomputed table, while the fiat-crypto code just runs ScalarMult on the base point. This is very much fixable.
I will land the ScalarBaseMult optimization in the fiat-crypto code, and then we can remove the ppc64le assembly entirely!
Activity
FiloSottile commented on May 4, 2022
https://go.dev/cl/404174 is the promised ScalarBaseMult optimization, so it's possible that the assembly is now slower than the fiat-crypto code!
gopherbot commented on May 5, 2022
Change https://go.dev/cl/404174 mentions this issue:
crypto/elliptic: precompute ScalarBaseMult doublings
laboger commented on May 11, 2022
Here are comparisons using noasm vs. asm using latest:
No meaningful difference in the crypto/tls benchmarks.
It looks like the assembly version is still significantly faster than the native Go version for some benchmarks.