Skip to content

Avoids pointer overflow in bound checking of SSE/AVX intrinsics #980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
briancylui opened this issue Sep 21, 2018 · 0 comments
Open

Avoids pointer overflow in bound checking of SSE/AVX intrinsics #980

briancylui opened this issue Sep 21, 2018 · 0 comments
Labels
P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@briancylui
Copy link
Contributor

briancylui commented Sep 21, 2018

#821 references and aims to solve this issue.

Suggested by @ahsonkhan to avoid integer overflow in bound checking inside SSE/AVX intrinsics implementation, i.e. change all while (pCurrent + 8 OR 4 <= pEnd) into while (pEnd - pCurrent >= 8 OR 4).

Perf tests results before and after the change are shown below:

Before the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Type Method Mean Error StdDev
AvxPerformanceTests SumU 159.4 us 1.104 us 0.9784 us
NativePerformanceTests SumU 283.5 us 5.492 us 4.8687 us
SsePerformanceTests SumU 281.2 us 1.472 us 1.3045 us
AvxPerformanceTests AddU 276.1 us 3.018 us 2.520 us
NativePerformanceTests AddU 330.1 us 3.585 us 3.178 us
SsePerformanceTests AddU 325.6 us 6.883 us 7.926 us

After the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain
Type Method Mean Error StdDev
AvxPerformanceTests SumU 183.5 us 3.621 us 3.023 us
NativePerformanceTests SumU 281.6 us 5.261 us 4.921 us
SsePerformanceTests SumU 294.1 us 2.080 us 1.946 us
AvxPerformanceTests AddU 296.3 us 5.185 us 4.850 us
NativePerformanceTests AddU 335.1 us 3.053 us 2.707 us
SsePerformanceTests AddU 345.0 us 2.155 us 1.800 us

Both SSE and AVX implementations are slower by 10-20% after this change.

In my opinion, after seeing the perf results, I may not recommend merging this PR. I may wait until the alternative suggested by @tannergooding in an earlier PR review has been implemented (2nd item under "Functionality" in briancylui#2):

var remainder = count % elementsPerIteration;
float* pEnd = pdst + (count - remainder);
while (pDstCurrent < pEnd)
{ … }

Another question I have is: would pDstCurrent + 8 OR 4 ever have the possibility to result in integer overflow? According to my knowledge, pEnd is initialized as pDstCurrent + count, and there are Contract.Asserts in the wrapper class to check that count does not exceed the original array length. I'm not sure, and am open to any PR comments and advice.

cc: @danmosemsft @eerhardt @tannergooding @ahsonkhan @markusweimer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Projects
None yet
Development

No branches or pull requests

2 participants