[L2 Space] Improving performance when dimension is not a factor of 4 or 16 #131

2ooom · 2019-07-30T14:07:03Z

Processing 8 values at once and finishing computation by non-vectorized instructions in case dim % 8 != 0

yurymalkov · 2019-07-30T17:41:45Z

Hi @2ooom,
Thanks for the PR!

A long time ago I ran performance tests, and it turned out that computing the residual does contribute to the total run time even if dim%16==0 (no residual). So I would like to keep the exact-multiple code paths alongside the others, for the best performance in that case.

This probably can be done without much duplicate coding by using templates with bools in L2SqrSIMD16Ext (e.g. if dim%16==0 then set a bool to skip the tail at the compile time; if dim%16!=0 then process the tail).
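One way the template-with-bool idea could look is sketched below. This is only an illustration, not the code from this PR: `l2sqr_scalar`, `l2sqr_blocks16`, `DistFunc` and `select_l2` are hypothetical names, the signatures are simplified to `const float*`, and the 16-wide loop body is a scalar stand-in for the real SIMD body.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: plain scalar squared L2 distance over n floats.
static float l2sqr_scalar(const float* a, const float* b, std::size_t n) {
    float res = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        res += d * d;
    }
    return res;
}

// HasTail is a compile-time constant. In the <false> instantiation the
// tail branch is dead code, so the compiler can drop it entirely.
template <bool HasTail>
static float l2sqr_blocks16(const float* a, const float* b, std::size_t dim) {
    assert(a != nullptr && b != nullptr);
    // Largest multiple of 16 that is <= dim.
    const std::size_t dim16 = dim & ~static_cast<std::size_t>(15);
    float res = 0.0f;
    for (std::size_t i = 0; i < dim16; i += 16) {
        // Scalar stand-in for the 16-wide SIMD body (an assumption of this sketch).
        res += l2sqr_scalar(a + i, b + i, 16);
    }
    if (HasTail) {
        // Remaining dim - dim16 elements, processed without vectorization.
        res += l2sqr_scalar(a + dim16, b + dim16, dim - dim16);
    }
    return res;
}

using DistFunc = float (*)(const float*, const float*, std::size_t);

// Choose the instantiation once (for example when the space is created),
// so the per-call hot path never tests dim % 16.
static DistFunc select_l2(std::size_t dim) {
    DistFunc no_tail = l2sqr_blocks16<false>;
    DistFunc with_tail = l2sqr_blocks16<true>;
    return (dim % 16 == 0) ? no_tail : with_tail;
}
```

With this shape, `select_l2(dim)` is called once and the returned pointer is stored; each distance call then goes straight to the matching instantiation, and both variants share a single function body.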

Can you please update the PR to accommodate that?


…be virtual. I modified sift_1b (not committed) to test the new interface; the result is ok. Test result of sift_1b on 1 million data points:

Loading GT:
Loading queries:
Loading index from sift1b_1m_ef_40_M_16.bin:
Actual memory usage: 417 Mb
Parsing gt:
10000
Loaded gt
1 0.2371 13.319 us
2 0.3712 15.691 us
3 0.4615 18.5166 us
4 0.5273 20.6371 us
5 0.5758 22.1235 us
6 0.6179 24.4141 us
7 0.6502 25.9906 us
8 0.6796 28.2004 us
9 0.7042 29.8559 us
10 0.7243 31.3286 us
11 0.7432 36.0276 us
12 0.7605 34.9448 us
13 0.7754 36.4176 us
14 0.7874 37.7606 us
15 0.8013 44.6698 us
16 0.8116 47.4424 us
17 0.8239 46.9154 us
18 0.8312 45.9322 us
19 0.8379 49.3406 us
20 0.8442 49.124 us
21 0.8507 52.1223 us
22 0.8566 52.4161 us
23 0.8622 56.9665 us
24 0.8675 71.5782 us
25 0.8731 72.4451 us
26 0.8768 57.0935 us
27 0.8812 58.3525 us
28 0.8845 59.5751 us
29 0.889 61.7516 us
30 0.8935 62.6091 us
40 0.9224 76.8735 us
50 0.9412 92.5431 us
60 0.9541 107.141 us
70 0.9632 121.24 us
80 0.9708 135.862 us
90 0.9756 163.516 us
100 0.9792 180.539 us
140 0.9883 228.747 us
180 0.9921 281.199 us
220 0.9942 338.32 us
260 0.9956 388.501 us
300 0.9962 445.776 us
340 0.9968 477.474 us
380 0.9975 534.054 us
420 0.9982 582.327 us
460 0.9983 625.824 us
Actual memory usage: 419 Mb


L2 SIMD methods are split in two:

1. `L2SqrSIMD(4|16)Ext` uses SSE or AVX to compute the distance over dimensions that are multiples of 4 and 16.
2. `L2SqrSIMD(4|16)ExtResidual` relies on (1) for the part of the dimensions that is a full multiple of 4 or 16, and finishes the residual with the non-SIMD method `L2Sqr`.
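The split described above can be sketched roughly as follows for the 4-wide case. This is a simplified illustration under stated assumptions, not the code from the commit: the function names `l2sqr`, `l2sqr_simd4` and `l2sqr_simd4_residual` are hypothetical stand-ins for `L2Sqr`, `L2SqrSIMD4Ext` and `L2SqrSIMD4ExtResidual`, the signatures take `const float*` and an explicit length, and a scalar fallback is used when SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Non-SIMD squared L2 distance (stand-in for the role of L2Sqr).
static float l2sqr(const float* a, const float* b, std::size_t n) {
    float res = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        res += d * d;
    }
    return res;
}

// Squared L2 distance for n that is a multiple of 4
// (stand-in for the role of L2SqrSIMD4Ext).
static float l2sqr_simd4(const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);
#if defined(__SSE__)
    __m128 sum = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        // Unaligned loads, so callers need no special alignment.
        const __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        sum = _mm_add_ps(sum, _mm_mul_ps(d, d));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, sum);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    return l2sqr(a, b, n);  // portable fallback when SSE is unavailable
#endif
}

// Any dimension: SIMD over the largest multiple-of-4 prefix, then the
// scalar routine over the remaining 0..3 elements
// (stand-in for the role of L2SqrSIMD4ExtResidual).
static float l2sqr_simd4_residual(const float* a, const float* b, std::size_t dim) {
    const std::size_t dim4 = dim - dim % 4;
    return l2sqr_simd4(a, b, dim4) + l2sqr(a + dim4, b + dim4, dim - dim4);
}
```

For example, with `dim = 10` the first 8 elements go through the SIMD path and the last 2 through `l2sqr`. Keeping the exact-multiple function separate means callers whose dimension is already a multiple of 4 never pay for the residual call.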

[L2 Space] Improving performance when dimension is not a factor of 4 …

2bf4d13

…or 16 Processing 8 values at once and finishing computation by non-vectorized instructions in case dim % 8 != 0

piem and others added 28 commits April 5, 2020 12:53

setExternalLabel returns void

1bdbe2d

use size_t counters to avoid size_t to int comparisons

5f10af8

Update README.md

e180326

fix bug in sift test

34b142d

update bruteforce to support element updates, add locks for multi-thr…

ce80e99

…eaded

pypi package

af0007c

travis installation

bc91a93

include hsnwlib in sdist

2195112

removebdist_wheel from distribution

d58b9a8

use symlink

a6b87f2

remove unuseful cp (and relaunch tests)

077b041

Update README.md

86188b1

Update README.md

f47e853

fix/improve tests, #142

d6d204f

Fix load bugs/messages, update test, deprecate old indices (#148)

76db8ae

* temp debug state * fix bug in loading index with deleted elements * adjust condition in test * add check for file existence * cleanup

The interface addPoint is changed from

6c4ab29

addPoint(void*, args...) to addPoint(const void*, args...). The changes include the interface in bruteforce.h, and all interfaces related to addPoint in hnswlib. I test the code using 1 million sift data, the result is ok.

Two main changes:

0334c8c

1. searchKnn will fist check if the graph is empty 2. searchKnn will return a min-heap The test code in sift_1b is changed and tested.

Remove unneeded header <queue>

16c1175

Fix typos

2b400f3

Using std::sort to sort the result according to the comparator provid…

5e037e2

…ed by the user.

fix missing deletion initialization

dc25836

Expose current-count and max-elements in Python

65f35ac

fix python tests

d5f6aad

Throw exception on malloc fails

bf94915

Update README.md

ef1d4e0

updated path from static to dynamic

b75c713

bump version

83b635d

xiejianqiao and others added 4 commits April 5, 2020 12:53

fix，overflow in getIdsList

aae3be9

include one more other implementation

679903c

Update README.md

fd4ebf4

2ooom closed this Apr 5, 2020

2ooom mentioned this pull request Apr 5, 2020

Perf improvement for dimension not of factor 4 and 16 #211

Merged
