[L2 Space] Improving performance when dimension is not a factor of 4 or 16 #131
…or 16 Processing 8 values at once and finishing computation by non-vectorized instructions in case dim % 8 != 0
Hi @2ooom, A long time ago I've done the performance tests and it turns out that the computation of the residual does contribute to the total run time even if This probably can be done without much duplicate coding by using templates with bools in Can you please update the PR to accommodate that?
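One possible reading of the "templates with bools" suggestion is sketched below. The comment above does not say where the template parameter should live, so everything here is an assumption: `l2sqr_blocked`, `pick_l2sqr`, and `DistFunc` are hypothetical names, not hnswlib identifiers, and the inner block of 4 is a plain loop standing in for a SIMD kernel. The idea is that a compile-time `bool` lets one function body serve both cases, while the instantiation without a residual carries no tail loop at all.

```cpp
#include <cstddef>

// Hypothetical distance-function pointer type for this sketch.
using DistFunc = float (*)(const float*, const float*, size_t);

// One body, two instantiations. HasResidual is a compile-time constant,
// so the tail loop is compiled out of l2sqr_blocked<false>.
template <bool HasResidual>
float l2sqr_blocked(const float* a, const float* b, size_t dim) {
    const size_t dim4 = dim & ~static_cast<size_t>(3);  // largest multiple of 4 <= dim
    float sum = 0.0f;
    // Blocks of 4 values (stand-in for the vectorized part).
    for (size_t i = 0; i < dim4; i += 4) {
        for (size_t j = 0; j < 4; ++j) {
            const float d = a[i + j] - b[i + j];
            sum += d * d;
        }
    }
    // Residual elements, handled one at a time.
    if (HasResidual) {
        for (size_t i = dim4; i < dim; ++i) {
            const float d = a[i] - b[i];
            sum += d * d;
        }
    }
    return sum;
}

// Choose the instantiation once, when the dimension becomes known.
inline DistFunc pick_l2sqr(size_t dim) {
    if (dim % 4 == 0) {
        return &l2sqr_blocked<false>;
    }
    return &l2sqr_blocked<true>;
}
```

Selecting the function pointer once per index, rather than branching on `dim % 4` inside every distance call, keeps the residual check out of the hot path.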
* temp debug state
* fix bug in loading index with deleted elements
* adjust condition in test
* add check for file existence
* cleanup
addPoint(void*, args...) to addPoint(const void*, args...). The changes cover the interface in bruteforce.h and all interfaces related to addPoint in hnswlib. I tested the code on 1 million SIFT vectors, and the result is OK.
1. searchKnn will first check if the graph is empty
2. searchKnn will return a min-heap

The test code in sift_1b is changed and tested.
…be virtual. I modified sift_1b (not committed) to test the new interface, and the result is OK. Test result of sift_1b on 1 million data points:

Loading GT:
Loading queries:
Loading index from sift1b_1m_ef_40_M_16.bin:
Actual memory usage: 417 Mb
Parsing gt: 10000
Loaded gt
1 0.2371 13.319 us
2 0.3712 15.691 us
3 0.4615 18.5166 us
4 0.5273 20.6371 us
5 0.5758 22.1235 us
6 0.6179 24.4141 us
7 0.6502 25.9906 us
8 0.6796 28.2004 us
9 0.7042 29.8559 us
10 0.7243 31.3286 us
11 0.7432 36.0276 us
12 0.7605 34.9448 us
13 0.7754 36.4176 us
14 0.7874 37.7606 us
15 0.8013 44.6698 us
16 0.8116 47.4424 us
17 0.8239 46.9154 us
18 0.8312 45.9322 us
19 0.8379 49.3406 us
20 0.8442 49.124 us
21 0.8507 52.1223 us
22 0.8566 52.4161 us
23 0.8622 56.9665 us
24 0.8675 71.5782 us
25 0.8731 72.4451 us
26 0.8768 57.0935 us
27 0.8812 58.3525 us
28 0.8845 59.5751 us
29 0.889 61.7516 us
30 0.8935 62.6091 us
40 0.9224 76.8735 us
50 0.9412 92.5431 us
60 0.9541 107.141 us
70 0.9632 121.24 us
80 0.9708 135.862 us
90 0.9756 163.516 us
100 0.9792 180.539 us
140 0.9883 228.747 us
180 0.9921 281.199 us
220 0.9942 338.32 us
260 0.9956 388.501 us
300 0.9962 445.776 us
340 0.9968 477.474 us
380 0.9975 534.054 us
420 0.9982 582.327 us
460 0.9983 625.824 us
Actual memory usage: 419 Mb
L2 SIMD methods are split in two:

1. `L2SqrSIMD(4|16)Ext` - uses SSE or AVX to compute the distance when the dimension is a multiple of 4 or 16
2. `L2SqrSIMD(4|16)ExtResidual` - relies on (1) for the largest prefix of the dimensions that is a multiple of 4 or 16, then finishes the residual elements with the non-SIMD method `L2Sqr`.
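The split described above can be illustrated with a minimal sketch for the 4-wide case. This is not the PR's code: the function names below are hypothetical stand-ins for the roles of `L2Sqr`, `L2SqrSIMD4Ext`, and `L2SqrSIMD4ExtResidual`, the signatures are simplified to raw `float` pointers, and the SSE path is guarded so the sketch falls back to the scalar loop on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Scalar reference; plays the role of the non-SIMD L2Sqr.
static float l2sqr_scalar(const float* a, const float* b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Kernel that assumes dim4 is a multiple of 4; plays the role of L2SqrSIMD4Ext.
static float l2sqr_mult4(const float* a, const float* b, size_t dim4) {
    assert(dim4 % 4 == 0);
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < dim4; i += 4) {
        // Unaligned loads, so callers need no special alignment.
        const __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    return l2sqr_scalar(a, b, dim4);  // portable fallback when SSE is unavailable
#endif
}

// Plays the role of L2SqrSIMD4ExtResidual: SIMD on the largest multiple-of-4
// prefix, scalar code on the remaining 1 to 3 elements.
static float l2sqr_mult4_residual(const float* a, const float* b, size_t dim) {
    const size_t dim4 = dim & ~static_cast<size_t>(3);  // round down to multiple of 4
    const float head = l2sqr_mult4(a, b, dim4);
    const float tail = l2sqr_scalar(a + dim4, b + dim4, dim - dim4);
    return head + tail;
}
```

For a 7-dimensional input, `l2sqr_mult4_residual` processes the first 4 elements in the vector kernel and the last 3 in the scalar loop. A 16-wide variant would follow the same shape with `dim & ~15` and a wider kernel.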
Processing 8 values at once and finishing computation by non-vectorized instructions in case dim % 8 != 0