Does hnsw support adding items one by one in real-time? #2

willard-yuan · 2018-03-27T15:42:41Z

Hi yurymalkov,

Thank you for your great work; the performance shown in the paper is interesting. The Python bindings show that the index can be built incrementally. Does hnsw support adding items one by one, i.e. adding items to the index dynamically in real time?

yurymalkov · 2018-03-27T19:03:22Z

@willard-yuan Yes, the binding supports adding elements one by one, and should work in parallel with python threads.

At the moment you need to add a dummy dimension to the data and labels (i.e. call np.expand_dims(..,axis=0) on your vector and label before passing them), as the binding currently expects the input to be a matrix.
Also, you should specify the parameter num_threads=1 to avoid creating threads, i.e.
int_labels = p.add_items(np.expand_dims(vector,axis=0), np.expand_dims(label,axis=0), num_threads=1)
I think I'll have time to fix both of these inconveniences within a week.
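
The workaround above can be wrapped in a small helper. This is only a sketch: the function name add_single_item is hypothetical, and index is assumed to be an already initialized hnswlib index whose add_items accepts the data, the labels, and num_threads as described in this comment.

```python
import numpy as np


def add_single_item(index, vector, label):
    """Add one vector with one integer label to an hnswlib-style index.

    `index` is assumed to expose add_items(data, labels, num_threads=...).
    """
    # The binding expects a matrix of vectors and an array of labels,
    # so give both a leading axis of length 1.
    data = np.expand_dims(np.asarray(vector, dtype=np.float32), axis=0)
    labels = np.expand_dims(np.asarray(label), axis=0)
    # num_threads=1 avoids creating worker threads for a single element.
    return index.add_items(data, labels, num_threads=1)
```

Calling add_single_item(p, vector, label) once per incoming vector then corresponds to the one-line call shown above.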

willard-yuan · 2018-03-28T01:23:04Z

@yurymalkov Thank you for your reply. I plan to use hnsw for face retrieval.

lzuwei · 2018-03-30T15:05:49Z

Hi @yurymalkov, thank you for your great work as well.
I have a few questions about this implementation.

1. Is the performance of this similar to nmslib?
2. Based on the above discussion, am I right to say this implementation supports incremental indexing?
3. In the example provided, you explicitly mention a need to specify the maximum number of elements. What is this for?

# Initing index - the maximum number of elements should be known beforehand
p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)

yurymalkov · 2018-03-30T20:17:29Z

Hi @lzuwei

1. The search performance is generally on par with nmslib's implementation. On typical test datasets (e.g. sift, glove) it is a bit slower at high recall than nmslib with post='2' (not present in hnswlib because of the incremental construction), but also a bit faster at low recall. Note that there are clear ways to improve this, but they require some coding work.
See below a sift 1M plot with the closest competitors. Note that I am not completely sure if I have set all of the faiss parameters right.

The good news is that the indexing time and indexing memory consumption are much better than in nmslib, especially compared to the post='2' case (the improvement in this case is ~3-4X).
2. Yes
3. The index stores the data in a single chunk. The chunk is allocated during the index initialization. If the number of data elements exceeds the predefined maximum, the chunk has to be extended (i.e. reallocated), but this functionality has not been implemented yet.
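
Since extending the chunk is not implemented, one way to avoid running past max_elements is to count insertions on the caller's side. A minimal sketch, assuming the index exposes add_items(data, labels) as in the example above; the CapacityGuard class and its attribute names are hypothetical and not part of hnswlib.

```python
class CapacityGuard:
    """Counts inserted elements and refuses to exceed a fixed capacity."""

    def __init__(self, index, max_elements):
        self.index = index                # assumed: an initialized index
        self.max_elements = max_elements  # same value passed to init_index
        self.count = 0                    # elements added through this guard

    def add_items(self, data, labels):
        n = len(labels)
        if self.count + n > self.max_elements:
            # Fail before touching the index instead of overrunning the chunk.
            raise RuntimeError(
                f"adding {n} items would exceed max_elements={self.max_elements}"
            )
        self.index.add_items(data, labels)
        self.count += n
```

The guard only sees items added through it, so it should be created right after init_index and used for every insertion.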

danieloliveirabrazil · 2018-06-15T18:42:27Z

How can I use a string as a label in add_items?

willard-yuan · 2018-06-18T02:35:33Z

@danieloliveirabrazil You can use a map <int, string> to record the label when an item is added to the index.
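
A rough sketch of that idea in Python, using plain dictionaries. The LabelMap class and its method names are hypothetical; the integer ids it returns are what would be passed to add_items as labels.

```python
class LabelMap:
    """Assigns consecutive integer ids to string labels and maps them back."""

    def __init__(self):
        self._label_to_id = {}
        self._id_to_label = {}

    def id_for(self, label):
        """Return the integer id for a string label, creating one if needed."""
        if label not in self._label_to_id:
            new_id = len(self._id_to_label)
            self._label_to_id[label] = new_id
            self._id_to_label[new_id] = label
        return self._label_to_id[label]

    def label_for(self, item_id):
        """Return the string label for an integer id returned by a query."""
        return self._id_to_label[int(item_id)]
```

When adding an item, pass id_for(name) as the label; after a query, convert each returned integer back with label_for.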

394781865 · 2018-06-29T08:26:26Z

When will we be able to use Hamming distance? Sorry, my English is not very good.

394781865 · 2018-06-29T08:30:03Z

If hnsw uses squared L2 distance and LSH uses Hamming distance, which will be faster? Sorry, I'm a Chinese student!

yurymalkov · 2018-06-29T19:25:22Z

Hi @394781865,
The library does not currently support Hamming distance.
You should probably use https://github.com/nmslib/nmslib which supports it (at least, in C++).
Concerning your problem - usually LSH performs much worse, but I am not sure this is universal.

searchivarius · 2018-06-29T19:38:35Z

@394781865 Yes, NMSLIB supports the Hamming distance. You can use the following example https://github.com/nmslib/nmslib/blob/master/python_bindings/tests/legacy_test.py#L159
with a few changes:

The space name should be bit_hamming
Data points are fed as strings containing space-separated zeros and ones.
All bit vectors should have equal number of dimensions, i.e., you can't compare 100-d binary vectors with 200-d binary vectors.
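
The string format described above can be produced with a small helper. This is only a sketch of the data preparation: the function name to_bit_strings is hypothetical, and passing the resulting strings to NMSLIB is not shown here.

```python
def to_bit_strings(vectors):
    """Convert 0/1 vectors to space-separated strings of zeros and ones.

    Raises ValueError if the vectors differ in length or contain
    values other than 0 and 1.
    """
    rows = [list(v) for v in vectors]
    if not rows:
        return []
    dim = len(rows[0])
    result = []
    for i, row in enumerate(rows):
        if len(row) != dim:
            # All bit vectors must have the same number of dimensions.
            raise ValueError(f"vector {i} has {len(row)} bits, expected {dim}")
        if any(bit not in (0, 1) for bit in row):
            raise ValueError(f"vector {i} contains a value other than 0 or 1")
        result.append(" ".join(str(int(bit)) for bit in row))
    return result
```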

phdowling mentioned this issue May 23, 2018

Possible to continue setting data after building index? nmslib/nmslib#73

Open

wangty6 mentioned this issue Jun 8, 2018

I have a core-dump back trace, what's the possible reason? #26

Closed

searchivarius closed this as completed Jun 29, 2018

sameraamar mentioned this issue Dec 9, 2021

The HNSW graph search returns less than the world size (so recall can never be 1.0) ? #352

Open

duylncanawan mentioned this issue Nov 25, 2023

segmentation fault (core dumped) #525

Closed

GerHobbelt pushed a commit to GerHobbelt/hnswlib that referenced this issue May 10, 2024

Merge pull request nmslib#2 from chroma-core/hammad/add_length_tracking

d083259

Add length tracking for normalized vectors
