Same code but poor recall in different machine #317

Open
sujigrena opened this issue May 29, 2021 · 3 comments

@sujigrena

Hi,

I tried to create an index for around 60k embeddings of size 128 on a high-end machine with 36 cores (with parameters ef_construction=300, M=32, ef=10). The index file generated was around 42 MB, and the recall was around 95%. When I used the same code on a low-end machine with 4 cores, the recall dropped to 80%, and changing the HNSW parameters didn't help much. The size of the index file generated was just 5 MB. Can you please help me understand this difference, and how to make index creation stable across different hardware?
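For reference, here is a sketch of how recall can be measured against brute-force ground truth. The data here is random stand-in data (not the reporter's 60k embeddings), and the ANN results are a placeholder; in practice `ann_labels` would come from something like `index.knn_query(queries, k=k)` in hnswlib.

```python
import numpy as np

def recall_at_k(ann_labels, true_labels):
    """Fraction of the true k nearest neighbours recovered by the ANN search."""
    hits = sum(len(set(a) & set(t)) for a, t in zip(ann_labels, true_labels))
    return hits / true_labels.size

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 128)).astype(np.float32)  # stand-in for the embeddings
queries = data[:50]
k = 10

# Brute-force ground truth under squared L2 distance.
d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
true_labels = np.argsort(d2, axis=1)[:, :k]

# Placeholder: in practice these come from the ANN index, e.g. hnswlib's knn_query.
ann_labels = true_labels
print(recall_at_k(ann_labels, true_labels))  # 1.0 for the perfect placeholder
```

Comparing recall computed this way (with identical data and queries) on both machines isolates the index build from any differences in the evaluation itself.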

@yurymalkov
Member

Hm. That sounds strange. The only thing I can imagine is that floating point/integer numbers are represented differently in memory, since hnswlib saves the raw in-memory representation.
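One quick way to rule out data-representation differences is to compare a byte-level digest of the embedding array on both machines before building the index. A minimal sketch (the random array here stands in for the real 60k x 128 embeddings):

```python
import hashlib
import numpy as np

# Stand-in for the real embedding matrix; load the same embeddings on both machines.
emb = np.random.default_rng(0).standard_normal((4, 128)).astype(np.float32)

# Ensure a contiguous buffer so tobytes() reflects the actual element bytes.
emb = np.ascontiguousarray(emb)
digest = hashlib.sha256(emb.tobytes()).hexdigest()
print(emb.dtype, emb.shape, digest[:16])
```

If the dtype, shape, and digest match on both machines, the input data is bit-identical and the recall gap must come from the index construction itself.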
Can you provide the details on the used machines (architecture)?

@sujigrena
Author

Thanks for the reply @yurymalkov. Adding the architecture from both machines:
high end machine: Linux user 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

low end machine: Linux user 4.19.167+ #1 SMP Tue Feb 2 07:59:55 PST 2021 x86_64 GNU/Linux

@scott-weitze

This behavior is to be expected as an artifact of how the graph is constructed in parallel. It is important to note that the two graphs will differ when each is constructed in parallel. The only way to guarantee that the two graphs are exactly the same is to build them on both machines using a single thread for construction.

The worker threads that build the graph each add different elements to it. But for various reasons inherent to the algorithm (the randomly chosen maximal insertion level, varying search complexity), the threads don't have equal workloads, which in turn changes the order in which the vertices are inserted.
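The effect of unequal per-thread workloads on insertion order can be illustrated with a toy simulation (pure Python, not hnswlib): each worker inserts its share of ids into a shared list after a randomly varying amount of work, so the final order is interleaved and changes from run to run, even though every element is inserted exactly once.

```python
import random
import threading
import time

order = []            # shared insertion order, analogous to graph build order
lock = threading.Lock()

def worker(my_ids):
    for i in my_ids:
        time.sleep(random.random() * 1e-4)  # unequal per-element work
        with lock:
            order.append(i)

ids = list(range(100))
# Four workers, each handling an interleaved quarter of the ids.
threads = [threading.Thread(target=worker, args=(ids[t::4],)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(order) == ids)  # True: every element inserted exactly once
print(order[:10])            # interleaved, non-deterministic order
```

For a deterministic graph, the hnswlib Python bindings let you restrict construction to one thread (if I recall the API correctly, via `index.add_items(data, num_threads=1)` or `index.set_num_threads(1)`), at the cost of build speed.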

I would be happy to elaborate if you need a more technical description of why this is happening, or if my description was not clear.
