Question about the recall performance #3
Hi yurymalkov,

I'm sorry to disturb you again. I have a question about the recall-performance evaluation in sift_1b.cpp. In sift_1b.cpp, the number of returned samples for each query is 1, and a query counts as a hit if that returned sample is among the top-K samples (top@K) of the ground truth. I think another method is more appropriate: retrieve K samples for each query (top@K), and check whether the top-1 sample (top@1) of the ground truth is among the K retrieved samples; if so, the query counts as a hit.

I think this evaluation is better than the method adopted in sift_1b.cpp. Taking face retrieval as an example, we want a high recall rate: if the right face is not recalled within the top 10, we can simply enlarge the number of retrieved samples, e.g. to top@50. What do you think about the evaluation method in sift_1b.cpp compared with the method described above?

Looking forward to your reply.
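For concreteness, here is a minimal sketch of the two definitions in plain C++ (the function names and container layout are illustrative, not taken from sift_1b.cpp):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// sift_1b.cpp-style metric (as described above): one returned id per query,
// counted as a hit if it appears among the top-K ground-truth neighbors.
double recall_returned1_in_gtK(const std::vector<std::vector<size_t>>& gt,
                               const std::vector<size_t>& returned, size_t K) {
    size_t hits = 0;
    for (size_t q = 0; q < gt.size(); ++q) {
        auto end = gt[q].begin() + std::min(K, gt[q].size());
        if (std::find(gt[q].begin(), end, returned[q]) != end) ++hits;
    }
    return static_cast<double>(hits) / gt.size();
}

// Proposed alternative: K retrieved ids per query, counted as a hit if the
// top-1 ground-truth neighbor appears among them.
double recall_gt1_in_retrievedK(const std::vector<std::vector<size_t>>& gt,
                                const std::vector<std::vector<size_t>>& retrieved,
                                size_t K) {
    size_t hits = 0;
    for (size_t q = 0; q < gt.size(); ++q) {
        auto end = retrieved[q].begin() + std::min(K, retrieved[q].size());
        if (std::find(retrieved[q].begin(), end, gt[q].front()) != end) ++hits;
    }
    return static_cast<double>(hits) / gt.size();
}
```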
I am not sure I've understood you correctly.
Hi yurymalkov, I have recently evaluated top@N performance: the number of k-closest elements from the ground truth (gt) is set to 1, and the number of retrieved samples N ranges from 1 to 1000 (ef is set to 1000). The performance is as follows: [recall plot attached in the original comment]

Increasing the number of retrieved samples does little to boost performance. I think the reason is that recall is already very high (close to 100%) when the number of retrieved samples N is 1, so increasing N has little room to improve it. I will test it on large-scale face features.
Hi @willard-yuan, thank you for the tests! I think such behavior is expected. During a search the algorithm internally keeps a sorted list of the ef current approximate nearest neighbors. When the stop condition is met (all neighbors of the elements in the list have been evaluated), the list is shrunk to the N (or 'K' in KNN notation) best elements to produce the output: https://github.com/nmslib/hnsw/blob/master/hnswlib/hnswalg.h#L713. So in our case the recall should not depend on N unless ef is changed, because the internal list is already sorted. It does depend slightly on N here (N=1,2), though, probably because several elements in the dataset are equally close to at least some of the queries.
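Roughly, that shrink step looks like this (a sketch of the idea, not the repo's actual code; internally the search keeps a max-heap of (distance, id) pairs with the furthest element on top, so taking the best N of the already-ordered ef candidates cannot change which elements come first):

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// After the greedy search stops, the ef-sized result heap is cut down to the
// k best elements for the caller.
std::vector<std::pair<float, size_t>> shrink_to_k(
        std::priority_queue<std::pair<float, size_t>>& top_candidates, size_t k) {
    // Drop the furthest candidates until only k remain.
    while (top_candidates.size() > k) top_candidates.pop();
    // Unload the heap; reverse so the closest element comes first.
    std::vector<std::pair<float, size_t>> result;
    while (!top_candidates.empty()) {
        result.push_back(top_candidates.top());
        top_candidates.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```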
@yurymalkov Thanks for your explanation. I think I've got the main idea of the method. By the way, are there any other English slides or docs that give more detail on the method, besides the paper?
@willard-yuan Unfortunately, I am not aware of other detailed documents. A somewhat different description can be found in nmslib's manual https://github.com/nmslib/nmslib/blob/master/manual/manual.pdf, but it is very short.
Hi @yurymalkov, I have finished the experiment on 128-dimensional CNN features. The recall is great: M: 40, Mem: 211533 MB, index file: [figures attached in the original comment]

Thank you for your great work.
@willard-yuan Cool!
@willard-yuan Thanks for the plot and results. I'm also curious to know the parameters you used for the benchmark and the time it took to build the index.
@yurymalkov @Neltherion Sorry for the late reply. efConstruction is set to 80. Indexing the 2B face vectors took an hour or more (I forget the exact time). The machine I used is as follows: [specs attached in the original comment]
@willard-yuan Thanks for the update. Probably, the search speed/accuracy can be improved significantly if efConstruction is increased.
@willard-yuan Thanks so much. I always see "you can pump up efConstruction" comments in the issues section, but I don't know how much is standard and how much is overkill. Is there a benchmark of efConstruction's effect on a standard machine, or should we just tune the parameters ourselves to get a grasp of their effect?
@Neltherion efConstruction can be autotuned for optimal query speed, up to the point where increasing it further will not help. This feature will be added to the lib at some point. efConstruction and ef are basically the same parameter. So, when I see that to get high recall (0.95 in this case) one has to set ef to 1000 while efConstruction is only 80, it means the construction searches were not accurate and thus the search time can be improved. I.e., a good choice of efConstruction for optimal query time is one where a search with ef=efConstruction and k=M has recall close to 1.0 (i.e. 0.9-0.95).
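A minimal sketch of that rule of thumb (the recall_at helper is hypothetical, and the autotuning feature itself is not in the library yet):

```cpp
#include <cstddef>

// Hypothetical helper, assumed for this sketch: builds an index with the given
// efConstruction, runs queries with ef = efConstruction and k = M, and returns
// the measured recall against the ground truth.
double recall_at(size_t ef_construction, size_t M);

// The rule above: pick the smallest efConstruction for which a search with
// ef = efConstruction and k = M already reaches ~0.9-0.95 recall.
size_t tune_ef_construction(size_t M, double target = 0.95) {
    for (size_t efc = 40; efc <= 1280; efc *= 2) {
        if (recall_at(efc, M) >= target) return efc;
    }
    return 1280;  // further increases are unlikely to improve query time
}
```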
efConstruction and ef aren't the same parameters. Yury probably meant efSearch and ef.

Regarding the parameter efConstruction: like many other empirically selected parameters, there's no bulletproof way to know good values in advance. Something in the range 10-200 is usually very reasonable.

Leo
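For reference, this is where the two parameters enter in hnswlib's C++ API (a sketch against current hnswlib headers; the API in the repo at the time of this thread may differ slightly):

```cpp
#include <vector>
#include "hnswlib/hnswlib.h"

int main() {
    const int dim = 128;
    const size_t max_elements = 100000;

    hnswlib::L2Space space(dim);
    // M and ef_construction are fixed when the index is built...
    hnswlib::HierarchicalNSW<float> index(&space, max_elements,
                                          /*M=*/16, /*ef_construction=*/80);

    std::vector<float> point(dim, 0.0f);
    index.addPoint(point.data(), /*label=*/0);

    // ...while ef (the query-time search-list size) can be changed per search.
    index.setEf(200);
    auto result = index.searchKnn(point.data(), /*k=*/1);
    return 0;
}
```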