Commit a253314

Update README.md
1 parent 41b464b commit a253314

1 file changed: +31 -20 lines changed

README.md

Lines changed: 31 additions & 20 deletions
@@ -1,26 +1,19 @@
 # HNSW - Approximate nearest neighbor search
-Paper code for the HNSW 200M SIFT experiment and a header-only C++ HNSW implementation with python bindings.
+Header-only C++ HNSW implementation with python bindings. Paper code for the HNSW 200M SIFT experiment
 
-NEW: Added simple python bindings with incremental construction
+NEW: Added support for cosine similarity and inner product distances
 
 
-#### Test reproduction steps:
-To download and extract the bigann dataset:
-```bash
-python3 download_bigann.py
-```
-To compile:
-```bash
-cmake .
-make all
-```
+Part of the nmslib project https://github.com/nmslib/nmslib
 
-To run the test on 200M SIFT subset:
-```bash
-./main
-```
 
-The size of the bigann subset (in millions) is controlled by the variable **subset_size_milllions** hardcoded in **sift_1b.cpp**.
+
+Supported distances:
+1) Squared L2 ('l2')
+2) Inner product ('ip',the distance is 1.0 - $inner product$)
+3) Cosine similarity ('cosine', the same as the inner product, but vectors are normalized)
+
+For other spaces use the main library https://github.com/nmslib/nmslib
 
 
 #### Python bindings example
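
The three distances named in the "Supported distances" list of the hunk above can be illustrated with a minimal pure-Python sketch. It follows only the definitions given in the README text (squared L2, 1.0 minus the inner product, and the same after normalizing the vectors). The function names are hypothetical and are not part of hnswlib; the library's actual C++ implementation may differ in detail.

```python
import math

# Hypothetical helper names for illustration; not part of the hnswlib API.

def squared_l2(a, b):
    """Squared Euclidean distance ('l2'): sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    """Inner product distance ('ip'): 1.0 minus the dot product of a and b."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """Cosine distance ('cosine'): inner product distance on unit-length vectors.

    Assumes neither vector is all zeros; a zero vector would divide by zero here.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    unit_a = [x / norm_a for x in a]
    unit_b = [y / norm_b for y in b]
    return inner_product_distance(unit_a, unit_b)

if __name__ == "__main__":
    u, v = [1.0, 2.0], [4.0, 6.0]
    print(squared_l2(u, v))
    print(inner_product_distance(u, v))
    print(cosine_distance(u, v))
```

Because the 'ip' distance is not normalized, it can be negative for vectors with a large dot product, whereas the cosine variant stays within [0, 2] up to rounding.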
@@ -36,7 +29,7 @@ data = np.float32(np.random.random((num_elements, dim)))
 data_labels = np.arange(num_elements)
 
 # Declaring index
-p = hnswlib.Index(space = 'l2', dim = dim) # Only l2 is supported currently
+p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or ip
 
 # Initing index - the maximum number of elements should be known beforehand
 p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
@@ -45,7 +38,7 @@ p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
 int_labels = p.add_items(data, data_labels)
 
 # Controlling the recall by setting ef:
-p.set_ef(50)
+p.set_ef(50) # ef should always be > k
 
 # Query dataset, k - number of closest elements (returns 2 numpy arrays)
 labels, distances = p.knn_query(data, k = 1)
@@ -57,7 +50,25 @@ cd python_bindings
 python3 setup.py install
 ```
 
-The repo contrains parts of the Non-Metric Space Library's code https://github.com/searchivarius/nmslib
+#### 200M SIFT test reproduction steps:
+To download and extract the bigann dataset:
+```bash
+python3 download_bigann.py
+```
+To compile:
+```bash
+cmake .
+make all
+```
+
+To run the test on 200M SIFT subset:
+```bash
+./main
+```
+
+The size of the bigann subset (in millions) is controlled by the variable **subset_size_milllions** hardcoded in **sift_1b.cpp**.
+
+
 
 References:
 Malkov, Yu A., and D. A. Yashunin. "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv preprint arXiv:1603.09320 (2016).
