Implementation of ROTLEX: *Robust Tree-based Learned Vector Index with Query-aware Repartitioning* (KDD '25).
Install dependencies:

```bash
conda create -n anns_env python=3.11 gcc make cmake tqdm swig mkl=2023 mkl-devel=2023 numpy scipy pytest loguru tensorboard pytorch pytorch-cuda=11.8 -c conda-forge -c pytorch -c nvidia
```
Alternatively, you can create the environment from the provided `environment.yml`:

```bash
conda env create -n anns_env -f environment.yml
```
Install our modified faiss, which supports `add_preassigned` and `search_preassigned` for IVF indexes (`IndexIVF` and `IndexShardsIVF`). Set `CMAKE_CUDA_ARCHITECTURES`, `BLA_VENDOR`, and `FAISS_OPT_LEVEL` below appropriately for your system; refer to the faiss documentation for details. (We are working to merge this feature into the main branch; see this issue.)
```bash
conda activate anns_env
cd faiss_preassigned
cmake -B _build \
      -DBUILD_SHARED_LIBS=ON \
      -DBUILD_TESTING=OFF \
      -DFAISS_OPT_LEVEL=avx512 \
      -DFAISS_ENABLE_GPU=ON \
      -DFAISS_ENABLE_RAFT=OFF \
      -DCMAKE_CUDA_ARCHITECTURES=75 \
      -DFAISS_ENABLE_PYTHON=ON \
      -DBLA_VENDOR=Intel10_64lp \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DCMAKE_BUILD_TYPE=Release .
make -C _build -j$(nproc) faiss faiss_avx2 faiss_avx512 swigfaiss swigfaiss_avx2 swigfaiss_avx512
```
Then install the compiled faiss Python package into the conda environment created above:
```bash
cd _build/faiss/python
pip install .
```
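As a quick sanity check (a minimal sketch, not part of the repo; the index parameters are arbitrary), you can confirm that the patched build imports and that `IndexIVF` instances expose the preassigned entry points:

```python
import faiss
import numpy as np

# Build a tiny IVF index on random data.
d, nlist = 32, 16
xb = np.random.rand(1000, d).astype("float32")
index = faiss.IndexIVFFlat(faiss.IndexFlatL2(d), d, nlist)
index.train(xb)
index.add(xb)

# The patched faiss should expose both preassigned entry points.
print(hasattr(index, "search_preassigned"), hasattr(index, "add_preassigned"))
```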
All datasets, including database vectors, training query vectors, test query vectors, and ground truth, are stored in fvecs or bin format; see `dataset.py` for details on dataset handling.
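For reference, the fvecs format stores each vector as a little-endian `int32` dimension followed by that many `float32` components. Below is a minimal reader sketch (an illustrative helper, not part of the repo, assuming all vectors share one dimension):

```python
import numpy as np

def read_fvecs(path: str) -> np.ndarray:
    """Read an fvecs file into an (n, d) float32 array."""
    raw = np.fromfile(path, dtype=np.int32)
    d = raw[0]                          # dimension of the first vector
    rows = raw.reshape(-1, d + 1)       # each row: [d, v_0, ..., v_{d-1}]
    return rows[:, 1:].copy().view(np.float32)
```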
The tree and neural network structures for the algorithms, along with the training hyperparameters and their descriptions, can be found in `config.py`. Alternatively, you can provide the configuration as a JSON file, e.g. `conf.json`, and specify it at runtime:
```bash
python3 main.py -f conf.json -d cuda:0 -i cuda:0
```
This will train using a single GPU; a device specified on the command line takes precedence over the one in the configuration file. Use the `-d` flag to specify the device for the neural network and the training process, and the `-i` flag to designate the device where the inverted lists are stored. Logs and checkpoints will be saved under `./logs/<DATASET_NAME>/<NAME_OF_THIS_RUN>/`.
To evaluate a specific checkpoint, run:

```bash
python3 main.py -v ./logs/<DATASET_NAME>/<NAME_OF_THIS_RUN>/<CHECKPOINT>
```
This command will evaluate the performance of the chosen checkpoint, display the results, and save them to a file named `time_recall_<CURRENT_TIMESTAMP>_.json`. You can specify an alternative configuration file with the `-f` option. Additionally, you can control the number of inverted lists to probe with the `--start`, `--stop`, and `--interval` flags, which specify the starting number, the stopping number, and the step of the range of probe counts to evaluate, letting you trade off latency against recall; see the sketch below.
If ROTLEX helps your research, please cite it with the following BibTeX:
```bibtex
@inproceedings{10.1145/3711896.3737112,
  author    = {Wei, Wenqing and Lian, Defu and Feng, Qingshuai and Wu, Yongji},
  title     = {Robust Tree-based Learned Vector Index with Query-aware Repartitioning},
  year      = {2025},
  isbn      = {9798400714542},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3711896.3737112},
  doi       = {10.1145/3711896.3737112},
  booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
  pages     = {3134–3143},
  numpages  = {10},
  keywords  = {approximate nearest neighbor search (anns), learning-to-index, maximum inner product search (mips), vector retrieval},
  location  = {Toronto ON, Canada},
  series    = {KDD '25}
}
```
License: MIT