I'm Thomas van Dongen. I currently work as head of AI engineering at Springer Nature. I am one of the founding members (and a current maintainer) of Minish, where we develop open-source machine learning packages:
- model2vec: a library for creating state-of-the-art static embedding models by distilling sentence transformers.
- semhash: a library for lightweight text deduplication, outlier detection, and representative filtering.
- vicinity: a library for fast and lightweight nearest neighbor search, with flexible indexing backends.
- tokenlearn: a library for pre-training static embedding models.
- model2vec-rs: a Rust port of Model2Vec.
- pyversity: a library for retrieval result diversification.

My main interests are:
- Small, fast models: I like making eco-friendly models that do not need expensive hardware.
- Word embeddings: specifically static word embeddings (yes, I believe they are still relevant!).
- Information retrieval and recommenders: in my day-to-day job, I develop recommenders and information retrieval pipelines for the scientific domain.