
Organizations

@MinishLab

Pringled/README.md

Hi there 👋

I'm Thomas van Dongen, currently head of AI engineering at Springer Nature. I am a founding member and current maintainer of Minish, where we develop open-source machine learning packages.

I'm currently working on:

  • model2vec: a library for creating state-of-the-art static embedding models by distilling sentence transformers.
  • semhash: a library for lightweight text deduplication, outlier detection, and representative filtering.
  • vicinity: a library for fast and lightweight nearest neighbor search, with flexible indexing backends.
  • tokenlearn: a library for pre-training static embedding models.
  • model2vec-rs: a Rust port of Model2Vec.
  • pyversity: a library for retrieval result diversification.

My research interests include:

  • Small, fast models: I like making eco-friendly models that do not need expensive hardware.
  • Word embeddings: I focus specifically on static word embeddings (yes, I believe they are still relevant!).
  • Information retrieval and recommenders: In my day-to-day job I focus on developing recommenders and information retrieval pipelines for the scientific domain.

Pinned

  1. MinishLab/model2vec (Public)

    Fast State-of-the-Art Static Embeddings

    Python, 1.9k stars, 101 forks

  2. MinishLab/semhash (Public)

    Fast Semantic Text Deduplication & Filtering

    Python, 814 stars, 50 forks

  3. MinishLab/vicinity (Public)

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 311 stars, 10 forks

  4. MinishLab/tokenlearn (Public)

    Pre-train Static Word Embeddings

    Python, 87 stars, 8 forks

  5. MinishLab/model2vec-rs (Public)

    Official Rust Implementation of Model2Vec

    Rust, 138 stars, 12 forks

  6. pyversity (Public)

    Fast Diversification for Search & Retrieval

    Python, 95 stars, 3 forks