
Datamap plot for plotting; HDBSCAN for clustering #6


Open
wants to merge 5 commits into
base: main

Conversation


@lmcinnes lmcinnes commented Mar 7, 2024

The DataMapPlot library provides both static and interactive plots (backed by matplotlib + datashader and deck.gl, respectively). It makes rich plotting very easy and takes care of many of the details handled here, as well as several other aspects (palette handling, label placement, interactive search, titles, etc.).

Using HDBSCAN instead of DBSCAN for clustering allows for a single relatively intuitive clustering parameter (min_cluster_size) while still producing good clusterings.

Still cleaning up a few things, but I wanted to open the PR for discussion purposes. Would this be of interest?

@lvwerra
Member

lvwerra commented Mar 7, 2024

Awesome, yes, very open to improvements! Happy to replace DBSCAN with HDBSCAN. For the plotting, maybe we can just have three backends? Something like a show(PLOT_LIB, **lib_kwargs) method interface where PLOT_LIB is any of ["mpl", "plotly", "dmp"]?
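
One way such an interface could be laid out is sketched below. Everything except the `show(PLOT_LIB, **lib_kwargs)` shape and the three backend names is an assumption: the `ClusterPlotter` class and the `_show_*` functions are hypothetical, and the backends are stubs that return a description instead of calling matplotlib, plotly, or DataMapPlot, so the dispatch logic runs without those libraries installed.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


# Hypothetical backend stubs. Real versions would import and call the
# corresponding plotting library inside the function body.
def _show_mpl(points, labels, **kwargs) -> Dict[str, Any]:
    return {"backend": "mpl", "n_points": len(points), "kwargs": kwargs}


def _show_plotly(points, labels, **kwargs) -> Dict[str, Any]:
    return {"backend": "plotly", "n_points": len(points), "kwargs": kwargs}


def _show_dmp(points, labels, **kwargs) -> Dict[str, Any]:
    return {"backend": "dmp", "n_points": len(points), "kwargs": kwargs}


class ClusterPlotter:
    """Hypothetical holder for 2D points and their cluster labels."""

    _BACKENDS: Dict[str, Callable[..., Dict[str, Any]]] = {
        "mpl": _show_mpl,
        "plotly": _show_plotly,
        "dmp": _show_dmp,
    }

    def __init__(self, points: Sequence[Point], labels: List[int]) -> None:
        if len(points) != len(labels):
            raise ValueError("points and labels must have the same length")
        self.points = points
        self.labels = labels

    def show(self, plot_lib: str, **lib_kwargs: Any) -> Dict[str, Any]:
        """Dispatch to the chosen backend, forwarding library-specific kwargs."""
        try:
            backend = self._BACKENDS[plot_lib]
        except KeyError:
            valid = ", ".join(sorted(self._BACKENDS))
            raise ValueError(
                f"unknown plot_lib {plot_lib!r}; expected one of: {valid}"
            ) from None
        return backend(self.points, self.labels, **lib_kwargs)


plotter = ClusterPlotter([(0.0, 0.0), (1.0, 1.0)], [0, 0])
result = plotter.show("dmp", title="Clusters")
```

Keeping `plot_lib` a required argument forces the caller to pick a backend explicitly. Giving it a default value would be a one-line change if a default backend is preferred.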

At the moment the repo is mostly intended as a template people can copy and modify rather than a full library, so the different plotting methods would also serve to show how to customize the plots.

What do you think?

@lmcinnes
Author

lmcinnes commented Mar 7, 2024

That's a reasonable option. I'll see if I can rework this to work that way.

@lmcinnes lmcinnes changed the title [WIP] Datamap plot for plotting; HDBSCAN for clustering Datamap plot for plotting; HDBSCAN for clustering Mar 28, 2024
@lmcinnes
Author

Sorry for the delay, I had to step away from this for a little bit. I've moved things around so we now support multiple backends. Potentially we could set a default, but forcing the user to decide is also a reasonable option.
