Update build_tree function with SparseKmeans implementation #19

Merged: 4 commits, Jul 25, 2025

@khoinpd0411 (Contributor) commented Jul 14, 2025

What does this PR do?

Update the build_tree function to use the SparseKmeans implementation, together with an adaptive clustering method that switches between Elkan's and Lloyd's algorithms based on the number of samples.

Improvements:

  • Sped up tree construction.
  • Resolved convergence issues caused by duplicate samples during clustering.
  • Introduced an adaptive clustering strategy that dynamically switches between Elkan’s algorithm (for large sample sizes) and Lloyd’s algorithm (for smaller or dense datasets).
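
The adaptive strategy and the duplicate handling listed above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the helper names `unique_rows` and `choose_kmeans_algorithm` and the `elkan_min_samples` cutoff of 10,000 are hypothetical, because the description does not state the actual threshold or how duplicates are detected.

```python
def unique_rows(rows):
    """Return the distinct rows, preserving first-seen order."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row), None)
    return [list(key) for key in seen]


def choose_kmeans_algorithm(n_samples, elkan_min_samples=10_000):
    """Pick a k-means variant from the sample count.

    elkan_min_samples is a made-up threshold for illustration only.
    """
    return "elkan" if n_samples >= elkan_min_samples else "lloyd"


rows = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
distinct = unique_rows(rows)  # the duplicated first row is counted once
algorithm = choose_kmeans_algorithm(len(distinct))
print(len(distinct), algorithm)  # prints "3 lloyd"
```

The returned strings match the values accepted by the `algorithm` parameter of scikit-learn's `KMeans`; whether SparseKmeans exposes the same switch is not stated in this PR description.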

Test CLI & API (bash tests/autotest.sh)

Test APIs used by main.py.

  • Test Pass
    • (Copy and paste the last outputted line here.)
  • Not Applicable (i.e., the PR does not include API changes.)

Check API Document

If any new APIs are added, please check that their descriptions have been added to the API document.

  • API document is updated (linear, nn)
  • Not Applicable (i.e., the PR does not include API changes.)

Test quickstart & API (bash tests/docs/test_changed_document.sh)

If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.

…clustering method mixing Elkan's and Lloyd's algorithm based on the number of samples
@khoinpd0411 requested review from cjlin1 and a team as code owners July 14, 2025 20:13
@Eleven1Liu added the model/linear and release (PyPI release tag is in this PR) labels Jul 14, 2025
@Eleven1Liu removed the release (PyPI release tag is in this PR) label Jul 17, 2025
- Eliminate duplicated declarations of KMeans parameters for Lloyd’s and Elkan’s methods.
- Move the check for the number of unique labels outside the loop.
- Combine terminal conditions where d >= dmax and the number of unique samples is less than K.
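
As a rough picture of the refactoring in this commit message, the sketch below folds the two stopping tests into one check and counts unique samples once per node rather than inside a loop. It is not the merged code: `is_leaf`, `round_robin_split` (a placeholder for the real clustering step), and the dictionary-based node layout are all hypothetical, and reading the combined condition as a logical OR is an assumption.

```python
def is_leaf(depth, dmax, n_unique, K):
    """Combined stop test (assumed to be a logical OR of the two conditions)."""
    return depth >= dmax or n_unique < K


def round_robin_split(samples, K):
    """Hypothetical stand-in for clustering: deal sorted samples into K groups."""
    ordered = sorted(samples)
    return [ordered[i::K] for i in range(K)]


def build_tree(samples, K, dmax, split=round_robin_split, depth=0):
    """Recursively partition samples until the combined terminal condition holds."""
    # Count unique samples once per node, before any per-cluster loop.
    n_unique = len({tuple(s) for s in samples})
    if is_leaf(depth, dmax, n_unique, K):
        return {"samples": samples, "children": []}
    children = [
        build_tree(group, K, dmax, split, depth + 1)
        for group in split(samples, K)
        if group  # skip empty groups
    ]
    return {"samples": samples, "children": children}


def count_leaves(node):
    """Count nodes that have no children."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


tree = build_tree([[0.0], [0.0], [1.0], [2.0], [3.0]], K=2, dmax=2)
print(count_leaves(tree))  # prints 4
```

With only duplicate samples, for example `build_tree([[1.0]] * 5, K=2, dmax=3)`, the root is returned as a leaf immediately, because there are fewer than K unique samples to cluster.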
@Eleven1Liu (Contributor) left a comment

verified by @maclin726

@Eleven1Liu merged commit 0e5c4f6 into ntumlgroup:master Jul 25, 2025
1 check passed

5 participants