fix: digits pre-tokenizer returning empty array for text with no digits #51

Merged: CodeWithKyrian merged 1 commit into main from dev-digits-pretokenizer on Jul 23, 2024

Conversation

CodeWithKyrian
Owner

  • Bug Fix
  • New Feature

Description:

This PR fixes a regex bug that caused the DigitsPretokenizer to return an empty array for every input, whether it contained digits, lacked digits, or consisted solely of digits. The intended behavior is to return text without digits as-is and to split digits into individual parts. With this fix, the DigitsPretokenizer behaves as intended.
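
For reference, the intended behavior described above can be sketched in a few lines. This is an illustrative Python sketch, not the code changed in this PR: the function name `digits_pre_tokenize`, the `individual_digits` parameter, and the regex itself are assumptions made for the example.

```python
import re


def digits_pre_tokenize(text: str, individual_digits: bool = True) -> list[str]:
    """Hypothetical sketch of a digits pre-tokenizer.

    Runs of non-digit characters are kept together; digits are emitted
    one by one (or as whole runs when individual_digits is False).
    """
    digit_part = r"\d" if individual_digits else r"\d+"
    # Match either a run of non-digits or a digit part, in input order.
    pattern = re.compile(rf"[^\d]+|{digit_part}")
    return pattern.findall(text)


print(digits_pre_tokenize("hello world"))  # text without digits stays in one piece
print(digits_pre_tokenize("abc123"))       # digits are split out individually
print(digits_pre_tokenize("2024"))         # digits-only input
```

The three calls cover the three cases named in the description: input without digits, input mixing text and digits, and input made only of digits. None of them should produce an empty array.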

CodeWithKyrian merged commit 69f3d32 into main on Jul 23, 2024
CodeWithKyrian deleted the dev-digits-pretokenizer branch on July 23, 2024 21:38