Created samples for 'FeaturizeText' API. #3120
// as well as the source of randomness.
var mlContext = new MLContext();

// Get a small dataset as an IEnumerable.
"Get" (start = 15, length = 3): use "Create" or "Define" instead. #Resolved
// Get a small dataset as an IEnumerable.
var samples = new List<TextData>()
{
    new TextData(){ Text ="ML.NET's FeaturizeText API uses a composition of several basic transforms to convert text into numeric features." },
extra space? = " #Resolved
Codecov Report
@@            Coverage Diff             @@
##           master    #3120      +/-   ##
==========================================
- Coverage   72.52%   72.51%   -0.01%
==========================================
  Files         808      808
  Lines      144665   144665
  Branches    16198    16198
==========================================
- Hits       104913   104910       -3
- Misses      35342    35344       +2
- Partials     4410     4411       +1
Codecov Report
@@           Coverage Diff           @@
##           master    #3120   +/-   ##
=======================================
  Coverage   72.52%   72.52%
=======================================
  Files         808      808
  Lines      144665   144665
  Branches    16198    16198
=======================================
  Hits       104913   104913
+ Misses      35342    35341    -1
- Partials     4410     4411    +1
// Use ML.NET's built-in stop word remover
StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options() { Language = TextFeaturizingEstimator.Language.English },
WordFeatureExtractor = new WordBagEstimator.Options() { NgramLength = 1 },
CharFeatureExtractor = new WordBagEstimator.Options() { NgramLength = 1 },
"1" (start = 86, length = 1)
Is this single-character tokenization? The features would just be the individual letters of the alphabet. Is that ever useful? #Resolved
Yes, it can be useful in some cases, but let's change it to 3-grams, which are more useful.
In reply to: 269776047 [](ancestors = 269776047)
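For readers following this thread, here is a minimal sketch of the difference between character 1-grams and 3-grams. It is plain C# and does not call ML.NET; the `CharNgrams` helper and the sample string are hypothetical and only illustrate the idea behind the `NgramLength` setting discussed above. ML.NET's own character n-gram extraction may differ in details such as normalization and boundary handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CharNgramSketch
{
    // Hypothetical helper (not an ML.NET API): returns every contiguous
    // substring of length n, in order of appearance.
    public static List<string> CharNgrams(string text, int n)
    {
        var result = new List<string>();
        for (int i = 0; i + n <= text.Length; i++)
        {
            result.Add(text.Substring(i, n));
        }
        return result;
    }

    public static void Main()
    {
        const string text = "text features";

        // With n = 1 the vocabulary is just the distinct characters.
        var unigrams = CharNgrams(text, 1).Distinct().ToList();

        // With n = 3 each feature also captures local character context.
        var trigrams = CharNgrams(text, 3).Distinct().ToList();

        Console.WriteLine("1-grams: " + string.Join("|", unigrams));
        Console.WriteLine("3-grams: " + string.Join("|", trigrams));
    }
}
```

With a length of 1, the vocabulary collapses to the set of characters in the input, which is the reviewer's concern; with a length of 3, each feature carries some information about spelling and word fragments.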
Thanks!
Related to #1209.