@justinormont and the text team tuned default n-gram lengths for the default text recipe in the internal repo
These defaults are:
Word -- bigrams (w/ unigrams)
Character -- trigrams (w/o unigrams and bigrams)
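To make the two settings concrete, here is a minimal Python sketch of what "bigrams with unigrams" and "trigrams without unigrams and bigrams" produce. It is not ML.NET's implementation: the function names, the whitespace tokenizer, and the lowercasing are assumptions for illustration only, and the real transform does additional normalization and featurization.

```python
def word_ngrams(text, max_n=2, all_lengths=True):
    """Word n-grams up to max_n; all_lengths=True also keeps the shorter ones.

    Hypothetical helper: whitespace tokenization and lowercasing are
    simplifying assumptions, not ML.NET behavior.
    """
    tokens = text.lower().split()
    lengths = range(1, max_n + 1) if all_lengths else [max_n]
    return [
        " ".join(tokens[i:i + n])
        for n in lengths
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n=3):
    """Character n-grams of exactly length n (no shorter lengths)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Word features: bigrams plus unigrams
print(word_ngrams("the quick fox"))
# Character features: trigrams only
print(char_ngrams("text"))
```

With word bigrams enabled alongside unigrams, short phrases become features in their own right, while the character setting keeps only length-3 windows and drops the shorter ones.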
One chart from his findings:
The line w/ the light blue call-out represents current ML.NET defaults (Unigram + Trichar)
The line w/ the light green call-out is the requested change (Bigram + Trichar)
The line w/ the pink call-out shows that Trigram + Trichar is better in terms of accuracy, but at a time cost, and the accuracy lines cross over at NumIterations > 8 for the Averaged Perceptron learner.
daholste changed the title from "Update default n-gram length in text transforms to match TLC" to "Update default n-gram length in Text Transform to match TLC" (Mar 6, 2019)
daholste changed the title from "Update default n-gram length in Text Transform to match TLC" to "Update default n-gram length for Text Transform to match TLC" (Mar 6, 2019)
daholste changed the title from "Update default n-gram length for Text Transform to match TLC" to "Update default n-gram length for Text Transform to match default text recipe" (Mar 6, 2019)
That's up to @shauheen. I'd say yes, as there are strong accuracy upsides. You'll notice the large jump in accuracy (y-axis) when we move from the blue line to the green line in the above graph.
The power of defaults should never be underestimated.