If you search for `NgramExtract` in the solution, the following three main classes show up:

1. `NgramExtractorTransform` (in `WordBagTransform.cs`)
2. `NgramExtractingTransformer` (in `NgramTransform.cs`)
3. `NgramExtractingEstimator` (in `NgramTransform.cs`)
Classes 2 and 3 appear to contain the actual n-gram extraction logic. Class 1, however, uses 2 and 3 after a pre-processing step: if the input is text, it is first converted to terms using `ValueToKeyMappingTransformer`.
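The composition described above (text is first mapped to terms, then n-grams are extracted over those terms) can be illustrated with a small, language-agnostic sketch. This is Python, not ML.NET's C# API. `value_to_key`, `extract_ngrams` and `word_bag` are hypothetical names chosen for illustration. A real estimator would first fit the term dictionary and then apply it, but this sketch grows the dictionary on the fly for brevity. Tokenization is assumed to have already happened.

```python
from collections import Counter


def value_to_key(tokens, vocab):
    """Map each token to an integer key, adding unseen tokens to vocab.

    Stands in for the term (value-to-key) pre-processing step.
    """
    keys = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free key, in first-seen order
        keys.append(vocab[tok])
    return keys


def extract_ngrams(keys, n):
    """Count contiguous n-grams over a sequence of keys."""
    return Counter(tuple(keys[i:i + n]) for i in range(len(keys) - n + 1))


def word_bag(tokens, vocab, n=2):
    """Hypothetical composition: tokens -> keys -> n-gram counts."""
    return extract_ngrams(value_to_key(tokens, vocab), n)


vocab = {}
bag = word_bag(["the", "cat", "saw", "the", "cat"], vocab, n=2)
print(vocab)  # {'the': 0, 'cat': 1, 'saw': 2}
print(bag)    # Counter({(0, 1): 2, (1, 2): 1, (2, 0): 1})
```

Keeping the two stages as separate functions mirrors the point of this issue. The n-gram logic lives in one place, and the wrapper only wires the term mapping in front of it.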
First, `NgramExtractorTransform` does not seem to be in the correct file, i.e. the file name and class name do not match.
Second, `NgramExtractorTransform` does not do n-gram extraction itself. Instead, it composes two different estimators (`NgramExtractingEstimator` and `ValueToKeyMappingEstimator`).
I think `NgramExtractorTransform` should be renamed to `WordBagTransform` or something more appropriate.
`NgramExtractorTransform` is an internal class that used to be `WordBagTransform`.
In an ideal world we would have only transformers in our code, but we didn't have time to properly convert WordBag and WordHashBag.
So, in order to work with our new APIs, we created the `TransformerWrapper` class, derived `NgramExtractingTransformer` from it, and created a matching estimator, `NgramExtractingEstimator`.
By themselves, `WordBagTransformer` and `WordHashBagTransformer` are a complex pairing of estimators chained together. That logic is quite complicated, and I never had the courage to sit down and untangle it properly. At some point we will need to tackle it, but I doubt we have time for that now.
So, to your points:
First: yes, it's a bad name, but it's an internal class, so it's not a big priority right now.
Second: yes, and so does OneHotEncoding, which is just the term transform (ValueToKey) + KeyToVector (or KeyToBinaryVector), and so does FeaturizeText, which is just a set of estimators chained together.
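To illustrate what "just a set of estimators chained together" means, here is a minimal sketch of the estimator/transformer pattern in Python. Each estimator's `fit` returns a fitted transformer, and the chain fits each step on the output of the previous one. The class and function names (`ValueToKeyEstimator`, `KeyToVectorEstimator`, `fit_chain`) are hypothetical and do not correspond to ML.NET's actual API. Only the one-hot case (term mapping followed by KeyToVector) is shown. KeyToBinaryVector is not shown.

```python
class ValueToKeyEstimator:
    """Hypothetical stand-in for the term (ValueToKey) step."""

    def fit(self, values):
        vocab = {}
        for v in values:
            vocab.setdefault(v, len(vocab))  # keys assigned in first-seen order
        # The fitted "transformer" maps each value to its learned key.
        return lambda vals: [vocab[v] for v in vals]


class KeyToVectorEstimator:
    """Hypothetical stand-in for KeyToVector: each key becomes an indicator vector."""

    def fit(self, keys):
        size = max(keys) + 1  # vector length learned from the fitted keys

        def transform(ks):
            out = []
            for k in ks:
                vec = [0.0] * size
                vec[k] = 1.0
                out.append(vec)
            return out

        return transform


def fit_chain(estimators, data):
    """Fit each estimator on the output of the previously fitted step."""
    transformers = []
    for est in estimators:
        t = est.fit(data)
        data = t(data)  # feed this step's output to the next estimator
        transformers.append(t)

    def apply(vals):
        for t in transformers:
            vals = t(vals)
        return vals

    return apply


one_hot = fit_chain([ValueToKeyEstimator(), KeyToVectorEstimator()],
                    ["red", "green", "red", "blue"])
print(one_hot(["blue", "red"]))  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Values not seen during fitting raise a `KeyError` here. A real implementation would need an explicit policy for them.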
Should it be renamed? Yes. Should it be renamed right now? No. Honestly, I would prefer to finish converting it to IEstimator and delete it.
CC: @Ivanidzo4ka, @TomFinley, @sfilipi, @rogancarr.