Skip to content

Discrepancy in NgramExtractorTrasform, NgramExtractingTransformer and NgramExtractingEstimator. #2895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zeahmed opened this issue Mar 9, 2019 · 2 comments
Labels
API Issues pertaining the friendly API P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@zeahmed
Copy link
Contributor

zeahmed commented Mar 9, 2019

If you search for NgramExtract in the solution, the following three main classes pop up.

  1. NgramExtractorTransform (in WordBagTransform.cs)
  2. NgramExtractingTransformer (in NgramTransform.cs)
  3. NgramExtractingEstimator (in NgramTrasnform.cs)

2 and 3 seem to be the actual classes where ngram extraction logic is written. However, 1 uses 2 and 3 with a pre-processing step where if input is text it is first converted to terms using ValueToKeyMappingTransformer.

First, NgramExtractorTransform does not seem to be in correct file i.e filename and class name do not match.
Second, the NgramExtractorTransform is not doing ngram extraction instead composing two different estimators (NgramExtractingEstimator and ValueToKeyMappingEstimator).

I think NgramExtractorTransform be renamed to WordBagTransform or something appropirate.

CC: @Ivanidzo4ka, @TomFinley, @sfilipi, @rogancarr.

@Ivanidzo4ka
Copy link
Contributor

NgramExtractorTransform is internal class which used to be WordBagTransform.
In ideal world we would had only transformers in our code, but we didn't had time to properly convert WordBag and WordHashBag.

So in order to work with our new API's we created TransformerWrapper class and derive from it NgramExtractingTransformer plus we create estimator as well NgramExtractingEstimator

By themselves WordBagTransformer and WordHashBagTransformer is complex pairing of estimators together into chain. But that logic is quite complicated, and I never had courage to sit and untangle it properly. At some point we would need to tackle that. But I doubt we have time for that.

So for your points:
First - Yes, it's a bad name, but it's a internal class, not a big priority right now.
Second - Yes, so do OneHotEncoding, it's just term transform (KeyToValue) + KeyToVector (or KeyToBinaryVector) or FeaturizeText which is just set of estimators together.

Should it be renamed? Yes. Should it be renamed right now? No. I would honestly prefer to finish it conversion to IEstimator and delete it.

@singlis singlis added the API Issues pertaining the friendly API label Mar 18, 2019
@zeahmed
Copy link
Contributor Author

zeahmed commented Mar 21, 2019

Also,

private WordBagEstimator.Options _charFeatureExtractor;

does not sound reasonable. Maybe we need to create an option class in `NgramExtractingEstimator" and use it here as well for

private WordBagEstimator.Options _wordFeatureExtractor;
to be consistent.

@harishsk harishsk added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Jan 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Issues pertaining the friendly API P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Projects
None yet
Development

No branches or pull requests

4 participants