New API for ML.NET #754

Closed
Zruty0 opened this issue Aug 27, 2018 · 4 comments
Labels
API Issues pertaining the friendly API

Comments

Contributor

Zruty0 commented Aug 27, 2018

We are creating a stable API that:

  • Uses terminology parallel to that of other well-known ML libraries (Spark, sklearn);
  • Takes advantage of .NET's strong typing to shorten the path to success;
  • Will be present from now through 1.0 and beyond;
  • Keeps simple ML scenarios concise;
  • Allows advanced ML scenarios (see "Direct API: Scenarios to light up for V1", #584).

To that end, we are going to expose a selection of Estimators and Transformers (see #581) that cover the existing transforms, learners and loaders.
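
For readers unfamiliar with the Estimator/Transformer split borrowed from Spark and sklearn, the idea can be sketched in a few lines. This is a language-agnostic illustration in Python, not ML.NET code: the class names `MinMaxEstimator` and `MinMaxTransformer`, the `fit`/`transform` method names, and the row representation are all assumptions made for the example. An estimator describes an operation that still has to learn something from data; fitting it yields a transformer that holds the learned state and can be applied to new data.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, float]  # hypothetical row representation for this sketch


@dataclass(frozen=True)
class MinMaxTransformer:
    """Fitted state: rescales one column using a range learned during fit."""
    column: str
    lo: float
    hi: float

    def transform(self, rows: List[Row]) -> List[Row]:
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero on constant columns
        return [{**r, self.column: (r[self.column] - self.lo) / span} for r in rows]


@dataclass(frozen=True)
class MinMaxEstimator:
    """Unfitted description: knows which column to normalize, but not its range."""
    column: str

    def fit(self, rows: List[Row]) -> MinMaxTransformer:
        values = [r[self.column] for r in rows]
        return MinMaxTransformer(self.column, min(values), max(values))


train = [{"x": 2.0}, {"x": 4.0}, {"x": 6.0}]
model = MinMaxEstimator("x").fit(train)   # learning happens here
scaled = model.transform([{"x": 3.0}])    # the fitted state is reused on new data
```

Keeping the learned range in a separate immutable object means the fitted transformer can be saved and applied later without access to the training data.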

This issue will be used to track the overall project status: what is planned to be done, what is done, etc.

@Zruty0 Zruty0 added the API Issues pertaining the friendly API label Aug 27, 2018
Contributor Author

Zruty0 commented Aug 27, 2018

List of transforms:

| Transform | Category | Priority | Has entry point | Status |
|---|---|---|---|---|
| Key To Value Transform | Categorical | 0 | 1 | #856 |
| Concat Transform | Schema manipulation | 0 | 1 | #896 |
| Term Transform | Categorical | 0 | 1 | #759 |
| Text Transform | Text processing | 0 | 1 | #801 |
| Image Greyscale Transform | Image | 0 | 0 | #753 |
| Image Loader Transform | Image | 0 | 0 | #753 |
| Image Pixel Extractor Transform | Image | 0 | 0 | #753 |
| Image Resizer Transform | Image | 0 | 0 | #753 |
| Tensorflow Scoring Transform | Image | 0 | 1 | #840 #877 |
| Min-Max Normalizer | Normalizer | 0 | 1 | #797 |
| NA Handle Transform | Missing values | 1 | 1 | no need |
| NA Indicator Transform | Missing values | 1 | 1 | #1217 |
| NA Replace Transform | Missing values | 1 | 1 | #917 |
| Categorical Transform | Categorical | 1 | 1 | #899 |
| Categorical Hash Transform | Categorical | 1 | 1 | #1033 |
| Hash Transform | Categorical | 1 | 0 | #944 |
| Copy Columns Transform | Schema manipulation | 1 | 1 | #706 |
| Word Embeddings Transform | Text processing | 1 | 1 | #928 |
| Key To Vector Transform | Categorical | 1 | 0 | #858 |
| Character Tokenizer Transform | Text processing | 2 | 1 | #931 |
| Ngram Hash Transform | Text processing | 2 | 0 | #953 |
| Ngram Transform | Text processing | 2 | 1 | #953 |
| Stopwords Remover Transform | Text processing | 2 | 0 | #953 |
| Text Normalizer Transform | Text processing | 2 | 0 | #953 |
| Word Bag Transform | Text processing | 2 | 0 | #953 |
| Word Hash Bag Transform | Text processing | 2 | 0 | #953 |
| Word Tokenizer Transform | Text processing | 2 | 1 | #931 |
| Convert Transform | Column mapper | 2 | 1 | |
| Drop Slots Transform | Feature selection | 2 | 0 | no need |
| LogMeanVar Normalizer | Normalizer | 2 | 1 | #797 |
| MeanVar Normalizer | Normalizer | 2 | 1 | #797 |
| Latent Dirichlet Allocation Transform | Projection | 2 | 1 | #972 |
| Tree Ensemble Featurization Transform | Projection | 2 | 1 | |
| NA Filter | Row manipulation | 2 | 1 | |
| Count Feature Selection Transform | Feature selection | 2 | 1 | #991 |
| Mutual Information Feature Selection Transform | Feature selection | 2 | 1 | #991 |
| NA Drop Transform | Missing values | 2 | 1 | |
| Binning Normalizer | Normalizer | 2 | 1 | #797 |
| Global Contrast Normalization Transform | Normalizer | 2 | 1 | #961 |
| Principal Component Analysis Transform | Projection | 2 | 1 | #1333 |
| Random Fourier Features Transform | Projection | 2 | 0 | #1122 |
| Bootstrap Sample Transform | Row manipulation | 2 | 1 | |
| Shuffle Transform | Row manipulation | 2 | 0 | |
| Custom Stopwords Remover Transform | Text processing | 2 | 0 | |
| Learner Feature Selection Transform | Feature selection | 3 | 0 | |
| Lp-Norm Normalizer | Normalizer | 3 | 1 | #961 |
| Supervised Binning Normalizer | Normalizer | 3 | 1 | |
| Whitening Transform | Normalizer | 3 | 0 | #961, #1326 |
| Choose Columns Transform | Schema manipulation | 3 | 0 | |
| Drop Columns Transform | Schema manipulation | 3 | 1 | |
| Generate Number Transform | Schema manipulation | 3 | 1 | |
| Keep Columns Transform | Schema manipulation | 3 | 0 | |
| Key To Binary Vector Transform | Categorical | 3 | 0 | #858 |
| Term Lookup Transform | Categorical | 3 | 0 | |
| Label Indicator Transform | Column mapper | 3 | 1 | |
| Group Transform | Relational operation | 3 | 1 | |
| Un-group Transform | Relational operation | 3 | 1 | |
| Range Filter | Row manipulation | 3 | 1 | |
| Skip Filter | Row manipulation | 3 | 1 | |
| Skip and Take Filter | Row manipulation | 3 | 1 | |
| Take Filter | Row manipulation | 3 | 1 | |
| Evaluate Predictor | Re-evaluate | 4 | 0 | |
| Hash Join Transform | Re-evaluate | 4 | 1 | |
| Load Transform | Re-evaluate | 4 | 0 | |
| Optional Column Transform | Re-evaluate | 4 | 1 | |
| Score Predictor | Re-evaluate | 4 | 0 | |
| Sentiment Analyzing Transform | Re-evaluate | 4 | 1 | |
| Train and Score Predictor | Re-evaluate | 4 | 0 | |

Contributor Author

Zruty0 commented Aug 27, 2018

List of trainers:

| Trainer | Category | Priority | Status |
|---|---|---|---|
| SDCA: Fast Linear (SA-SDCA) | Linear | 0 | #716 |
| SDCAMC: Fast Linear Multi-class Classification (SA-SDCA) | Linear | 0 | #716 |
| SDCAR: Fast Linear Regression (SA-SDCA) | Linear | 0 | #716 |
| AveragedPerceptron: Averaged Perceptron | Linear | 0 | #849 |
| OVA: One-vs-All | Meta | 0 | #865 |
| FieldAwareFactorizationMachine: Field-aware Factorization Machine | FFM | 0 | #912 |
| FastTreeBinaryClassification: FastTree (Boosted Trees) Classification | Tree | 0 | #855 |
| FastTreeRegression: FastTree (Boosted Trees) Regression | Tree | 0 | #855 |
| KMeansPlusPlus: KMeans++ Clustering | Clustering | 0 | #979 |
| LightGBMBinary: LightGBM Binary Classifier | Tree | 0 | #962 |
| LightGBMMulticlass: LightGBM Multi-class Classifier | Tree | 0 | #962 |
| LightGBMRegression: LightGBM Regressor | Tree | 0 | #962 |
| LogisticRegression: Logistic Regression | Linear | 0 | #957 |
| MultiClassLogisticRegression: Multi-class Logistic Regression | Linear | 0 | #957 |
| OLSLinearRegression: Ordinary Least Squares (Regression) | Linear | 0 | #1002 |
| SymbolicSGD: Symbolic SGD (binary) | Linear | 0 | #1012 |
| FastForestClassification: Fast Forest Classification | Tree | 0 | #855 |
| FastForestRegression: Fast Forest Regression | Tree | 0 | #855 |
| BinaryClassificationGamTrainer: Generalized Additive Model for Binary Classification | GAM | 1 | #962 |
| RegressionGamTrainer: Generalized Additive Model for Regression | GAM | 1 | #962 |
| OnlineGradientDescent: Stochastic Gradient Descent (Regression) | Linear | 1 | #849 |
| PoissonRegression: Poisson Regression | Linear | 1 | #957 |
| PKPD: Pairwise coupling (PKPD) | Meta | 1 | #865 |
| pcaAnomaly: PCA Anomaly Detector | Projection | 1 | #996 |
| FastTreeTweedieRegression: FastTree (Boosted Trees) Tweedie Regression | Tree | 1 | #855 |
| PriorPredictor: Prior Predictor | Baseline | 2 | #875 |
| RandomPredictor: Random Predictor | Baseline | 2 | #875 |
| MultiClassNaiveBayes: Multiclass Naive Bayes | Bayes | 2 | #1111 |
| BinarySGD: Hogwild SGD (binary) | Linear | 2 | #1134 |
| LinearSVM: SVM (Pegasos-Linear) | Linear | 2 | #849 |
| EnsembleRegression: Regression Ensemble (bagging, stacking, etc) | Meta | 2 | On hold until we decide how to change the ensemble learners #1152 |
| WeightedEnsemble: Parallel Ensemble (bagging, stacking, etc) | Meta | 2 | On hold until we decide how to change the ensemble learners #1152 |
| WeightedEnsembleMulticlass: Multi-class Parallel Ensemble (bagging, stacking, etc) | Meta | 2 | On hold until we decide how to change the ensemble learners #1152 |
| FastTreeRanking: FastTree (Boosted Trees) Ranking | Tree | 2 | #855 |
| LightGBMRanking: LightGBM Ranking | Tree | 2 | #962 |

singlis added a commit to singlis/machinelearning that referenced this issue Oct 15, 2018
This adds the SelectColumns Transform and Estimator that is replacing
the DropColumns and ChooseColumns Transforms. With this check-in, Drop
and Choose are still in the code base but will be removed. In order to
support loading older models, SelectColumns supports loading in Drop and
Choose transforms. The changes include:
- Implementation of the SelectColumnsTransform,
SelectColumnsDataTransform and SelectColumnsEstimator
- Backward compatibility with Drop and Choose columns by providing
functions on SelectColumns that will be called when loading the model.
- Entry point APIs for calling select from the command line.
- Additional tests.

These changes are related to dotnet#754.
@shauheen shauheen modified the milestone: 1018 Oct 16, 2018
singlis added a commit that referenced this issue Oct 20, 2018
…1269)

This adds the SelectColumns Transform and Estimator that is replacing
the DropColumns and ChooseColumns Transforms. With this check-in, Drop
and Choose are still in the code base but will be removed. In order to
support loading older models, SelectColumns supports loading in Drop and
Choose transforms. The changes include:
* Implementation of the SelectColumnsTransform,
SelectColumnsDataTransform and SelectColumnsEstimator
* Backward compatibility with Drop and Choose columns by providing
functions on SelectColumns that will be called when loading the model.
* Entry point APIs for calling select from the command line.
* Additional testing of the new functionality.

These changes are related to #754.
singlis added a commit to singlis/machinelearning that referenced this issue Oct 25, 2018
replacing them with SelectColumnsTransform. These changes include:
* Updates to SelectColumnsTransform to respect ordering when keeping
columns. For example, if the input is ABC and CB is selected, the output
will be CB.
* Updates to code that used Choose or Drop columns, replacing with
SelectColumns.
* Updates to baseline output for tests to pass
* Re-enabled the SavePipeline tests

This fixes dotnet#1342
These changes are also related to dotnet#754
singlis added a commit that referenced this issue Oct 30, 2018
…olumnsTransform (#1371)

* Removes ChooseColumnsTransform and DropColumnsTransform classes
replacing them with SelectColumnsTransform. These changes include:
* Updates to SelectColumnsTransform to respect ordering when keeping
columns. For example, if the input is ABC and CB is selected, the output
will be CB.
* Updates to code that used Choose or Drop columns, replacing with
SelectColumns.
* Updates to baseline output for tests to pass
* Re-enabled the SavePipeline tests

This fixes #1342
These changes are also related to #754
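
The keep/drop behavior described in this commit message, including the ordering rule (input ABC, CB selected, output CB), can be illustrated with a small sketch. This is Python written for illustration only, not the SelectColumnsTransform implementation: the function name `select_columns`, its `keep`/`drop` parameters, and the dict-based row are assumptions.

```python
from typing import Dict, Sequence


def select_columns(
    row: Dict[str, object],
    keep: Sequence[str] = (),
    drop: Sequence[str] = (),
) -> Dict[str, object]:
    """Return a new row containing only the selected columns.

    Assumption for this sketch: exactly one of `keep` or `drop` is used.
    """
    if keep and drop:
        raise ValueError("specify either keep or drop, not both")
    if keep:
        # Output order follows the order in which columns were requested.
        return {name: row[name] for name in keep}
    dropped = set(drop)
    # Dropping preserves the input order of the remaining columns.
    return {name: value for name, value in row.items() if name not in dropped}


row = {"A": 1, "B": 2, "C": 3}
kept = select_columns(row, keep=["C", "B"])
remaining = select_columns(row, drop=["B"])
```

Here `kept` holds the columns in the requested order C, B, while `remaining` holds A, C in their original input order.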
singlis added a commit to singlis/machinelearning that referenced this issue Nov 21, 2018
This will be replacing the TermLookupTransform and provide a way to
specify the mapping between two values (note this is specified and not
trained). A user can specify the mapping by providing a keys list and
values list that must be equal in size. The Estimator will then generate
a 1-1 mapping based on the two lists.

The PR references dotnet#754 which covers the conversion of Transformer to use
the new Estimator API.
sfilipi pushed a commit that referenced this issue Dec 21, 2018
* Addition of the ValueMappingEstimator and ValueMappingTransform.
This will be replacing the TermLookupTransform and provide a way to
specify the mapping between two values (note this is specified and not
trained). A user can specify the mapping by providing a keys list and
values list that must be equal in size. The Estimator will then generate
a 1-1 mapping based on the two lists.

The PR references #754 which covers the conversion of Transformer to use
the new Estimator API.
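
The keys-list/values-list contract described above can be sketched as follows. This is an illustrative Python sketch, not the ValueMappingEstimator code: the helper name `build_value_mapping` is hypothetical, and rejecting duplicate keys is an assumption made here to keep the mapping 1-1 (the commit message only states that the two lists must be equal in size).

```python
from typing import Dict, Hashable, Sequence, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def build_value_mapping(keys: Sequence[K], values: Sequence[V]) -> Dict[K, V]:
    """Pair keys[i] with values[i]; the mapping is specified, not trained."""
    if len(keys) != len(values):
        raise ValueError("keys and values must be equal in size")
    mapping = dict(zip(keys, values))
    if len(mapping) != len(keys):
        # Assumption for this sketch: a repeated key would make the mapping ambiguous.
        raise ValueError("keys must be unique")
    return mapping


mapping = build_value_mapping(["low", "medium", "high"], [0, 1, 2])
mapped = [mapping[v] for v in ["high", "low", "low"]]
```

Because nothing is learned from data, the same two lists always produce the same mapping, which is what distinguishes this from a trained dictionary such as the Term Transform.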
Contributor

shauheen commented Jan 8, 2019

@Ivanidzo4ka can you please verify if this can be closed now?

singlis added a commit to singlis/machinelearning that referenced this issue Jan 24, 2019
This provides an example that demonstrates different ways to use the
ValueMappingEstimator. This is part of the original change to add the
ValueMappingEstimator to the code base and references dotnet#754.
singlis added a commit that referenced this issue Feb 1, 2019
ValueMappingEstimator example (References #754)
Provides examples that demonstrate different ways to use the
ValueMappingEstimator.
* Added sample links to ValueMap catalog extensions
* Added additional documentation to the ValueMappingEstimator, including remarks section.
@eerhardt
Member

I believe this can be closed. We have made this new API and have separate issues tracking remaining necessary API work.

Please re-open if I'm wrong.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022

3 participants