
API reference - Samples for Transforms #1209


Closed
sfilipi opened this issue Oct 10, 2018 · 6 comments
Labels
documentation Related to documentation of ML.NET


sfilipi commented Oct 10, 2018

We need to add samples showing how to use the new transformers and estimators, then reference those samples from the XML documentation so that users on docs.microsoft.com can copy and paste a sample and get a head start.

Most of the tests that were added as part of the transformer work are a good starting point for creating a sample.

MLContext Catalogs

| Catalog | Total APIs | Samples Owner | Samples Status / ETA |
| --- | --- | --- | --- |
| MLContext.Transforms (root) | 19 | Senja | Remaining: 4 overrides for the normalizer multicolumn examples |
| MLContext.Transforms.Categorical | 2 | ZeeshanA | Done v1 |
| MLContext.Transforms.Conversion | 6 | Senja | Done v1 |
| MLContext.Transforms.FeatureSelection | 4 | ZeeshanA | Done v1 |
| MLContext.Transforms.TimeSeries | 4 | Senja | Done v1 |
| MLContext.Transforms.Text | 29 | ZeeshanA | Done v1 |
| MLContext.Data | 10 | Senja | Done v1 |
| MLContext.Model (root) | 4 | ZeeshanS | Done v1 |

P0+P1 Public API (extension methods) per Catalog

| MLContext.Transforms (root) | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| CopyColumns | 2 | Yes | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
| Concatenate | 1 | Yes, needs improvement. | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
| DropColumns | 1 | Yes | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
| SelectColumns | 2 | Yes, needs improvement. | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
| Normalize | 1 | Done. | 1 #3244 | Ivan |
| CustomMapping | 1 | Yes, needs improvement. | Done-v1 #3275 | Artidoro |
| IndicateMissingValues | 2 | | Done-v1 #3275 | Artidoro |
| ReplaceMissingValues | 2 | | Done-v1 #3275 | Artidoro |
| ConvertToGrayscale | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| LoadImages | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| ExtractPixels | 2 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| ResizeImages | 2 | Yes. Example not displaying. | 1 #3165 | Abhishek |
| ConvertToImage | 2 | Yes. | 1 #3165 | Abhishek |
| IidChangePointEstimator | 1 | | 1 - Done | Senja |
| IidSpikeEstimator | 1 | | 1 - Done | Senja |
| SsaChangePointEstimator | 1 | | 1 - Done | Senja |
| SsaSpikeEstimator | 1 | | 1 - Done | Senja |
| ApplyOnnxModel | 3 | | DoneV1 #3349 | Gani |
| DnnFeaturizeImage | 1 | Yes, needs improvement. | 1 - Done | Senja |
| NormalizeGlobalContrast | 1 | Done | 0 #3232 | Ivan |
| NormalizeLpNorm | 1 | Done. | 0 #3232 | Ivan |
| ApproximatedKernelMap | 1 | Done | 0 #3232 | Ivan |
| mlContext.Transforms.CalculateFeatureContribution | 1 | Yes, needs improvement | | Rogan |

| MLContext.Transforms.Categorical | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| OneHotEncoding | 2 | | 2 #3179 | Abhishek |
| OneHotHashEncoding | 2 | | 2 #3179 | Abhishek |

| MLContext.Transforms.Conversion | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| Hash | 2 | Can't find the API | Done | Senja |
| ConvertType | 2 | Yes, needs improvement. | Done | Senja |
| MapKeyToValue | 2 | Yes, needs improvement. | Done | Senja |
| MapKeyToVector | 2 | Yes, needs improvement. | Done | Senja |
| MapValueToKey | 2 | Yes. | Done | Senja |
| MapKeyToBinaryVector | 2 | Yes, needs improvement. | Done | Senja |

| MLContext.Transforms.FeatureSelection | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| SelectFeaturesBasedOnMutualInformation | 2 | Need a better example to show MI computation. Something like this | 2 #3184 | Abhishek |
| SelectFeaturesBasedOnCount | 2 | | 2 #3184 | Abhishek |

| MLContext.Transforms.Text | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| FeaturizeText | 2 | | #3120 | Zeeshan |
| TokenizeCharacters | 1 | | #3123 | Zeeshan |
| NormalizeText | 1 | | #3133 | Zeeshan |
| ExtractWordEmbeddings | 1 | | #3142 | Zeeshan |
| TokenizeWords | 1 | | #3156 | Zeeshan |
| ProduceNgrams | 3 | | #3177 | Zeeshan |
| RemoveDefaultStopWords | 2 | | #3156 | Zeeshan |
| RemoveStopWords | 2 | | #3156 | Zeeshan |
| ProduceWordBags | 3 | | #3183 | Zeeshan |
| ProduceHashedWordBags | 3 | | #3183 | Zeeshan |
| ProduceHashedNgrams | 3 | | #3177 | Zeeshan |
| LatentDirichletAllocation | 2 | | #3191 | Zeeshan |

For the Data catalog, the documentation of every API needs to be augmented with guidance on when to use that API.

| MLContext.Data | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| LoadFromEnumerable | 1 | Done. | 1 - Done. | Senja |
| CreateEnumerable | 2 | Done. The second overload of this API is a P4 scenario. Its use case: a user has a model that preserves slot names for the features. When they load the model, they also get the schema from the loaded model and pass that schema, together with the TRow type they want to load the data into, to this API. The API then populates the annotations (formerly metadata) for the feature column. | 1 | Senja |
| BootstrapSample | 1 | Done. | 1 - Done. | Senja |
| Cache | 1 | Done. | 1 - Done. | Senja |
| FilterRowsByColumn | 1 | Done. | 1 - Done. | Senja |
| FilterRowsByKeyColumnFraction | 1 | Done. | 1 - Done. | Senja |
| FilterRowsByMissingValues | 1 | Done. | 1 - Done. | Senja |
| ShuffleRows | 1 | Done. | 1 - Done. | Senja |
| SkipRows | 1 | Done. | 1 - Done. | Senja |
| TakeRows | 1 | Done. | 1 - Done. | Senja |

| Other | Num Overloads | Documentation | Sample | API Owner |
| --- | --- | --- | --- | --- |
| Permutation Feature Importance | 4 | Yes, but needs work | Yes, but needs work | Rogan |

sfilipi self-assigned this Oct 10, 2018
sfilipi added the documentation (Related to documentation of ML.NET) label Oct 10, 2018
sfilipi removed their assignment Oct 10, 2018
sfilipi self-assigned this Oct 17, 2018
sfilipi changed the title from "Documentation samples for the transforms of the new API" to "Documentation samples for the componentsof the new API" Oct 24, 2018
shauheen changed the title from "Documentation samples for the componentsof the new API" to "Documentation samples for the components of the new API" Nov 27, 2018
sfilipi changed the title from "Documentation samples for the components of the new API" to "Documentation and samples for the API reference site" Jan 29, 2019


jwood803 commented Feb 8, 2019

Would the logistic regression one be done with PR #2256? I wonder if others may be done, too. I can try to go through and see if they have XML doc examples.


sfilipi commented Feb 8, 2019

Hi @jwood803, yes you took care of LogisticRegression with #2256. Thanks!
This work item is to complete everything: XML docs for the extensions, estimators, etc.

If you have bandwidth and are looking for something to do, you can contribute to the samples and replicate the work you did on Logistic Regression in #2256 for the other binary trainers:

LightGBM
FastTree
AveragedPerceptron
SDCA
LinearSVM
SymSGD

Basically, every extension on the BinaryClassificationCatalog.BinaryClassificationTrainers catalog.

cc @shmoradims @rogancarr FYI.

If you claim those, feel free to update the table with your username.

@shmoradims

I moved the trainers to a separate issue: #2522

shmoradims changed the title from "Docs and samples for the API reference site (P0 & P1 API)" to "Docs and samples for the API reference site (P0 & P1 Transforms)" Mar 29, 2019
rogancarr removed their assignment Apr 4, 2019
@TomFinley

I talked with @shmoradims and @eerhardt. As a sub-part of this issue, I will work on an example implementation of IDataView.

shmoradims changed the title from "Docs and samples for the API reference site (P0 & P1 Transforms)" to "API reference - Samples for Transforms" Apr 12, 2019
This was referenced Apr 16, 2019

sfilipi commented Apr 19, 2019

Verified that everything is documented except the normalizer multicolumn APIs. Tracking those in a separate issue.

sfilipi closed this as completed Apr 19, 2019
ghost locked as resolved and limited conversation to collaborators Mar 28, 2022