API reference - XML documentation template for transforms #3204

Closed
sfilipi opened this issue Apr 4, 2019 · 11 comments

@sfilipi
Member

sfilipi commented Apr 4, 2019

The XML documentation for the transforms should contain information about the schema: requirements about the type of the columns to work on, and information about the type of the columns produced.

@shmoradims

@sfilipi sfilipi added the documentation (Related to documentation of ML.NET) label Apr 4, 2019
@wschin
Member

wschin commented Apr 4, 2019

I'd suggest documenting the GetOutputSchema function; the transformer's XML can then reference GetOutputSchema's documentation.

@singlis
Member

singlis commented Apr 4, 2019

Referencing #3127, as this feels the same or related. Maybe we could track this work under one issue? I think there are going to be a number of subsections that will need to be tracked. And should this go on the transforms, or on the extensions?

@sfilipi
Member Author

sfilipi commented Apr 5, 2019

> I'd suggest documenting the GetOutputSchema function; the transformer's XML can then reference GetOutputSchema's documentation.

It won't work, because GetOutputSchema is on the base class and calls each class's GetOutputSchemaCore, which is internal.
Also, the data types of the input columns need to be documented.
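
To illustrate the point above, here is a minimal sketch. The types TransformerBaseSketch and ScalingTransformerSketch are hypothetical stand-ins, not ML.NET classes, and schemas are simplified to plain strings. It only shows the shape of the problem: the public method is shared by every transform, while the per-transform logic sits in an internal member that a published API reference would typically leave out.

```csharp
using System;

// Hypothetical stand-in for a transformer base class; not the real ML.NET type.
public abstract class TransformerBaseSketch
{
    /// <summary>Returns the output schema produced for <paramref name="inputSchema"/>.</summary>
    /// <remarks>
    /// This text is shared by every derived transformer, so it cannot describe
    /// the column types of any single transform.
    /// </remarks>
    public string GetOutputSchema(string inputSchema) => GetOutputSchemaCore(inputSchema);

    // The per-transform logic lives here. Because the member is internal,
    // documentation placed on it is typically not part of the public reference.
    internal abstract string GetOutputSchemaCore(string inputSchema);
}

// Hypothetical transform that appends one Single column to the schema string.
public sealed class ScalingTransformerSketch : TransformerBaseSketch
{
    internal override string GetOutputSchemaCore(string inputSchema)
        => inputSchema + ", Scaled: Single";
}
```

Calling GetOutputSchema on a ScalingTransformerSketch goes through the base class, which is why the column-type information has to be written on each transform's own XML instead.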

@sfilipi sfilipi changed the title from "The XML documentation for the transforms should contain information about schema" to "XML documentation template for transforms" Apr 10, 2019
@sfilipi
Member Author

sfilipi commented Apr 10, 2019

Proposal for the transforms XML template, mirroring the trainers template from issue #3218:

1 - XML on the Transform extension method:

  • Summary - Create an <see cref="the estimator"/> estimator that <short transform description>.

  • Parameters:
    inputColumnName: Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
    The data type on this column should be <ref required data type.> | The data type on this column can be any type.
    outputColumnName: Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
    This column's data type will be <info from GetOutputSchema here>.

  • Example

2 - XML of the Estimator, Transformer

  • Summary - One line description of what the transform does.
  • Remarks - More details about the transform and its implementation.
    Does it do a pass through the data? Yes/No
    Input column type
    Output column type.
    Additional NuGet: "Link to NuGet" OR "None" if everything needed is already included in Microsoft.ML
  • SeeAlso - cref to the extension method, for an example of usage.
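
As a rough illustration of how the two parts of this template could look once filled in, here is a sketch. All type and method names (TransformsCatalogSketch, ScalingEstimatorSketch, TransformExtensionsSketch.ScaleValues) are hypothetical and do not exist in ML.NET; the estimator is a stub that only records column names. Only standard XML documentation tags are used.

```csharp
using System;

// Hypothetical stand-ins for illustration only; none of these are ML.NET types.
public sealed class TransformsCatalogSketch { }

/// <summary>
/// Scales each value of the input column.
/// </summary>
/// <remarks>
/// Does it do a pass through the data? No.
/// Input column data type: <see cref="System.Single"/>.
/// Output column data type: <see cref="System.Single"/>.
/// Additional NuGet: None.
/// </remarks>
/// <seealso cref="TransformExtensionsSketch.ScaleValues(TransformsCatalogSketch, string, string)"/>
public sealed class ScalingEstimatorSketch
{
    // Stub: stores the column names and performs no actual scaling.
    public string OutputColumnName { get; }
    public string InputColumnName { get; }

    public ScalingEstimatorSketch(string outputColumnName, string inputColumnName)
    {
        OutputColumnName = outputColumnName;
        InputColumnName = inputColumnName;
    }
}

public static class TransformExtensionsSketch
{
    /// <summary>
    /// Create a <see cref="ScalingEstimatorSketch"/>, which scales each value of the input column.
    /// </summary>
    /// <param name="catalog">The transform's catalog.</param>
    /// <param name="outputColumnName">Name of the column resulting from the transformation of
    /// <paramref name="inputColumnName"/>. This column's data type will be <see cref="System.Single"/>.</param>
    /// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>,
    /// the value of the <paramref name="outputColumnName"/> will be used as source.
    /// The data type on this column should be <see cref="System.Single"/>.</param>
    /// <example>
    /// <code>
    /// var estimator = new TransformsCatalogSketch().ScaleValues("Scaled", "Value");
    /// </code>
    /// </example>
    public static ScalingEstimatorSketch ScaleValues(
        this TransformsCatalogSketch catalog,
        string outputColumnName,
        string inputColumnName = null)
        // Mirror the documented behavior: a null input name falls back to the output name.
        => new ScalingEstimatorSketch(outputColumnName, inputColumnName ?? outputColumnName);
}
```

The extension method carries the column-type details a caller needs at the call site, while the estimator's Remarks hold the longer description and link back through SeeAlso.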

cc @natke @shmoradims @glebuk @singlis

@natke
Contributor

natke commented Apr 10, 2019

Looks good @sfilipi. One small comment: for the summary of the extension method, would we want to say: "Create an xxx transform ..."?

@artidoro
Contributor

Shouldn't it say "Create an xxx estimator" instead of "transform"?
@sfilipi

@sfilipi
Member Author

sfilipi commented Apr 10, 2019

> Shouldn't it say "Create an xxx estimator" instead of "transform"?
> @sfilipi

@natke suggested that we reference the components by their functionality: trainers and transforms.
In that specific case we'll keep it as "estimator"; I updated the template above to reflect that.

In the Remarks of the actual estimator, we're keeping "transform" as the term.

@shmoradims

shmoradims commented Apr 10, 2019

  • Do we need the remarks? Can the summary just say: "Create an <cref=the estimator>"?
  • Let's add a short description of the transform in the summary, similar to trainers.
  • Onnx -> ONNX

@sfilipi
Member Author

sfilipi commented Apr 11, 2019

Transform priority is in this old issue. Use it as a guide to prioritize the more popular transforms. Also, reuse descriptions from here.

List of transforms to sign up for:

MLContext.Transforms (root)

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| CopyColumns | Senja | | Done PrToV1: #3348 |
| Concatenate | Artidoro | | Done |
| DropColumns | Artidoro | 3 | Done |
| SelectColumns | Artidoro | | Done |
| Normalize.MinMax | Scott | | Done #3432 |
| Normalize.MeanVariance | Scott | | Done #3432 |
| Normalize.LogMeanVariance | Scott | | Done #3432 |
| Normalize.Binning | Scott | | Done #3432 |
| Normalize.SupervisedBinning | Scott | | Done #3432 |
| CustomMapping | Artidoro | | Done |
| IndicateMissingValues | Ivan | | Done #3386 |
| ReplaceMissingValues | Ivan | | Done #3386 |
| ConvertToGrayscale | Ivan | | In PR #3376 |
| LoadImages | Ivan | | In PR #3376 |
| ExtractPixels | Ivan | | In PR #3376 |
| ResizeImages | Ivan | | In PR #3376 |
| ConvertToImage | Ivan | | In PR #3376 |
| IidChangePointEstimator | Wei-Sheng | | Done #3444 |
| IidSpikeEstimator | Wei-Sheng | | Done #3444 |
| SsaChangePointEstimator | Wei-Sheng | | Done #3444 |
| SsaSpikeEstimator | Wei-Sheng | | Done #3444 |
| ApplyOnnxModel | Gani | | Done #3387 |
| DnnFeaturizeImage | Senja | | |
| NormalizeGlobalContrast | Artidoro | | Done |
| NormalizeLpNorm | Artidoro | | Done |
| ApproximatedKernelMap | Yael | | Done #3377 |
| CalculateFeatureContribution | Yael | | Done #3384 |

@sfilipi
Member Author

sfilipi commented Apr 11, 2019

Other catalogs:

MLContext.Transforms.Categorical

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| OneHotEncoding | Najeeb | 0 | Done #3388 |
| OneHotHashEncoding | Najeeb | 1 | Done #3388 |

MLContext.Transforms.Conversion

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| Hash | Senja | | Done |
| ConvertType | Senja | | Done |
| MapKeyToValue | Senja | | Done |
| MapKeyToVector | Senja | | Done |
| MapValueToKey | Senja | | Done |
| MapKeyToBinaryVector | Senja | | Done |

MLContext.Transforms.FeatureSelection

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| SelectFeaturesBasedOnMutualInformation | Senja | need a better example to show MI computation. something like this | Done |
| SelectFeaturesBasedOnCount | Senja | | Done |

MLContext.Transforms.Text

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| FeaturizeText | Senja | | Done #3438 |
| TokenizeCharacters | Artidoro | 2 | Done #3418 |
| NormalizeText | Artidoro | 2 | Done #3418 |
| ExtractWordEmbeddings | Artidoro | | Done #3418 |
| TokenizeWords | Artidoro | 2 | Done #3418 |
| ProduceNgrams | Artidoro | 2 | Done #3418 |
| RemoveDefaultStopWords | Ivan | | Done #3413 |
| RemoveStopWords | Ivan | | Done #3413 |
| ProduceWordBags | Ivan | | Done #3440 |
| ProduceHashedWordBags | Ivan | | Done #3440 |
| ProduceHashedNgrams | Ivan | | Done #3419 |
| LatentDirichletAllocation | Senja | | Done #3442 |

For the Data catalog, each API's documentation needs to be augmented with suggestions for when one would use that API.

MLContext.Data

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| LoadFromEnumerable | Najeeb | | Done #3417 |
| CreateEnumerable | Najeeb | The second overload of this API is a P4 scenario. The use case for that overload would be: the user has a model with slot names preserved for the features; when they load the model, they also get the schema out of the loaded model and pass that schema, together with the TRow type they want to load the data into, to this API. The API will then populate the Annotations (formerly metadata) for the feature column. | Done #3417 |
| BootstrapSample | Najeeb | | Done (previously by Rogan) |
| Cache | Najeeb | | Done (previously by Rogan) |
| FilterRowsByColumn | Najeeb | | Done (previously by Rogan) |
| FilterRowsByKeyColumnFraction | Najeeb | | Done (previously by Rogan) |
| FilterRowsByMissingValues | Wei-Sheng | | Done (previously by Rogan) |
| ShuffleRows | Wei-Sheng | | Done (previously by Rogan) |
| SkipRows | Wei-Sheng | | #3415 Done (previously by Rogan) |
| TakeRows | Wei-Sheng | | #3415 Done (previously by Rogan) |

Other

| API | Owner | Priority | Status |
| --- | --- | --- | --- |
| Permutation Feature Importance | Shahab | | Done by @codemzs |
| MLContext.Model (root) | Shahab | | #3451 |

@sfilipi sfilipi changed the title from "XML documentation template for transforms" to "API reference - XML documentation template for transforms" Apr 12, 2019
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Apr 12, 2019
sfilipi added a commit that referenced this issue Apr 15, 2019
* applying the #3204 template to ColumnCopying.
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Apr 16, 2019
yaeldekel pushed a commit that referenced this issue Apr 18, 2019
…3377)

* Documentation for ApproximatedKernelMappingEstimator

* Address code review comments

* Address Shahab's comments
singlis added a commit to singlis/machinelearning that referenced this issue Apr 19, 2019
@sfilipi sfilipi self-assigned this Apr 19, 2019
sfilipi added a commit that referenced this issue Apr 19, 2019
* FeatureSelection extensions documentation
yaeldekel pushed a commit that referenced this issue Apr 20, 2019
…ator (#3384)

* Documentation for FeatureContributionEstimator

* Address code review comments

* Address code review comments
sfilipi added a commit that referenced this issue Apr 20, 2019
* towards adapting the Conversions catalog documentation to the new template.
singlis added a commit that referenced this issue Apr 20, 2019
* XML documentation for Normalizer
Tracked in #3204
najeeb-kazmi added a commit that referenced this issue Apr 21, 2019
…3388)

* Docs for Categorical catalog

* Fixing EntryPoints test

* PR comments

* PR comments - output data types

* categorical update

* Final review comments
@shmoradims

Finished 4/21 9:29pm. Great team work.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022