API reference - Samples for Transforms #1209

We need to add samples showing how to use the new transformers and estimators, then reference those samples from the XML documentation so that users on docs.microsoft.com can copy and paste a sample and get a head start.

Most of the tests that were added as part of the transformer work are a good starting point for creating a sample.

# MLContext Catalogs


| Catalog | Total APIs | Samples Owner | Samples Status / ETA |
| -- | -- | -- | -- |
| MLContext.Transforms (root) | 19 | Senja | Remaining: 4 overrides for the normalizer multicolumn examples |
| MLContext.Transforms.Categorical | 2 | ZeeshanA | Done v1 |
| MLContext.Transforms.Conversion | 6 | Senja | Done v1 |
| MLContext.Transforms.FeatureSelection | 4 | ZeeshanA | Done v1 |
| MLContext.Transforms.TimeSeries | 4 | Senja | Done v1 |
| MLContext.Transforms.Text | 29 | ZeeshanA | Done v1 |
| MLContext.Data | 10 | Senja | Done v1 |
| MLContext.Model (root) | 4 | ZeeshanS | Done v1 |

# P0+P1 Public API (extension methods) per Catalog 

| MLContext.Transforms (root) | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| CopyColumns | 2 | Yes | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
| Concatenate | 1 | Yes, needs improvement. | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
| DropColumns | 1 | Yes | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
| SelectColumns | 2 | Yes, needs improvement. | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
| Normalize | 1 | Done. | 1 #3244 | Ivan |
| CustomMapping | 1 | Yes, needs improvement. | Done-v1 #3275 | Artidoro |
| IndicateMissingValues | 2 |  | Done-v1 #3275 | Artidoro |
| ReplaceMissingValues | 2 |  | Done-v1 #3275 | Artidoro |
| ConvertToGrayscale | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| LoadImages | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| ExtractPixels | 2 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
| ResizeImages | 2 | Yes. Example not displaying. | 1 #3165 | Abhishek |
| ConvertToImage | 2 | Yes. | 1 #3165 | Abhishek |
| IidChangePointEstimator | 1 |  | 1 - Done | Senja |
| IidSpikeEstimator | 1 |  | 1 - Done | Senja |
| SsaChangePointEstimator | 1 |  | 1 - Done | Senja |
| SsaSpikeEstimator | 1 |  | 1 - Done | Senja |
| ApplyOnnxModel | 3 | DoneV1 | #3349 | Gani |
| DnnFeaturizeImage | 1 | Yes, needs improvement. | 1 - Done | Senja |
| NormalizeGlobalContrast | 1 | Done | 0 #3232 | Ivan |
| NormalizeLpNorm | 1 | Done. | 0 #3232 | Ivan |
| ApproximatedKernelMap | 1 | Done | 0 #3232 | Ivan |
| CalculateFeatureContribution | 1 | Yes, needs improvement |  | Rogan |

| MLContext.Transforms.Categorical | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| OneHotEncoding | 2 |  | 2 #3179  | Abhishek |
| OneHotHashEncoding | 2 |  |  2 #3179  | Abhishek |


| MLContext.Transforms.Conversion | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| Hash | 2 | can't find the API | Done | Senja |
| ConvertType | 2 | Yes, needs improvement. |  Done  | Senja |
| MapKeyToValue |  2 | Yes, needs improvement. | Done  |  Senja |
| MapKeyToVector | 2 | Yes, needs improvement. | Done  |  Senja |
| MapValueToKey | 2 | Yes. | Done  |  Senja |
| MapKeyToBinaryVector | 2 | Yes, needs improvement. | Done  | Senja |

| MLContext.Transforms.FeatureSelection | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| SelectFeaturesBasedOnMutualInformation | 2 | Needs a better example to show the mutual information (MI) computation, something like [this](https://www.researchgate.net/post/How_can_i_calculate_Mutual_Information_theory_from_a_simple_dataset). | 2 #3184 | Abhishek |
| SelectFeaturesBasedOnCount |  2  |  | 2 #3184  | Abhishek |

| MLContext.Transforms.Text| Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| FeaturizeText |  2  |  | #3120 | Zeeshan |
| TokenizeCharacters |  1  |  | #3123 |Zeeshan |
| NormalizeText |  1  |  |  #3133| Zeeshan |
| ExtractWordEmbeddings |  1  |  | #3142 | Zeeshan |
| TokenizeWords |  1  |  | #3156 | Zeeshan |
| ProduceNgrams |  3  |  | #3177 | Zeeshan |
| RemoveDefaultStopWords |  2  |  | #3156 |  Zeeshan|
| RemoveStopWords |  2  |  | #3156 |Zeeshan  |
| ProduceWordBags |  3  |  | #3183 | Zeeshan |
| ProduceHashedWordBags |  3  |  | #3183 |  Zeeshan|
| ProduceHashedNgrams |  3  |  | #3177 | Zeeshan |
| LatentDirichletAllocation |  2  |  |#3191  | Zeeshan |

For the Data catalog, the documentation of every API needs to be augmented with suggestions for when one would use that API.

| MLContext.Data | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| LoadFromEnumerable | 1 | Done.| 1 - Done. | Senja  |
| CreateEnumerable | 2 | Done. The second overload of this API is a P4 scenario. Its use case: a user has a model that preserves slot names for the features; when they load the model, they also get the schema from the loaded model and pass that schema, together with the TRow type they want to load the data into, to this API. The API then populates the annotations (formerly metadata) for the feature column. | 1 | Senja |
| BootstrapSample | 1 | Done. | 1 - Done.  | Senja  |
| Cache | 1 | Done. | 1 - Done. | Senja  |
| FilterRowsByColumn | 1 | Done.| 1 - Done. | Senja  |
| FilterRowsByKeyColumnFraction | 1 | Done. | 1 - Done. | Senja  | 
| FilterRowsByMissingValues | 1 | Done. | 1 - Done. | Senja  | 
| ShuffleRows | 1 | Done. | 1 - Done. | Senja  | 
| SkipRows | 1 | Done. | 1 - Done. | Senja  | 
| TakeRows | 1 | Done. | 1 - Done.| Senja | 

| Other | Num Overloads | Documentation | Sample | API Owner |
| --------------------------------  | ------------- | ------------  | -----  | ----- |
| Permutation Feature Importance | 4 |  Yes, but needs work | Yes, but needs work | Rogan |

