Value-tuple stragglers in the public API #2881

Closed
@TomFinley

A prior PR (#2581) and issue (#2501) dealt with value-tuples and why they should not be part of our public surface. I have noticed that some "stragglers" still remain in the public API, so the work is perhaps not yet complete.

The following list is, I believe, complete for the Core/Data/Transforms/FastTree/ImageAnalytics/KMeansClustering/LightGBM/PCA/Tensorflow/StandardLearners/Data.DataView assemblies.

There are three distinct categories where this flaw has remained. (Though the last "category", Others, has only one item.)

Properties on transformers

Some transformers expose information about themselves through properties typed as value-tuples.

  • KeyToBinaryVectorMappingTransformer.Columns
  • MissingValueDroppingTransformer.Columns
  • MissingValueIndicatorTransformer.Columns
  • CustomStopWordsRemovingTransformer.Columns
  • TextNormalizingTransformer.Columns
  • TokenizingByCharactersTransformer.Columns
  • WordEmbeddingsExtractingTransformer.Columns
  • ImageGrayscalingTransformer.Columns
  • ImageLoadingTransformer.Columns
  • LatentDirichletAllocationTransformer.ItemScoresPerTopic and WordScoresPerTopic
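
To make the shape of the problem concrete, here is a minimal sketch contrasting a tuple-typed property with a named-type alternative. `TupleStyleTransformer`, `NamedStyleTransformer`, and `ColumnPair` are hypothetical stand-ins, not the actual ML.NET types, and the tuple element names are assumed for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a transformer whose public property is typed with value-tuples.
public sealed class TupleStyleTransformer
{
    private readonly (string outputColumnName, string inputColumnName)[] _columns;

    public TupleStyleTransformer(params (string outputColumnName, string inputColumnName)[] columns)
        => _columns = columns;

    // The value-tuple type appears directly in the public signature.
    public IReadOnlyCollection<(string outputColumnName, string inputColumnName)> Columns => _columns;
}

// Hypothetical named type that could take the tuple's place on the public surface.
public sealed class ColumnPair
{
    public string OutputColumnName { get; }
    public string InputColumnName { get; }

    public ColumnPair(string outputColumnName, string inputColumnName)
    {
        OutputColumnName = outputColumnName;
        InputColumnName = inputColumnName;
    }
}

// Same information as above, exposed through the named type instead.
public sealed class NamedStyleTransformer
{
    private readonly ColumnPair[] _columns;

    public NamedStyleTransformer(params ColumnPair[] columns) => _columns = columns;

    public IReadOnlyCollection<ColumnPair> Columns => _columns;
}

public static class Program
{
    public static void Main()
    {
        var tupleStyle = new TupleStyleTransformer(("Out", "In"));
        foreach (var (output, input) in tupleStyle.Columns)
            Console.WriteLine($"{input} -> {output}"); // prints "In -> Out"

        var namedStyle = new NamedStyleTransformer(new ColumnPair("Out", "In"));
        foreach (var pair in namedStyle.Columns)
            Console.WriteLine($"{pair.InputColumnName} -> {pair.OutputColumnName}"); // prints "In -> Out"
    }
}
```

Since a property cannot be overloaded, changing its type later would be a breaking change, which is one reason this category seems harder to patch up after the fact than the extension methods.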

MLContext estimator creation extension methods

Some overloads of MLContext extension methods on various catalogs are still using value-tuples. I view this as a lesser sin, since it is at least something that could conceivably be fixed with an additional overload if we decide it is necessary, but I'd still prefer to be consistent.

  • ProduceHashedNgrams extension method
  • ProduceHashedWordBags extension method
  • ProduceNgrams extension method
  • ProduceWordBags extension method
  • RemoveDefaultStopWords extension method
  • TokenizeWords extension method
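
The "fix using an overload" idea could look roughly like the sketch below. All names here (`TextCatalogSketch`, `ColumnPairSketch`, `TokenizeWordsSketch`) are hypothetical, the methods return plain strings rather than estimators, and the signatures are not the real ML.NET ones; the point is only that a named-type overload can sit next to a tuple-based one.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for a catalog that extension methods hang off.
public sealed class TextCatalogSketch { }

// Hypothetical named type describing an output/input column pair.
public sealed class ColumnPairSketch
{
    public string OutputColumnName { get; }
    public string InputColumnName { get; }

    public ColumnPairSketch(string outputColumnName, string inputColumnName)
    {
        OutputColumnName = outputColumnName;
        InputColumnName = inputColumnName;
    }
}

public static class TextCatalogExtensionsSketch
{
    // Tuple-based overload, in the style of the "straggler" methods listed above.
    // It simply forwards to the named-type overload.
    public static string[] TokenizeWordsSketch(
        this TextCatalogSketch catalog,
        params (string outputColumnName, string inputColumnName)[] columns)
        => catalog.TokenizeWordsSketch(
            columns.Select(c => new ColumnPairSketch(c.outputColumnName, c.inputColumnName)).ToArray());

    // Named-type overload that could be added alongside (or instead of) the tuple one.
    public static string[] TokenizeWordsSketch(
        this TextCatalogSketch catalog,
        params ColumnPairSketch[] columns)
        => columns.Select(c => $"{c.InputColumnName} -> {c.OutputColumnName}").ToArray();
}

public static class Program
{
    public static void Main()
    {
        var catalog = new TextCatalogSketch();

        // Resolves to the tuple-based overload.
        var viaTuple = catalog.TokenizeWordsSketch(("Tokens", "Text"));

        // Resolves to the named-type overload.
        var viaNamed = catalog.TokenizeWordsSketch(new ColumnPairSketch("Tokens", "Text"));

        Console.WriteLine(viaTuple[0]); // prints "Text -> Tokens"
        Console.WriteLine(viaNamed[0]); // prints "Text -> Tokens"
    }
}
```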

Others

Lastly, I see a Microsoft.ML.ColumnOptions global class with an implicit operator from value-tuples. This one is probably harmless, since that specific class is for representing a simple case.
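
For reference, an implicit operator of this kind might be declared as in the sketch below. `ColumnOptionsSketch` is a hypothetical stand-in; its members, parameter names, and defaulting behavior are assumptions and not the actual `Microsoft.ML.ColumnOptions` definition.

```csharp
using System;

// Hypothetical stand-in for a simple output/input column description.
public sealed class ColumnOptionsSketch
{
    public string OutputColumnName { get; }
    public string InputColumnName { get; }

    // Assumption for this sketch: a missing input name falls back to the output name.
    public ColumnOptionsSketch(string outputColumnName, string inputColumnName = null)
    {
        OutputColumnName = outputColumnName;
        InputColumnName = inputColumnName ?? outputColumnName;
    }

    // Implicit conversion from a value-tuple. The tuple type shows up only in the
    // operator's signature, not in any property or method parameter.
    public static implicit operator ColumnOptionsSketch(
        (string outputColumnName, string inputColumnName) value)
        => new ColumnOptionsSketch(value.outputColumnName, value.inputColumnName);
}

public static class Program
{
    public static void Main()
    {
        // A tuple literal converts through the implicit operator.
        ColumnOptionsSketch options = ("Features", "RawFeatures");
        Console.WriteLine($"{options.InputColumnName} -> {options.OutputColumnName}");
        // prints "RawFeatures -> Features"
    }
}
```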

/cc @yaeldekel @Ivanidzo4ka

Labels: API (Issues pertaining the friendly API)
