Metadata/annotations public surface of the API #2622

Metadata (to be renamed annotations later, per #2297) is an essential mechanism for attaching optional information to columns. It ranges from public-facing information that users should be aware of (slot names, which we use for feature names on feature columns, and key values), to information that is arguably useful for users but exists primarily for our internal infrastructure (e.g., whether something has already been normalized), to information intended purely for our internal infrastructure.

We ought to decide what we really want to be part of our initial public surface (as small as possible but no smaller), and internalize the rest of it.

So, we will keep `Metadata` (to be `Annotations`) as is:

https://github.com/dotnet/machinelearning/blob/a56caeebaaa0076d6940bfdede90a4eb0a351a20/src/Microsoft.Data.DataView/DataViewSchema.cs#L172

This by itself is little more than an arbitrary string/object store, which is as intended, so that will not change. What will change, however, is the class we've made to make access a little more structured:

https://github.com/dotnet/machinelearning/blob/a56caeebaaa0076d6940bfdede90a4eb0a351a20/src/Microsoft.ML.Core/Data/MetadataUtils.cs#L17

This has stuff that is "good" in that we want to keep it as part of the public surface, but also stuff that is internal and should not be part of the public surface.
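To make the "arbitrary string/object store" idea concrete, here is a minimal sketch of what such a store looks like in spirit. The `AnnotationStore` type and its `Set`/`TryGet` members are hypothetical names invented for illustration; they are not the actual `Metadata` class linked above, whose real members may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the metadata store; NOT the actual ML.NET type.
public sealed class AnnotationStore
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    // Attach a value under an arbitrary string kind, replacing any previous value.
    public void Set<T>(string kind, T value) => _values[kind] = value;

    // Typed lookup: succeeds only if the kind exists and holds a T.
    public bool TryGet<T>(string kind, out T value)
    {
        if (_values.TryGetValue(kind, out object raw) && raw is T typed)
        {
            value = typed;
            return true;
        }
        value = default(T);
        return false;
    }
}
```

Nothing in such a store says which kinds exist or what type each one holds, which is why a more structured access layer on top of it is worth discussing.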

## The good

A small amount of this stuff we probably want to keep.

However, we should probably move it somewhere else: perhaps to a static class `SchemaColumnAnnotationsExtensions`, as a series of extension methods on `DataViewSchema.Column` that access the associated metadata.

This might include methods like these:

https://github.com/dotnet/machinelearning/blob/a56caeebaaa0076d6940bfdede90a4eb0a351a20/src/Microsoft.ML.Core/Data/MetadataUtils.cs#L297

https://github.com/dotnet/machinelearning/blob/a56caeebaaa0076d6940bfdede90a4eb0a351a20/src/Microsoft.ML.Core/Data/MetadataUtils.cs#L321
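As a rough sketch of the extension-method shape proposed here, the following assumes a simplified, hypothetical `Column` type with a plain dictionary of annotations, so that the example is self-contained. The real `DataViewSchema.Column` is richer, and the method names `HasSlotNames` and `GetSlotNames`, the `AnnotationKinds` class, and the use of `string[]` for slot names are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified column: a name plus a string/object annotation bag.
// The real DataViewSchema.Column is richer; this only illustrates the shape.
public sealed class Column
{
    public string Name { get; }
    public IReadOnlyDictionary<string, object> Annotations { get; }

    public Column(string name, IReadOnlyDictionary<string, object> annotations)
    {
        Name = name;
        Annotations = annotations;
    }
}

// Kind strings stay internal (assumed name), so users never spell them out.
internal static class AnnotationKinds
{
    internal const string SlotNames = "SlotNames";
}

// Public, structured access layered on top of the untyped store.
public static class SchemaColumnAnnotationsExtensions
{
    // True if the column carries slot names of the expected type.
    public static bool HasSlotNames(this Column column)
        => column.Annotations.TryGetValue(AnnotationKinds.SlotNames, out object raw)
           && raw is string[];

    // Returns the slot names, or throws if the column has none.
    public static string[] GetSlotNames(this Column column)
    {
        if (column.Annotations.TryGetValue(AnnotationKinds.SlotNames, out object raw)
            && raw is string[] names)
        {
            return names;
        }
        throw new InvalidOperationException($"Column '{column.Name}' has no slot names.");
    }
}
```

With this shape, a caller writes `column.GetSlotNames()` and never needs to know the string key or the stored type, so the public surface stays limited to a handful of named accessors.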

## The bad

Much of this class, though, should be internal.

So, for example, we have this static class of `Kinds` of metadata. It is quite a nice thing to have for consistency in our own infrastructure, but it is not what we want to show users. The same goes for the labels for types of scoring, which serve a scenario irrelevant to the ML.NET API as defined (since people evaluate scores explicitly, by calling some code that says, "here, evaluate these scores"). The same also goes for the stuff on the ranges of categorical variables which, while essential, is mostly for the benefit of downstream trainers consuming the data. (Users who want the raw categoricals can consume the source data themselves by just programming, since they control the pipeline.)

There's also a lot of stuff built around *implementing* metadata, which is of questionable worth at this time given the changes that have happened to schema in the past year, and which is of no use whatever.

/cc @Ivanidzo4ka , @eerhardt , @rogancarr , @sfilipi 
