Skip to content

Naming overhaul for IDataView subsystem #2297

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eerhardt opened this issue Jan 29, 2019 · 8 comments · Fixed by #2712
Closed

Naming overhaul for IDataView subsystem #2297

eerhardt opened this issue Jan 29, 2019 · 8 comments · Fixed by #2712
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@eerhardt
Copy link
Member

eerhardt commented Jan 29, 2019

We should make sure the type names in the IDataView subsystem are the names we want to use forever.

See

  1. Extract IDataView into its own assembly and NuGet package #2220 (comment)

If all of these types are column types, should they include column in the name? e.g. TextColumnType?

  1. Move IDataView to Microsoft.Data.DataView namespace. #2254 (comment)

The idea was broached I know, but resolution never reached: should we rename? Or else perhaps make Schema a nested class of something appropriate, so as to give it definition?

We could rename the class to DataViewSchema, or ViewSchema.
We’d probably want to rename everything then. Row is a pretty generic name. So is ColumnType.


Proposal

  1. We will ensure the class names in the Microsoft.Data.DataView package are unique by adding a prefix to any name deemed too general.
    • Currently we have 2 options for the prefix:
      • DataView, ex. DataViewSchema, DataViewRow, DataViewType.
      • DV, ex. DVSchema, DVRow, DVType.
        • Note: this is following the Db prefix pattern in System.Data: DbDataReader, DbConnection, DbColumn, etc.
        • It could be argued to lower-case the v: DvSchema to follow the same pattern above.
  2. Add the base type suffix to all types that derive from the current ColumnType class, using whatever name we decide for the base ColumnType class. For example, assuming we rename ColumnType => DataViewType, we would have:
    • NumberDataViewType
    • TextDataViewType
    • BooleanDataViewType
    • etc
  3. Rename the properties on NumberType to match the type name in System. namespace.
    • ex. use Int16 instead of I2. use Single instead of R4.
    • For the UG property and corresponding struct RowId, use whatever we rename the Row type, followed by Id. Example, DataViewRowId.
  4. Rename the Schema.Metadata class to Schema.Annotations, and rename the property on Schema.Column from Metadata to Annotations.
  5. Move the Builder classes to be nested under the types they build, following the same pattern as System.Collections.Immutable.
@eerhardt eerhardt added the API Issues pertaining the friendly API label Jan 29, 2019
@eerhardt
Copy link
Member Author

eerhardt commented Jan 29, 2019

We may also want to rename the static properties on NumberType:

NumberType.I2
NumberType.U4
NumberType.R4

should all match the .NET type names:

NumberType.Int16
NumberType.UInt32
NumberType.Single

Especially NumberType.UG should be renamed to NumberType.RowId or similar.

@eerhardt
Copy link
Member Author

Also related is: #1843

@TomFinley
Copy link
Contributor

We should make sure the type names in the IDataView subsystem are the names we want to use forever.

See

  1. #2220 (comment)

If all of these types are column types, should they include column in the name? e.g. TextColumnType?

That's one possibility. Before we go that route though, I'd like to explore whether the name "column type" is meaningful -- it was back when the only things did was describe the types of columns in data views, but that hasn't been the case for a while.

  1. #2254 (comment)

The idea was broached I know, but resolution never reached: should we rename? Or else perhaps make Schema a nested class of something appropriate, so as to give it definition?

We could rename the class to DataViewSchema, or ViewSchema.
We’d probably want to rename everything then. Row is a pretty generic name. So is ColumnType.

Let's explore that. If we embraced this idea of DataViewSchema, then possibly DataViewType and DataViewRow may also be appropriate. The idea though of something named TextDataViewType though wearies me; that is an obnoxious name.

It is also potentially confusing, since we are using "data view" in a way that is distinct from the actual type: that is, pertaining to data-view-interacting objects, as opposed to data-views themselves. Since, these structures are used in contexts other than data views directly (though ultimately, one way or another, they are all feedable back to or from dataviews.)

Also if I look at other libraries it is not clear to me what we should do. Spark clearly gets by just fine with names that would certainly be considered by the .NET team's standards way too generic. That by itself is not really an argument, but the reality is we haven't really heard a complaint yet about ISchema, IRow or its successor Schema or Row. Just theoretical concerns that it might be a problem.

ADO.NET, if I look for similar structures, solved the problem by prefixing everything with Db. This is a violation of .NET naming guidelines, but if the .NET naming guidelines lead us to instead prefix everything with DataView, I might prefer actually something more like Dv. (This is somewhat reminiscent of old types, like DvInt, DvText, and so on.)

So my opinions are not very strong one way or the other. It might be one of these things where it becomes more clear that something is acceptable or unacceptable once a solid proposal is made. I just don't know yet for myself.

But of course as we say, on the subject of types names we've reached the point where it is, speak now or forever hold your peace.

@danmoseley
Copy link
Member

For reference, a dump of the public surface of the DataView nupkg is here
https://gist.github.com/danmosemsft/98b0dd3d33a0283a6fe71e46ff22c929

@eerhardt
Copy link
Member Author

eerhardt commented Feb 5, 2019

I have updated the original comment with a proposal. Please take a look and provide any feedback (especially please weigh-in on the options for the prefix for (1)).

/cc @terrajobst @NiklasGustafsson @KrzysztofCwalina @ericstj @stephentoub

@terrajobst
Copy link

terrajobst commented Feb 7, 2019

Should we meet to discuss this? Back and forth on GitHub might not be the best way to close on this. I'm happy to set aside some time. I think a one-hour meeting should get us close.

@eerhardt
Copy link
Member Author

eerhardt commented Feb 7, 2019

Here is a dump of the proposed public surface after the changes above.

https://gist.github.com/eerhardt/eff7e207673e1fde40a5696eb07ba273

I'll book some time to close on this, @terrajobst.

@eerhardt
Copy link
Member Author

We had a meeting and approved the proposal with the following changes:

  • On the Builder classes, rename GetAnnotations and GetSchema to ToAnnotations and ToSchema to follow the immutable collections pattern (and StringBuilder).
  • Remove anything from DataViewRowId that makes it seem like number.
    • Remove comparison operators that aren’t just equality.
    • Move it from NumberDataViewType into its own class: DataViewRowIdDataViewType ? RowIdDataViewType? I guess we didn’t discuss the name of this type.

eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 23, 2019
…o follow the immutable collections pattern (and StringBuilder).

Working towards dotnet#2297
eerhardt added a commit that referenced this issue Feb 24, 2019
* Move MetadataBuilder to be DataViewSchema.Metadata.Builder.

* Move SchemaBuilder to DataViewSchema.Builder.

* Rename `GetMetadata` and `GetSchema` to `ToMetadata` and `ToSchema` to follow the immutable collections pattern (and StringBuilder).

Working towards #2297
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 25, 2019
- Remove it from the NumberDataViewType.
- Remove any method/operator that makes it feel like a number.

Working towards dotnet#2297
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 25, 2019
- Remove it from the NumberDataViewType.
- Remove any method/operator that makes it feel like a number.

Working towards dotnet#2297
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 25, 2019
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 25, 2019
eerhardt added a commit that referenced this issue Feb 25, 2019
* Make DataViewRowId not act like a number.

- Remove it from the NumberDataViewType.
- Remove any method/operator that makes it feel like a number.

Working towards #2297
@shauheen shauheen added this to the 0219 milestone Feb 25, 2019
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Feb 26, 2019
eerhardt added a commit that referenced this issue Feb 26, 2019
* Rename DataView Metadata to Annotations.

Fix #1843
Fix #2297
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants