Skip to content

Rename mlContext.Data.TextReader() to mlContext.Data.CreateTextLoader() #1690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CESARDELATORRE opened this issue Nov 21, 2018 · 11 comments
Closed
Assignees
Labels
API Issues pertaining the friendly API

Comments

@CESARDELATORRE
Copy link
Contributor

Since the object being created is a "TextLoader", I'd recommend the method's name to say so, instead of "TextReader" which looks like a different type.

I mean code like the following feels confusing:

            TextLoader textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
                                                    {
                                                        Separator = "tab",
                                                        HasHeader = true,
                                                        Column = new[]
                                                                    {
                                                                    new TextLoader.Column("Label", DataKind.Bool, 0),
                                                                    new TextLoader.Column("Text", DataKind.Text, 1)
                                                                    }
                                                    });

In addition to that, methods should have a verb as part of the name, so if what it is doing is to create a TextReader, let's say so, as:

mlContext.Data.CreateTextLoader()

We're not being consistent now when having some methods with a verb but other methods just a noun:

image

It is also against C# conventions (and most languages). A method's name should have a verb describing the action being performed by that method:

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members#names-of-methods

@Zruty0
Copy link
Contributor

Zruty0 commented Nov 21, 2018

Most of our extension methods to MLContext are factory methods that create estimators.

Notable exceptions are Save/Load in MLContext.Data, and TrainTestSplit/CrossValidate in the learning tasks.

I think that trying to prefix every factory method with Create just to differentiate the 50 factory methods from the other 4-6 non-factory methods is quite expensive.

In addition, I would like to ask: are we going to ever stop renaming things?
#1318 has been around for a month now, and @sfilipi has merged 16 PRs to make it happen. Is now a good time to review the naming? I hoped we would be done with it by now.

@CESARDELATORRE
Copy link
Contributor Author

Just saying it looks like Property objects instead of methods as it is against the C# conventions (and most languages). A method's name should have a verb describing the action being performed by that method:

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members#names-of-methods

If we don't want to rename, it is an option, but for sure it'd be better to do it now when we are still in Preview than in the future or after v1.0 is released.

@shmoradims shmoradims added the API Issues pertaining the friendly API label Nov 26, 2018
@shmoradims shmoradims self-assigned this Nov 26, 2018
@shmoradims
Copy link

@eerhardt @TomFinley @Zruty0 could you please discuss if we're doing this? If it's "won't fix", please close it.

@shmoradims shmoradims added the need info This issue needs more info before triage label Nov 26, 2018
@eerhardt
Copy link
Member

I agree that method names should be verbs (as the .NET design guidelines state).

This guideline doesn't seem to be followed consistently in the API:

verbs

  • mlContext.Transforms.Categorical.MapValueToKey("Label")
  • mlContext.Data.ReadFromTextFile(new TextLoader.Column[]
  • context.Transforms.Normalize()

nouns

  • context.Data.TextReader
  • context.Transforms.CustomMappingTransformer
  • context.Transforms.Categorical.OneHotEncoding - I guess this one is technically a "gerund".

When looking at our Transforms methods as a whole, most tend to use verb names - Concatenate, CopyColumns, SelectColumns, ReplaceMissingValues, ConvertTo, MapXxToYy, IndicateMissingValues, etc.

We even chose one to be named Transforms.Projection.CreateRandomFourierFeatures to explicitly make it a verb.

As far as Transforms go, I only see the above two using noun names.

Instead of CreateTextReader, we could change it to be ReadText, LoadText, or similar, to make it a verb.

Looking at the other 2 methods in the same class we have:

    public static class TextLoaderSaverCatalog
    {
        public static IDataView ReadFromTextFile(this DataLoadSaveOperations catalog, TextLoader.Column[] columns, string path, Action<TextLoader.Arguments> advancedSettings = null);
        public static void SaveAsText(this DataLoadSaveOperations catalog, IDataView data, Stream stream, char separator = '\t', bool headerRow = true, bool schema = true, bool keepHidden = false);
        public static TextLoader TextReader(this DataLoadSaveOperations catalog, TextLoader.Arguments args, IMultiStreamSource dataSample = null);
        public static TextLoader TextReader(this DataLoadSaveOperations catalog, TextLoader.Column[] columns, Action<TextLoader.Arguments> advancedSettings = null, IMultiStreamSource dataSample = null);
    }

2 verbs and one noun.

Now, if we include the learners (or anything deriving from TrainContextBase: BinaryClassification, Regression, Clustering) in the discussion, here we see a bigger mixture of nouns and verbs:

verb

  • context.Regression.CrossValidate
  • context.Regression.Evaluate
  • context.Regression.TrainTestSplit

But everything inside of Trainers appears to be nouns:

  • context.Regression.Trainers.FastTree
  • context.Regression.Trainers.StochasticDualCoordinateAscent
  • context.BinaryClassification.Trainers.LogisticRegression

We could consider Trainers to be "special", and not needing to conform to the "verb" guideline, as the Create is implicit and the names are already very long. Also, it is easier to name a Transform method as a verb - because you can use the name of what the transform is actually doing - Concatenate, MapXxToYy, SelectFoo, etc. But Trainers names are, by convention, nouns. This is the name someone gave the algorithm being performed. So the only option would be to prepend Create to the name to make it a verb. That may not be as valuable as using verbs everywhere else.

In summary, I agree with @CESARDELATORRE that the method context.Data.TextReader does feel completely out of place with the rest of the APIs we have. We should consider renaming it to be a verb. I believe it is a separate issue to rename any Trainers methods, if we want to.

@TomFinley
Copy link
Contributor

I'd certainly like the names to be reconciled. In a case as @eerhardt points out where everything would get a prefix like Make (e.g., trainers), I'd probably prefer to drop it as he suggests since that doesn't seem terribly helpful.

Regarding the specific name of the main thing that sparked the larger discussion, IIRC, was there not feedback from .NET team that the phrase IDataReader was confusing, and would be better as IDataSource? If that is true, then we'd rename the implementing class something like TextSource or somesuch, then we'd have method like MakeTextSource or somesuch?

@artidoro
Copy link
Contributor

I wanted to point out another naming problem regarding MLContext extensions:

Extensions to MLContext don't consistently match either the static API names, or the dynamic API names for estimators. This leads users that directly instantiate the dynamic API to find different names for the same estimators than when finding them through MLContext.

Since the estimators obtained through MLContext use the dynamic API, I would suggest that their names match the dynamic API names.

@eerhardt
Copy link
Member

In the API design review meeting today we decided to add a Create prefix to the methods that weren't already verbs - (like TextReader, and LogisticRegression) - and in general to have verbs for names of methods.

@shmoradims
Copy link

Great, we have concrete action item. Removing myself and 'need-info'

@shmoradims shmoradims removed the need info This issue needs more info before triage label Nov 30, 2018
@shmoradims shmoradims removed their assignment Nov 30, 2018
@najeeb-kazmi
Copy link
Member

MLContext.Data.TextLoader was renamed to MLContext.Data.CreateTextLoader in #1784. Closing this issue.

@CESARDELATORRE
Copy link
Contributor Author

Reopening this issue. Not critical, but still feels non consistent with the returned type.
I see that this method was renamed in 0.9, but was renamed to CreateTextReader instead of CreateTextLoader. Sample code like here:

TextLoader textLoader = mlContext.Data.CreateTextReader(new TextLoader.Arguments()

Why is it not aligned to the type which is "Loader"?

I think it should be "CreateTextLoader" (instead of CreateTextReader) since it is returning a TextLoader object type, right?

The other way around would be to rename the "TextLoader" type to "TextReader", but being consistent on naming between types and methods.

@CESARDELATORRE CESARDELATORRE reopened this Jan 9, 2019
@najeeb-kazmi najeeb-kazmi self-assigned this Jan 11, 2019
@najeeb-kazmi
Copy link
Member

@CESARDELATORRE You are right - I missed the Loader/Reader part. I'll send a PR for this.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

7 participants