Don't use the default column names in the new API #1360

Closed
sfilipi opened this issue Oct 24, 2018 · 2 comments
Labels
API (Issues pertaining the friendly API), enhancement (New feature or request)

Comments

@sfilipi
Member

sfilipi commented Oct 24, 2018

In my opinion, the default column names are more a source of trouble than a benefit.
Providing defaults for the numeric values is one thing: we know the algorithms and what ranges might work best for most datasets, and we also want to give a guideline on their range.

Columns, however, are unlikely to be named what ML.NET calls them across datasets, and it is easy to omit them from a call when they are set to defaults.

Consider this pipeline:

    var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
        .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(label: "Sentiment", features: "Features", l2Const: 0.001f));

    // Step 3: Run Cross-Validation on this pipeline, and dataFile.
    var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline);

Unless the label is specified on CrossValidate, as in

    var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline, labelColumn: "Sentiment");

the call fails with the message: 'Label column 'Label' not found'.
It then takes some looking around to eventually figure out the mismatch between your data and the defaults on CV.
Why push that to the users, when we can just guide them towards providing the right names where the APIs need them?
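
To make the failure mode concrete, here is a minimal, self-contained sketch that does not use ML.NET at all. The class `ColumnDefaultsSketch` and both methods are hypothetical stand-ins invented for illustration; they only mimic the shape of an API with a defaulted label column versus one where the name is required.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an API with column-name parameters; this is not ML.NET code.
public static class ColumnDefaultsSketch
{
    // Defaulted style: labelColumn silently falls back to "Label" when omitted.
    public static string ResolveLabelWithDefault(ISet<string> schema, string labelColumn = "Label")
    {
        if (!schema.Contains(labelColumn))
            throw new ArgumentException($"Label column '{labelColumn}' not found");
        return labelColumn;
    }

    // Proposed style: no default, so the caller has to name the label column.
    public static string ResolveLabelRequired(ISet<string> schema, string labelColumn)
    {
        if (!schema.Contains(labelColumn))
            throw new ArgumentException($"Label column '{labelColumn}' not found");
        return labelColumn;
    }

    public static void Main()
    {
        // Column names taken from the pipeline above.
        var schema = new HashSet<string> { "SentimentText", "Sentiment", "Features" };

        try
        {
            // Compiles fine, but fails at run time because the data has no "Label" column.
            ResolveLabelWithDefault(schema);
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message);
        }

        // With a required parameter, omitting the name would be a compile-time error instead.
        Console.WriteLine(ResolveLabelRequired(schema, "Sentiment"));
    }
}
```

With the defaulted version the mismatch only shows up at run time; with the required version, leaving the name out does not compile, which is the guidance argued for above.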

cc @Zruty0 @shauheen @GalOshri @TomFinley for opinions

@sfilipi sfilipi added enhancement New feature or request API Issues pertaining the friendly API labels Oct 24, 2018
@Zruty0
Contributor

Zruty0 commented Oct 24, 2018

I think that, regardless of what we do, we should be consistent between components.
Either all trainers have defaults for all columns (like label, features, weight), or none do.

Frankly, I think having default names is just fine: I believe that users often don't have 'inherent' names for their columns; they are forced to give them names just because that's how our data views work. In this case, there is no incentive NOT to use the default names, or to force them to be specified multiple times.

However, I welcome other points of view.

@Ivanidzo4ka
Contributor

Shall we close it?

@codemzs codemzs closed this as completed Jun 30, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022

4 participants