
Don't use the default column names in the new API #1360

Closed
@sfilipi

Description


In my opinion, default column names cause more trouble than they are worth.
Providing defaults for the numeric parameters is one thing. We know the algorithms and which ranges are likely to work best for most datasets, and we also want to give users guidance on those ranges.

Column names are different. They vary across datasets and are unlikely to match what ML.NET calls them. Yet because the column parameters have defaults, it is easy to leave them out of a call.

Consider this pipeline:

var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
    .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(label: "Sentiment", features: "Features", l2Const: 0.001f));

// Step 3: Run cross-validation on this pipeline using the data.
var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline);

Because the label column is not specified on CrossValidate, this call fails with the message 'Label column 'Label' not found'. It only works once the label column is passed explicitly:

var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline, labelColumn: "Sentiment");

Getting there takes some digging to uncover the mismatch between the column names in your data and the defaults on CrossValidate.
Why push that onto users when we can guide them toward providing the right names wherever the APIs need them?

cc @Zruty0 @shauheen @GalOshri @TomFinley for opinions


Labels: API (issues pertaining to the friendly API), enhancement (new feature or request)