Description
In my opinion, default column names cause more trouble than they are worth.
Providing defaults for numeric hyperparameters is one thing: we know the algorithms and which ranges work best for most datasets, and we also want to give users guidance on those ranges.
Column names are different. Across datasets, columns are unlikely to be named what ML.NET calls them, and when a column parameter has a default value, it is easy to leave it out of the call without noticing.
Consider this pipeline:
var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
.Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(label: "Sentiment", features: "Features", l2Const: 0.001f));
// Step 3: Run cross-validation on this pipeline and data.
var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline);
Because the label column is not specified on CrossValidate, this call fails with the message: "Label column 'Label' not found"
It takes some digging to figure out that the problem is a mismatch between the column names in your data and the defaults on CrossValidate. The call only succeeds once the label column is passed explicitly:
var cvResult = mlContext.BinaryClassification.CrossValidate(data, pipeline, labelColumn: "Sentiment");
Why push that burden onto users when we can instead guide them to provide the right column names wherever the APIs need them?
cc @Zruty0 @shauheen @GalOshri @TomFinley for opinions