Closed
Labels: API, P1, classification, lightgbm, usability
Description
@justinormont points out (https://github.com/dotnet/machinelearning-automl/issues/255):
Key type is needed for binary classification learners:
- Dataset w/ text labels (as seen here)
- Datasets w/ missing labels -- BL no longer supports NA (changed in dotnet/machinelearning#673)
When the "Label" column is text, calling
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label");
var trainer = mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = pipeline.Append(trainer);
var crossValidationResults = mlContext.BinaryClassification.CrossValidateNonCalibrated(trainingDataView, trainingPipeline, numFolds: 5, labelColumn: "Label");
results in the exception
System.ArgumentOutOfRangeException
HResult=0x80131502
Message=Schema mismatch for label column '': expected Bool, got Key<U4>
Source=Microsoft.ML.Data
StackTrace:
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.CheckLabelCompatible(Column labelCol)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.CheckInputSchema(SchemaShape inputSchema)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.GetOutputSchema(SchemaShape inputSchema)
at Microsoft.ML.Data.EstimatorChain`1.GetOutputSchema(SchemaShape inputSchema)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.TrainCatalogBase.<>c__DisplayClass7_0.<CrossValidateTrain>b__0(Int32 fold)
at Microsoft.ML.TrainCatalogBase.CrossValidateTrain(IDataView data, IEstimator`1 estimator, Int32 numFolds, String samplingKeyColumn, Nullable`1 seed)
at Microsoft.ML.BinaryClassificationCatalog.CrossValidateNonCalibrated(IDataView data, IEstimator`1 estimator, Int32 numFolds, String labelColumn, String samplingKeyColumn, Nullable`1 seed)
at DogFruitNLP_14KB_735_rows_BinaryClassification.Program.BuildTrainEvaluateAndSaveModel(MLContext mlContext) in C:\AutoMLDotNet\bin\AnyCPU.Debug\mlnet\netcoreapp2.1\DogFruitNLP_14KB_735_rows_BinaryClassification\Program.cs:line 74
Would you have any recommendation for handling these kinds of scenarios?
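One possible direction, offered as a sketch rather than a confirmed fix: the exception says the binary trainer expects a Bool label, while MapValueToKey produces Key<U4>. Mapping the text label straight to a Boolean column might sidestep the mismatch. The example below assumes the ML.NET 1.x `MapValue` conversion (the API surface in the version used above may differ), a hypothetical `Row` type, and the class names "dog" and "fruit", which are guessed from the project name in the stack trace.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TextLabelToBoolSketch
{
    // Hypothetical row type; the real dataset schema is not shown in this issue.
    public class Row
    {
        public string Label { get; set; }

        [VectorType(2)]
        public float[] Features { get; set; }
    }

    // Builds an estimator that replaces the text "Label" column with a
    // Boolean column of the same name.
    public static IEstimator<ITransformer> BuildLabelMapping(MLContext mlContext)
    {
        // Assumed class names. Which value counts as the positive class is a
        // modelling decision, not something the issue specifies.
        var labelMap = new Dictionary<string, bool>
        {
            ["dog"] = true,
            ["fruit"] = false,
        };

        return mlContext.Transforms.Conversion.MapValue("Label", labelMap, "Label");
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Tiny in-memory stand-in for trainingDataView.
        var rows = new[]
        {
            new Row { Label = "dog", Features = new[] { 1f, 0f } },
            new Row { Label = "fruit", Features = new[] { 0f, 1f } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Fit and apply only the label mapping, then inspect the column type.
        IDataView mapped = BuildLabelMapping(mlContext).Fit(data).Transform(data);
        Console.WriteLine(mapped.Schema["Label"].Type);

        // The LightGbm trainer from the snippet above could then be appended
        // to this mapping in place of the MapValueToKey step.
    }
}
```

This does not address the missing-label case: a Boolean column has no NA value, so rows with absent or unmapped labels would presumably need to be filtered out or handled separately before training.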