
Train binary classification with text label #2826

@daholste


@justinormont points out (https://github.com/dotnet/machinelearning-automl/issues/255):

Key type is needed for binary classification learners:

  • Dataset w/ text labels (as seen here)
  • Datasets w/ missing labels -- BL no longer supports NA (changed in dotnet/machinelearning#673)

When the "Label" column is text, calling

var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label");
var trainer = mlContext.BinaryClassification.Trainers.LightGbm(labelColumnName: "Label", featureColumnName: "Features");
var trainingPipeline = pipeline.Append(trainer);
var crossValidationResults = mlContext.BinaryClassification.CrossValidateNonCalibrated(trainingDataView, trainingPipeline, numFolds: 5, labelColumn: "Label");

results in the following exception:

System.ArgumentOutOfRangeException
  HResult=0x80131502
  Message=Schema mismatch for label column '': expected Bool, got Key<U4>
  Source=Microsoft.ML.Data
  StackTrace:
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.CheckLabelCompatible(Column labelCol)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.CheckInputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain`1.GetOutputSchema(SchemaShape inputSchema)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.TrainCatalogBase.<>c__DisplayClass7_0.<CrossValidateTrain>b__0(Int32 fold)
   at Microsoft.ML.TrainCatalogBase.CrossValidateTrain(IDataView data, IEstimator`1 estimator, Int32 numFolds, String samplingKeyColumn, Nullable`1 seed)
   at Microsoft.ML.BinaryClassificationCatalog.CrossValidateNonCalibrated(IDataView data, IEstimator`1 estimator, Int32 numFolds, String labelColumn, String samplingKeyColumn, Nullable`1 seed)
   at DogFruitNLP_14KB_735_rows_BinaryClassification.Program.BuildTrainEvaluateAndSaveModel(MLContext mlContext) in C:\AutoMLDotNet\bin\AnyCPU.Debug\mlnet\netcoreapp2.1\DogFruitNLP_14KB_735_rows_BinaryClassification\Program.cs:line 74
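
The mismatch seems to arise because MapValueToKey produces a Key<U4> label, while the trainer's schema check expects Bool. One workaround we could consider is converting the text labels to booleans in plain C# before building the IDataView. The following is only a minimal sketch under assumptions: the helper name ToBooleanLabels and the "dog"/"fruit" label values are hypothetical, and none of it is part of ML.NET or taken from the actual dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelPreprocessing
{
    // Hypothetical helper (not an ML.NET API): maps a two-class text label
    // column to booleans, with positiveLabel becoming true.
    public static bool[] ToBooleanLabels(IReadOnlyList<string> textLabels, string positiveLabel)
    {
        var distinct = textLabels.Distinct().ToList();

        // Missing (null) labels count as an extra distinct value and are rejected here.
        if (distinct.Count != 2)
            throw new ArgumentException($"Expected exactly 2 distinct labels, found {distinct.Count}.");
        if (!distinct.Contains(positiveLabel))
            throw new ArgumentException($"Positive label '{positiveLabel}' does not occur in the data.");

        return textLabels.Select(label => label == positiveLabel).ToArray();
    }

    public static void Main()
    {
        // Hypothetical label values, used only for illustration.
        var labels = new[] { "dog", "fruit", "fruit", "dog" };
        bool[] boolLabels = ToBooleanLabels(labels, positiveLabel: "dog");
        Console.WriteLine(string.Join(", ", boolLabels)); // prints "True, False, False, True"
    }
}
```

This keeps the label as Bool, so the MapValueToKey step would be dropped from the pipeline. It does not address rows with missing labels, which the sketch simply rejects.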

Would you have any recommendation for handling these kinds of scenarios?

Labels: API, P1, classification, lightgbm, usability
