Closed
Description
Issue
I'm trying to use other MulticlassClassification trainers but have never succeeded. The only one that works is StochasticDualCoordinateAscent. If I change to LogisticRegression or NaiveBayes, I always get the error "System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Key, got R4".
MultiData.cs
public class MultiData
{
    [LoadColumn(0)]
    public string DataValue { get; set; }

    [LoadColumn(1)]
    public float Label { get; set; }
}
MultiDataPrediction.cs
public class MultiDataPrediction
{
    public float[] Score { get; set; }
}
BuildTrainEvaluateAndSaveModel() function
// STEP 1: Common data loading configuration
IDataView trainingDataView = mlContext.Data.ReadFromTextFile<MultiData>(TrainMultiDataPath1, hasHeader: false);
IDataView testDataView = mlContext.Data.ReadFromTextFile<MultiData>(TestMultiDataPath, hasHeader: false);
// STEP 2: Common data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: DefaultColumnNames.Features, inputColumnName: nameof(MultiData.DataValue))
    .Append(mlContext.Transforms.Text.NormalizeText("NormalizedData", nameof(MultiData.DataValue)))
    .Append(mlContext.Transforms.Text.TokenizeCharacters("DataChars", "NormalizedData"))
    .Append(new NgramExtractingEstimator(mlContext, "BagOfTrichar", "DataChars",
        ngramLength: 3, weighting: NgramExtractingEstimator.WeightingCriteria.TfIdf));
// (OPTIONAL) Peek data (such as 2 records) in training DataView after applying the ProcessPipeline's transformations into "Features"
//ConsoleHelper.PeekDataViewInConsole<MultiData>(mlContext, trainingDataView, dataProcessPipeline, 2);
//ConsoleHelper.PeekVectorColumnDataInConsole(mlContext, DefaultColumnNames.Features, trainingDataView, dataProcessPipeline, 1);
// STEP 3: Set the training algorithm, then create and config the modelBuilder
var trainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(labelColumn: nameof(MultiData.Label), featureColumn: DefaultColumnNames.Features);
var trainingPipeline = dataProcessPipeline.Append(trainer);
// STEP 4: Train the model fitting to the DataSet
Console.WriteLine("=============== Training the model ===============");
ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);
Remark:
Changing the type of MultiData.Label to UInt32 does not work either.
It fails with the error "System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Key, got U4".
Activity
Ivanidzo4ka commented on Feb 20, 2019
Related to #2628
darren-zdc commented on Feb 20, 2019
Thanks for your reply!
I solved it by adding
.Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: DefaultColumnNames.Label, inputColumnName: nameof(MultiData.Label)));
Maybe this line should be added to all the multiclass classification samples, since they all use SDCA, and SDCA does the key mapping automatically. That would be excellent for new learners.
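For anyone landing here with the same error, the fix above can be sketched in context as follows. This is a fragment, not a complete program. It assumes the same ML.NET preview version as the original post and reuses `mlContext`, `MultiData`, and `trainingDataView` from the `BuildTrainEvaluateAndSaveModel()` code there. The unused normalization and n-gram steps are left out for brevity, and placing `MapValueToKey` at the end of the data pipeline is an assumption about where the line was added.

```csharp
// Fragment: continues from STEP 1 of BuildTrainEvaluateAndSaveModel() in the original post.
// Assumes mlContext, MultiData and trainingDataView are defined as shown there.

// STEP 2: Build the features, then convert the raw float label into a Key-typed column.
// DefaultColumnNames.Label and nameof(MultiData.Label) are both "Label",
// so the Key-typed column replaces the original float column under the same name.
var dataProcessPipeline = mlContext.Transforms.Text.FeaturizeText(
        outputColumnName: DefaultColumnNames.Features,
        inputColumnName: nameof(MultiData.DataValue))
    .Append(mlContext.Transforms.Conversion.MapValueToKey(
        outputColumnName: DefaultColumnNames.Label,
        inputColumnName: nameof(MultiData.Label)));

// STEP 3: The trainer call is unchanged; it now finds a Key-typed label column.
var trainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(
    labelColumn: nameof(MultiData.Label),
    featureColumn: DefaultColumnNames.Features);
var trainingPipeline = dataProcessPipeline.Append(trainer);

// STEP 4: Train the model.
ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);
```

The same extra step should apply when switching the trainer to LogisticRegression, since the error reported for both trainers is the same label schema mismatch.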