
Failed in using MultiClassClassification trainers other than StochasticDualCoordinateAscent with error "System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Key<U4>, got R4" #2656


Closed
darren-zdc opened this issue Feb 20, 2019 · 2 comments


darren-zdc commented Feb 20, 2019

Issue

I'm trying to use other MulticlassClassification trainers, but none of them work. The only one that succeeds is StochasticDualCoordinateAscent. If I switch to LogisticRegression or NaiveBayes, training always fails with the error "System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Key<U4>, got R4'".

MultiData.cs

public class MultiData
{
    [LoadColumn(0)]
    public string DataValue { get; set; }

    [LoadColumn(1)]
    public float Label { get; set; }
}

MultiDataPrediction.cs

public class MultiDataPrediction
{
    public float[] Score { get; set; }
}

BuildTrainEvaluateAndSaveModel() function

            // STEP 1: Common data loading configuration
            IDataView trainingDataView = mlContext.Data.ReadFromTextFile<MultiData>(TrainMultiDataPath1, hasHeader: false);
            IDataView testDataView = mlContext.Data.ReadFromTextFile<MultiData>(TestMultiDataPath, hasHeader: false);

            // STEP 2: Common data process configuration with pipeline data transformations          
            var dataProcessPipeline = mlContext.Transforms.Text.FeaturizeText(outputColumnName: DefaultColumnNames.Features, inputColumnName: nameof(MultiData.DataValue))
                .Append(mlContext.Transforms.Text.NormalizeText("NormalizedData", nameof(MultiData.DataValue)))
                .Append(mlContext.Transforms.Text.TokenizeCharacters("DataChars", "NormalizedData"))
                .Append(new NgramExtractingEstimator(mlContext, "BagOfTrichar", "DataChars",
                            ngramLength: 3, weighting: NgramExtractingEstimator.WeightingCriteria.TfIdf));

            // (OPTIONAL) Peek data (such as 2 records) in training DataView after applying the ProcessPipeline's transformations into "Features" 
            //ConsoleHelper.PeekDataViewInConsole<MultiData>(mlContext, trainingDataView, dataProcessPipeline, 2);
            //ConsoleHelper.PeekVectorColumnDataInConsole(mlContext, DefaultColumnNames.Features, trainingDataView, dataProcessPipeline, 1);

            // STEP 3: Set the training algorithm, then create and config the modelBuilder          
            var trainer = mlContext.MulticlassClassification.Trainers.NaiveBayes(labelColumn: nameof(MultiData.Label), featureColumn: DefaultColumnNames.Features);
            var trainingPipeline = dataProcessPipeline.Append(trainer);

            // STEP 4: Train the model fitting to the DataSet
            Console.WriteLine("=============== Training the model ===============");
            ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);

Remark:
Changing the type of MultiData.Label to UInt32 does not work either.
It fails with the error "System.ArgumentOutOfRangeException: 'Schema mismatch for label column '': expected Key, got U4"

@Ivanidzo4ka
Contributor

Related to #2628

@darren-zdc
Author

Thanks for your reply!
I solved it by adding
.Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: DefaultColumnNames.Label, inputColumnName: nameof(MultiData.Label)));

It might be worth adding this line to all the multiclass classification samples. They all use SDCA, and SDCA actually does the key mapping automatically, so the problem never shows up there. That would be a great help for newcomers.
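
For readers wondering why the extra step matters: as far as I understand it, a Key column holds small integer class indices together with the number of distinct classes, while a raw R4 (float) or U4 (uint) column carries no such count, which is presumably why the trainers reject it. The sketch below is plain C# with no ML.NET dependency and is not how MapValueToKey is actually implemented. It only illustrates the idea, assuming keys are 1-based and assigned in order of first appearance. The names KeyMappingSketch and BuildKeyMap are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class KeyMappingSketch
{
    // Hypothetical helper: assigns each distinct raw label a 1-based key,
    // in order of first appearance. This mimics the idea of a value-to-key
    // mapping, not ML.NET's actual implementation.
    public static Dictionary<float, uint> BuildKeyMap(IEnumerable<float> labels)
    {
        var map = new Dictionary<float, uint>();
        foreach (float label in labels)
        {
            if (!map.ContainsKey(label))
            {
                // The next key is one more than the number of labels seen so far.
                map[label] = (uint)map.Count + 1;
            }
        }
        return map;
    }

    public static void Main()
    {
        // Raw float labels, like the Label column in MultiData.
        float[] rawLabels = { 3f, 7f, 3f, 10f, 7f };

        Dictionary<float, uint> keyMap = BuildKeyMap(rawLabels);

        foreach (float label in rawLabels)
        {
            Console.WriteLine($"{label} -> key {keyMap[label]}");
        }

        // The size of the map is the class count a trainer would need.
        Console.WriteLine($"class count: {keyMap.Count}");
    }
}
```

With these sample labels, 3 maps to key 1, 7 to key 2, and 10 to key 3, giving three classes. The point is that after the mapping the label column carries both a compact class index and the total class count, which the raw float values alone do not provide.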

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022