Not able to re-train model [Multiclassification(AveragedPerceptron)] #5961

Open
StefanStroescu opened this issue Oct 7, 2021 · 2 comments
Labels
AutoML.NET Automating various steps of the machine learning process

StefanStroescu commented Oct 7, 2021

Context

Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction is wrong, I want to add the corrected example to the model's training data (or re-train the model).

I have used AutoML to generate a base model. The algorithm with the best results chosen by AutoML for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.

I am able to create the first model, but I am struggling to re-train it.

  • First, I created the model (reproducing all the steps generated by AutoML):
    // First Phase: Create the model

    var mlContext = new MLContext(seed: 1);

    // BuildTrainingPipeline

    // Load Data
    var data = mlContext.Data.LoadFromTextFile<ModelInput>(
        path: TRAIN_DATA_FILEPATH,
        hasHeader: false,
        separatorChar: '\t',
        allowQuoting: true,
        allowSparse: false);

    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
        .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
        .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);

    // Set the training algorithm
    var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron(
                labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"),
            labelColumnName: "col0")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

    IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

    // Train and save Model

    // Create model here
    ITransformer firstModel = trainingPipeline.Fit(data);

    // Save the model
    mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);
  • Then, assuming I have new data to train the model with:
    // Second Phase: Re-training the model

    // New Data
    ModelInput[] ticketData = new ModelInput[]
    {
        new ModelInput { Col0 = "Category 3", Col1 = "Text to classify 1" },
        new ModelInput { Col0 = "Category 2", Col1 = "Text to classify 2" },
        new ModelInput { Col0 = "Category 3", Col1 = "Text to classify 3" },
        new ModelInput { Col0 = "Category 2", Col1 = "Text to classify 4" },
        new ModelInput { Col0 = "Category 1", Col1 = "Text to classify 5" },
    };

    // Create MLContext
    MLContext mlContext = new MLContext();

    // Define DataViewSchema for the trained model
    DataViewSchema modelSchema;

    // Load trained model
    var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);

    // Load New Data
    IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);

    // This is where I get stuck: I don't know how to re-train the model with the new data.

  • Issue

    I have tried to follow the guidance from these topics (here and here) and the changes suggested here, but with no result.

    I think my issues are due to multiclass classification, because the trainer is of type EstimatorChain and my model is of type TransformerChain.
    My trainer.Fit doesn't take 2 arguments.
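
    For reference, the two-argument Fit mentioned above is, as far as I can tell, exposed on the binary AveragedPerceptron trainer itself (Fit(IDataView, LinearModelParameters)), not on an EstimatorChain. Below is a minimal sketch of that warm-start call in isolation. The SketchRow type, the WarmStartSketch class and the tiny in-memory data are hypothetical and exist only for illustration; this is not the pipeline from the issue and does not cover the OneVersusAll wrapper.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical row type for this sketch (not the ModelInput from the issue).
public class SketchRow
{
    public bool Label { get; set; }

    [VectorType(2)]
    public float[] Features { get; set; }
}

public static class WarmStartSketch
{
    // Trains a binary AveragedPerceptron, then continues training on new data.
    // Returns the number of weights in the first and in the re-trained model.
    public static (int First, int Retrained) Run()
    {
        var mlContext = new MLContext(seed: 1);

        // Made-up "original" training data.
        IDataView originalData = mlContext.Data.LoadFromEnumerable(new[]
        {
            new SketchRow { Label = true,  Features = new[] { 1f, 0f } },
            new SketchRow { Label = false, Features = new[] { 0f, 1f } },
            new SketchRow { Label = true,  Features = new[] { 0.9f, 0.1f } },
            new SketchRow { Label = false, Features = new[] { 0.2f, 0.8f } },
        });

        // Made-up "new" data that arrives later.
        IDataView newData = mlContext.Data.LoadFromEnumerable(new[]
        {
            new SketchRow { Label = true,  Features = new[] { 0.8f, 0.3f } },
            new SketchRow { Label = false, Features = new[] { 0.1f, 0.7f } },
        });

        // The bare binary trainer, not wrapped in OneVersusAll or a chain.
        var trainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "Label",
            featureColumnName: "Features",
            numberOfIterations: 10);

        // First fit: learn from scratch.
        var firstModel = trainer.Fit(originalData);
        LinearBinaryModelParameters firstParameters = firstModel.Model;

        // Second fit: start from the previously learned parameters.
        var retrainedModel = trainer.Fit(newData, firstParameters);

        return (firstParameters.Weights.Count, retrainedModel.Model.Weights.Count);
    }

    public static void Main()
    {
        var (first, retrained) = Run();
        Console.WriteLine($"weights: first={first}, retrained={retrained}");
    }
}
```

    This only shows where the two-argument overload lives. For the OneVersusAll pipeline above I am not aware of an equivalent overload, so the options I can see are to refit the whole pipeline on the original data plus the new rows, or to try a multiclass trainer that accepts previous parameters directly (I believe LbfgsMaximumEntropy does, but treat that as an assumption to verify).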

@michaelgsharp michaelgsharp added the AutoML.NET Automating various steps of the machine learning process label Oct 7, 2021
michaelgsharp (Contributor) commented

Are you able to upload the model/sample data you used so we can take a look?

StefanStroescu commented Oct 11, 2021

MultiClassification_AveragePerceptron.zip

Sorry for the late reply. Please see RetrainModel.cs.

This is the data I used initially for training.
DummyData.txt
