Not able to re-train model [Multiclassification(AveragedPerceptron)] #5961

Open
@StefanStroescu

Description


Context

Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction was wrong, I want to add the corrected example to its database (or re-train the model).

I have used AutoML to generate a base model. The algorithm with the best results chosen by AutoML for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.

I am able to create the first model, but I am struggling to re-train it.

  • First, I created the model (reproducing all the steps generated by AutoML)
// First phase: create the model
var mlContext = new MLContext(seed: 1);

// BuildTrainingPipeline

// Load data
var data = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: TRAIN_DATA_FILEPATH,
    hasHeader: false,
    separatorChar: '\t',
    allowQuoting: true,
    allowSparse: false);

// Data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
    .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
    .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);

// Set the training algorithm
var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"),
        labelColumnName: "col0")
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

// Train and save the model

// Create the model here
ITransformer firstModel = trainingPipeline.Fit(data);

// Save the model
mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);
  • Then, assuming I have new data to re-train the model with:
// Second phase: re-train the model

// New data
ModelInput[] ticketData = new ModelInput[]
{
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 1"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 2"
    },
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 3"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 4"
    },
    new ModelInput
    {
        Col0 = "Category 1",
        Col1 = "Text to classify 5"
    },
};

// Create MLContext
MLContext mlContext = new MLContext();

// Define the DataViewSchema of the trained model
DataViewSchema modelSchema;

// Load the trained model
var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);

// Load the new data
IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);

// And here I get stuck, because I don't know how to re-train the model with the new data.

  • Issue

    I have tried to follow the guidance from these topics: here, here, or the changes suggested here, but with no result.

    I think my issues are due to multiclass classification, because the trainer is of type EstimatorChain and my model is of type TransformerChain.
    My trainer.Fit doesn't take 2 arguments.
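
For reference, the two-argument Fit mentioned above lives on the binary AveragedPerceptron trainer itself, not on an EstimatorChain. The sketch below is only an illustration under stated assumptions: the `Row` class, the tiny in-memory batches, and the `WarmStartSketch` name are hypothetical and not taken from the code above. It shows a warm start for a single binary AveragedPerceptron. As far as I can tell, the OneVersusAll trainer does not expose an equivalent overload, so this does not directly solve the multiclass case; it only shows where the overload comes from.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical row type for this sketch (not the ModelInput class from the issue).
public class Row
{
    public bool Label { get; set; }

    [VectorType(2)]
    public float[] Features { get; set; }
}

public static class WarmStartSketch
{
    public static LinearBinaryModelParameters Run()
    {
        var mlContext = new MLContext(seed: 1);

        // Made-up initial training batch.
        IDataView firstBatch = mlContext.Data.LoadFromEnumerable(new List<Row>
        {
            new Row { Label = true,  Features = new[] { 1f, 0f } },
            new Row { Label = false, Features = new[] { 0f, 1f } },
            new Row { Label = true,  Features = new[] { 0.9f, 0.1f } },
            new Row { Label = false, Features = new[] { 0.2f, 0.8f } },
        });

        // Made-up batch of examples that arrives later.
        IDataView secondBatch = mlContext.Data.LoadFromEnumerable(new List<Row>
        {
            new Row { Label = true,  Features = new[] { 0.7f, 0.3f } },
            new Row { Label = false, Features = new[] { 0.1f, 0.9f } },
        });

        // A plain binary trainer (no OneVersusAll wrapper, no Append),
        // so the variable keeps its concrete trainer type.
        var trainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "Label",
            featureColumnName: "Features",
            numberOfIterations: 10);

        // Initial training: Fit with one argument.
        var firstModel = trainer.Fit(firstBatch);

        // Warm start: Fit with two arguments, passing the linear
        // parameters learned by the first model as the starting point.
        var retrained = trainer.Fit(secondBatch, firstModel.Model);

        return retrained.Model;
    }

    public static void Main()
    {
        LinearBinaryModelParameters parameters = Run();
        Console.WriteLine($"Weights: {parameters.Weights.Count}, bias: {parameters.Bias}");
    }
}
```

Once `.Append(...)` is called on a trainer, the result is an EstimatorChain whose Fit only accepts the data, which matches the behavior described above. The features here are already numeric, so no data preparation pipeline is involved; with text input, the new data would first have to go through the same fitted transforms as the original training data.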

    Labels

    AutoML.NET: Automating various steps of the machine learning process
