Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction is wrong, I want the corrected example to be added to its database (or the model to be re-trained).
I have used AutoML to generate a base model. The algorithm with the best results chosen by AutoML for multiclass classification is AveragedPerceptron. I have checked this page to make sure that it is re-trainable.
I am able to get the first model, but I am struggling to re-train it.
First, I created the model (reproducing all the steps generated by AutoML):
// First Phase: Create the model
var mlContext = new MLContext(seed: 1);
// BuildTrainingPipeline
// Load Data
var data = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: TRAIN_DATA_FILEPATH,
    hasHeader: false,
    separatorChar: '\t',
    allowQuoting: true,
    allowSparse: false);
// Data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
    .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
    .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);
// Set the training algorithm
var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"),
        labelColumnName: "col0")
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);
// Train and save Model
// Create model here
ITransformer firstModel = trainingPipeline.Fit(data);
// Save the model
mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);
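The `ModelInput` class is not shown in the post. As a minimal sketch, assuming it follows the usual AutoML-generated shape and that the tab-separated file holds the label in column 0 and the text in column 1, it might look like this (the column indices are an assumption; the property and column names are taken from the code above):

```csharp
using Microsoft.ML.Data;

// Assumed shape of the AutoML-generated input class (not shown in the post).
public class ModelInput
{
    // Label column: the category the text belongs to.
    [ColumnName("col0"), LoadColumn(0)]
    public string Col0 { get; set; }

    // Feature column: the raw text to classify.
    [ColumnName("col1"), LoadColumn(1)]
    public string Col1 { get; set; }
}
```

The `ColumnName` attributes are what let the pipeline refer to the columns as `"col0"` and `"col1"` while the C# properties are named `Col0` and `Col1`.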
Then, suppose I have new data to train the model with:
// Second Phase: Re-training the model
// New Data
ModelInput[] ticketData = new ModelInput[]
{
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 1"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 2"
    },
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 3"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 4"
    },
    new ModelInput
    {
        Col0 = "Category 1",
        Col1 = "Text to classify 5"
    },
};
// Create MLContext
MLContext mlContext = new MLContext();
// Define DataViewSchema trained model
DataViewSchema modelSchema;
// Load trained model
var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);
//Load New Data
IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);
// And here I get stuck. Because I don't know how to retrain the model with new data.
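When a loaded model cannot be continued in place, one fallback is to refit the whole pipeline on the original rows plus the newly corrected ones. The following is only a sketch under that assumption: `Retrainer` and `RetrainFromScratch` are hypothetical names, and it assumes the original training data is still available and small enough to materialize in memory.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public static class Retrainer
{
    // Hypothetical helper (not part of ML.NET): refits an unfitted pipeline
    // on the original data merged with the corrected rows.
    public static ITransformer RetrainFromScratch<TRow>(
        MLContext mlContext,
        IEstimator<ITransformer> trainingPipeline,
        IDataView originalData,
        IEnumerable<TRow> correctedRows)
        where TRow : class, new()
    {
        // Materialize the original rows so they can be merged with the new ones.
        IEnumerable<TRow> originalRows =
            mlContext.Data.CreateEnumerable<TRow>(originalData, reuseRowObject: false);

        // Merge and wrap the combined rows as a single IDataView.
        List<TRow> merged = originalRows.Concat(correctedRows).ToList();
        IDataView mergedData = mlContext.Data.LoadFromEnumerable(merged);

        // Train a fresh model; nothing is reused from the previous one.
        return trainingPipeline.Fit(mergedData);
    }
}
```

With the names from the post, a call would look like `Retrainer.RetrainFromScratch(mlContext, trainingPipeline, data, ticketData)`. This costs a full training run each time, so it suits periodic batch updates more than per-correction updates.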
Issue
I have tried to follow the guidance from these topics: here, here, or the changes suggested here, but with no result.
I think my issues are due to multiclass classification, because the trainer is of type EstimatorChain and my model is of type TransformerChain.
My trainer.Fit does not take two arguments, so I cannot pass the existing model parameters to it.
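For comparison, some ML.NET trainers do expose a two-argument `Fit` that starts from existing model parameters. The sketch below swaps in a different algorithm, `LbfgsMaximumEntropy`, in place of the `OneVersusAll(AveragedPerceptron)` trainer from the post, so it is not a drop-in fix for the model above. It also assumes the data preparation transformer and the predictor were fitted and kept separately rather than as one chain; `WarmStart` and `ContinueTraining` are hypothetical names.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class WarmStart
{
    // Hypothetical helper (not part of ML.NET): continues training a
    // maximum-entropy multiclass predictor on new rows.
    public static MulticlassPredictionTransformer<MaximumEntropyModelParameters> ContinueTraining(
        MLContext mlContext,
        ITransformer dataPrepTransformer,
        MulticlassPredictionTransformer<MaximumEntropyModelParameters> originalPredictor,
        IDataView newData)
    {
        // Reuse the already fitted data preparation so the label keys and
        // text features line up with what the original predictor saw.
        IDataView transformedNewData = dataPrepTransformer.Transform(newData);

        // Assumes the same column names as in the post.
        var trainer = mlContext.MulticlassClassification.Trainers
            .LbfgsMaximumEntropy(labelColumnName: "col0", featureColumnName: "Features");

        // Fit overload that starts from the existing model parameters.
        return trainer.Fit(transformedNewData, originalPredictor.Model);
    }
}
```

The returned predictor still outputs key-typed labels, so a `MapKeyToValue` step like the one in the original pipeline would need to be applied afterwards to get category names back.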