StochasticDualCoordinateAscent does not work for multiclass after migrating to 0.10.0 #2486
Comments
Hint: my DataView contains 250K rows.
@DevLob-zz Thanks for reporting this! I'll take a look at it. Marking as
Is there any information I can provide? Thank you for your interest. Best regards
Hi @DevLob-zz, I've run a multiclass classification sample, GitHubIssueClassification, in 0.9 and 0.10, and the training times are very similar, so I don't think the problem is related to the SDCA trainer or its API. Plus, we haven't done any work on SDCA between the releases. One thing that did change between 0.9 and 0.10 is the order of arguments to transforms. Do you do any featurization in
mlContext has a Log event.
If we could look at those messages, it would at least help diagnose which stage it gets stuck in. We had change #2152, which could be the reason it worked in 0.9 but stopped in 0.10, but as @rogancarr mentioned, it works for us on some datasets, so your case may be quite tricky. It's hard to fix the problem without being able to reproduce it.
Do you have any success if you set NumThreads in SDCA to 1? Best regards
I replaced the trainer line and still get the same issue. This is the Program.cs code I used:
```csharp
using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Data;
using Microsoft.ML.Samples.Dynamic;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples
{
    internal static class Program
    {
        static void Main(string[] args)
        {
            List<NormalTagsModelFeatures> noramalTagsTrainingData = new List<NormalTagsModelFeatures>();
            noramalTagsTrainingData.Add(
                new NormalTagsModelFeatures()
                {
                    //Label = TagListDictionary.GetTagId(label),
                    Label = "Tag Test",
                    pageNo = -1,
                    fontSize = -1,
                    isBold = -1,
                    isItalic = -1,
                    isUnderLine = -1,
                    containsDot = -1,
                    containsQuestionMark = -1,
                    isAllCaps = -1,
                    fontColor = "Ss",
                    FontName = "Ss",
                    tagText = "dd dd",
                    firstWord = "dd",
                    trdBottom = -1,
                    trdLeft = -1,
                    trdRight = -1,
                    trdTop = -1,
                    verticalText = -1,
                }
            );
            List<string> textFeatures = new List<string>() { "firstWord", "tagText", "FontName", "fontColor" };
            string[] numericFeatures = new string[] { "pageNo", "fontSize", "isBold", "isItalic", "isUnderLine",
                "containsDot", "containsQuestionMark", "isAllCaps", "trdBottom", "trdLeft", "trdRight", "trdTop", "verticalText" };
            var mlContext = new MLContext(seed: 1);
            IDataView trainingDataView = mlContext.Data.ReadFromEnumerable(noramalTagsTrainingData);
            // Note: FeaturizeText and Concatenate both output the "Features" column. Because the text
            // featurizer runs after Concatenate in dataProcessPipeline, its output replaces the
            // concatenated numeric features.
            var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures, new Transforms.Text.TextFeaturizingEstimator.Options());
            var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);
            var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);
            var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
            {
                FeatureColumn = DefaultColumnNames.Features,
                LabelColumn = DefaultColumnNames.Label,
                NumThreads = 1
            });
            // AppendCacheCheckpoint returns a new estimator chain; its result is discarded here,
            // so this line does not change trainner.
            trainner.AppendCacheCheckpoint(mlContext);
            var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
                .Append(dataProcessPipeline)
                .Append(trainner)
                .Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));
            ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);
            //TakeRows.Example();
        }
    }

    public class NormalTagsModelFeatures
    {
        //[Column(ordinal: "0", name: "Label")] public string Label;
        [LoadColumn(0)]
        public string Label;
        [LoadColumn(1)]
        public float fontSize;
        [LoadColumn(2)]
        public float isBold;
        [LoadColumn(3)]
        public float isItalic;
        [LoadColumn(4)]
        public float isUnderLine;
        [LoadColumn(5)]
        public float containsDot;
        [LoadColumn(6)]
        public float containsQuestionMark;
        [LoadColumn(7)]
        public string fontColor;
        [LoadColumn(8)]
        public float isAllCaps;
        [LoadColumn(9)]
        public string tagText;
        [LoadColumn(10)]
        public string firstWord;
        [LoadColumn(11)]
        public string FontName;
        [LoadColumn(12)]
        public float verticalText;
        [LoadColumn(13)]
        public float trdLeft;
        [LoadColumn(14)]
        public float trdRight;
        [LoadColumn(15)]
        public float trdTop;
        [LoadColumn(16)]
        public float trdBottom;
        [LoadColumn(17)]
        public float pageNo;
    }

    public class NormalTagsPrediction
    {
        [Column(ordinal: "0", name: "PredictedLabel")]
        public string Label;
        [ColumnName("Score")]
        public float[] Score { get; set; }
    }
}
```
`var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options`
So this solved my issue, but which MaxIterations value would you recommend based on your
The number of iterations is data-dependent. You can try 10, 20, 40, 80, 160, ..., 640 to find the value that gives the best test accuracy. In addition, a small regularization coefficient may lead to overfitting, so you may need to terminate training very early (e.g., just 1, 2, 4, 8, or 16 iterations may be enough).
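A minimal sketch of the doubling search suggested above, in plain C# with no ML.NET dependency. `IterationSchedule`, `PickBestIterations`, and the `evaluate` callback are hypothetical helpers for illustration only; the callback stands in for "train with this many iterations and return accuracy on held-out data", and the scoring function in the example is made up.

```csharp
using System;
using System.Collections.Generic;

// Doubling schedule of candidate iteration counts, e.g. 10, 20, 40, ..., 640.
static List<int> IterationSchedule(int start, int max)
{
    if (start < 1)
        throw new ArgumentOutOfRangeException(nameof(start), "start must be at least 1");
    var schedule = new List<int>();
    for (int n = start; n <= max; n *= 2)
        schedule.Add(n);
    return schedule;
}

// Returns the candidate with the highest score. `evaluate` is a hypothetical
// callback: in practice it would train SDCA with that iteration count and
// return accuracy on held-out data. Ties keep the first candidate, which is
// the smallest count when the schedule is ascending.
static int PickBestIterations(IEnumerable<int> candidates, Func<int, double> evaluate)
{
    int best = -1;
    double bestScore = double.NegativeInfinity;
    foreach (int n in candidates)
    {
        double score = evaluate(n);
        if (score > bestScore)
        {
            bestScore = score;
            best = n;
        }
    }
    return best;
}

// Example with a made-up scoring function that peaks at 80 iterations.
Console.WriteLine(string.Join(", ", IterationSchedule(10, 640)));             // 10, 20, 40, 80, 160, 320, 640
Console.WriteLine(PickBestIterations(IterationSchedule(10, 640), n => -Math.Abs(n - 80))); // 80
```

Keeping the first of tied candidates favors fewer iterations, in line with the advice to stop early when regularization is small; for that case, `IterationSchedule(1, 16)` yields 1, 2, 4, 8, 16. In a real run, the callback would presumably set `MaxIterations` on `SdcaMultiClassTrainer.Options` (the option mentioned in this thread) before fitting.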
Even so, that doesn't justify our auto-selection code putting 1,500,000 iterations over the dataset.
@rogancarr @Ivanidzo4ka @TomFinley: the above deadlock looks to be the same as #1095.
Hello @DevLob-zz, thank you for sharing all this information.
The questions above would be very helpful in identifying what's wrong. I am also trying to reproduce this issue locally to fix it; if you are willing to share the dataset, I can take a closer look, but of course no problem if not.
Here is a Program.cs that I used to generate more than 400K rows and then try to train the model. Hint: it works fine with a small dataset.
Closing this issue, assuming it's fixed, given the lack of activity and no new reports on the matter. Thanks for the discussion and feedback.
System information
Issue
`trainingPipeline.Fit(trainingDataView);` for multiclass takes a very long time.
It should work fine.
Source code / logs
```csharp
var mlContext = new MLContext(seed: 1);

#region "STEP 1: Common data loading configuration"
#endregion

#region "STEP 2: Common data process configuration with pipeline data transformations"
// STEP 2: Common data process configuration with pipeline data transformations
var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures);
var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);
var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);
#endregion

#region "STEP 3: Set the training algorithm, then create and configure the modelBuilder"
ITransformer trainedModel = null;
//"StochasticDualCoordinateAscent"
var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
    .Append(dataProcessPipeline)
    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(labelColumn: DefaultColumnNames.Label, featureColumn: DefaultColumnNames.Features))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));
#endregion

#region STEP 4: Train the model fitting to the DataSet
// This is the call that takes a very long time with SDCA
trainedModel = trainingPipeline.Fit(trainingDataView);
#endregion
```
You can see some screenshots of the values. When I change from "StochasticDualCoordinateAscent" to "Naive Bayes", it works fine.
What is wrong with my code?



Also, these are my data structure classes:
```csharp
using System;
using Microsoft.ML.Data;

[Serializable]
public class NormalTagsModelFeatures
{
    //[Column(ordinal: "0", name: "Label")] public string Label;
    [LoadColumn(0)]
    public string Label;
    [LoadColumn(1)]
    public float fontSize;
    [LoadColumn(2)]
    public float isBold;
    [LoadColumn(3)]
    public float isItalic;
    [LoadColumn(4)]
    public float isUnderLine;
    [LoadColumn(5)]
    public float containsDot;
    [LoadColumn(6)]
    public float containsQuestionMark;
    [LoadColumn(7)]
    public string fontColor;
    [LoadColumn(8)]
    public float isAllCaps;
    [LoadColumn(9)]
    public string tagText;
    [LoadColumn(10)]
    public string firstWord;
    [LoadColumn(11)]
    public string FontName;
    [LoadColumn(12)]
    public float verticalText;
    [LoadColumn(13)]
    public float trdLeft;
    [LoadColumn(14)]
    public float trdRight;
    [LoadColumn(15)]
    public float trdTop;
    [LoadColumn(16)]
    public float trdBottom;
    [LoadColumn(17)]
    public float pageNo;
}
```