StochasticDualCoordinateAscent not work For Multiclass after migrate to 0.10.0 #2486

Closed
DevLob-zz opened this issue Feb 9, 2019 · 20 comments
Labels
bug Something isn't working classification Bugs related classification tasks image Bugs related image datatype tasks ModelBuilder Bugs related model builder P1 Priority of the issue for triage purpose: Needs to be fixed soon. question Further information is requested

Comments

@DevLob-zz

DevLob-zz commented Feb 9, 2019

System information

  • OS version/distro: W10
  • .NET Version (eg., dotnet --info): 4.6.1

Issue

  • What did you do? Migrated my code from 0.9.0 to 0.10.0.
  • What happened? The StochasticDualCoordinateAscent trainer worked fine for both multiclass and binary classification before the move to 0.10. After updating, it works only for binary classification. For multiclass, the call
    trainingPipeline.Fit(trainingDataView); freezes (or at least takes a very long time).
  • What did you expect?
    It should work as it did in 0.9.0.

Source code / logs

`var mlContext = new MLContext(seed: 1);

#region "STEP 1: Common data loading configuration"

IDataView trainingDataView = GetNormalDataSet(mlContext, allFeatures, mLFeatures);

if (trainingDataView.GetRowCount() == 0)
{

    return;
}

textFeatures = GetTextFeatures(normalFeatures);

numericFeatures = GetNumericFeatures(normalFeatures).ToArray();

#endregion

#region "STEP 2: Common data process configuration with pipeline data transformations"

// STEP 2: Common data process configuration with pipeline data transformations

var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures);

var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);

var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);

#endregion

#region "STEP 3: Set the training algorithm, then create and configure the modelBuilder"

ITransformer trainedModel = null;

//"StochasticDualCoordinateAscent"

var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
.Append(dataProcessPipeline)
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(labelColumn: DefaultColumnNames.Label, featureColumn: DefaultColumnNames.Features))
.Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));

#region STEP 4: Train the model fitting to the DataSet

// Fit takes a long time and never returns for the multiclass SDCA trainer.

trainedModel = trainingPipeline.Fit(trainingDataView);

#endregion

#endregion // closes STEP 3`

The screenshots below show some of the values. When I change the trainer from
"StochasticDualCoordinateAscent" to "Naive Bayes", it works fine.

What is wrong with my code?
image
image
image

These are my data structure classes:
`[Serializable]
public class NormalTagsModelFeatures
{
//[Column(ordinal: "0", name: "Label")] public string Label;
[LoadColumn(0)]
public string Label;
[LoadColumn(1)]
public float fontSize;
[LoadColumn(2)]
public float isBold;
[LoadColumn(3)]
public float isItalic;
[LoadColumn(4)]
public float isUnderLine;
[LoadColumn(5)]
public float containsDot;
[LoadColumn(6)]
public float containsQuestionMark;
[LoadColumn(7)]
public string fontColor;
[LoadColumn(8)]
public float isAllCaps;
[LoadColumn(9)]
public string tagText;
[LoadColumn(10)]
public string firstWord;
[LoadColumn(11)]
public string FontName;
[LoadColumn(12)]
public float verticalText;
[LoadColumn(13)]
public float trdLeft;
[LoadColumn(14)]
public float trdRight;
[LoadColumn(15)]
public float trdTop;
[LoadColumn(16)]
public float trdBottom;
[LoadColumn(17)]
public float pageNo;

}

public class NormalTagsPrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedLabel;

    [ColumnName("Score")]
    public float[] Score { get; set; }

}`


@DevLob-zz
Author

DevLob-zz commented Feb 9, 2019

Note: my data view contains 250K rows.
The same pipeline also works fine with Naive Bayes and LogisticRegression.

@rogancarr rogancarr added the bug Something isn't working label Feb 9, 2019
@rogancarr
Contributor

rogancarr commented Feb 9, 2019

@DevLob-zz Thanks for reporting this! I'll take a look at this!

Marking as need info until I repro it.

@rogancarr rogancarr self-assigned this Feb 9, 2019
@rogancarr rogancarr added need info This issue needs more info before triage and removed bug Something isn't working labels Feb 9, 2019
@DevLob-zz
Author

Is there any information I can provide? Thank you for your interest.

Best regards

@rogancarr
Contributor

Hi @DevLob-zz ,

I've run a multiclass classification sample, GitHubIssueClassification, in 0.9 and 0.10, and the training times are very similar, so I don't think the problem is related to the SDCA trainer or its API. Plus, we haven't done any work on SDCA between the releases.

One thing that did change between 0.9 and 0.10 is the order of arguments to transforms. Do you do any featurization in GetNormalDataSet(mlContext, allFeatures, mLFeatures)? Is it possible that your column names have changed between the releases, and the learners are building features on different data?

@rogancarr rogancarr removed the need info This issue needs more info before triage label Feb 10, 2019
@DevLob-zz
Author

image
image

No, the code did not change between 0.9 and 0.10, and it still works fine with Naive Bayes and Logistic Regression for multiclass.
SDCA still works fine for binary classification but seems to go into an infinite loop for multiclass.

@DevLob-zz
Author

DevLob-zz commented Feb 10, 2019

image
The code stops responding inside the Fit method at this line. However, if execution goes through the other case (the same code with only the algorithm changed), it works well.
image

@DevLob-zz
Author

DevLob-zz commented Feb 10, 2019

image

any help would be appreciated

@Ivanidzo4ka
Contributor

mlContext has a Log event.

 mlContext.Log += MlContext_Log;
private static void MlContext_Log(object sender, LoggingEventArgs e)
{
    Console.WriteLine(e.Message);
}

If we can look at these messages, it may help diagnose at least which stage it gets stuck at.
Do you have any success if you set NumThreads in SDCA to 1?

We had this change, #2152, which could be the reason it worked in 0.9 but stopped working in 0.10. But as @rogancarr mentioned, it works for us on some datasets, so your case may be quite tricky. It's hard to fix a problem without being able to reproduce it.

@DevLob-zz
Author

DevLob-zz commented Feb 11, 2019

> If we can look on this messages, it can be helpful to diagnose at what state it get stuck at least.

This is where the logger stops:
mlerror

> Do you get any success if you setup NumThreads in SDCA to 1?

How can I set this value? Do you mean this?
var mlContext = new MLContext(seed: 1,conc:1);
If so, I tried it and got the same result.

> It's hard to fix problem without ability to reproduce it.

Yes, sure. I will try to debug through the ML.NET source
and keep you posted.

Best regards

@DevLob-zz
Author

DevLob-zz commented Feb 11, 2019

After integrating my code with the ML.NET source, I can see that the code gets stuck here:
image

It gets stuck here with numThreads = 2, so I think that if I can set it to 1 it will no longer get stuck.
NumofCycle = 750000

@DevLob-zz
Author

DevLob-zz commented Feb 11, 2019

I replaced the trainer line with
var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
{ FeatureColumn = DefaultColumnNames.Features ,
LabelColumn = DefaultColumnNames.Label,
NumThreads = 1
});

and I get the same issue.

This is the Program.cs code I used:

using System.Collections.Generic;
using Microsoft.Data.DataView;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Data;
using Microsoft.ML.Samples.Dynamic;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples
{
    internal static class Program
    {
        static void Main(string[] args)
        {
            List<NormalTagsModelFeatures> noramalTagsTrainingData = new List<NormalTagsModelFeatures>();
            noramalTagsTrainingData.Add(
                        new NormalTagsModelFeatures()
                        {
                            //Label = TagListDictionary.GetTagId(label),
                            Label = "Tag Test",
                            pageNo =  -1,
                            fontSize =  -1,
                            isBold =  -1,
                            isItalic =  -1,
                            isUnderLine =  -1,
                            containsDot =  -1,
                            containsQuestionMark =  -1,
                            isAllCaps =  -1,
                            fontColor =  "Ss",
                            FontName =  "Ss",
                            tagText =  "dd  dd",
                            firstWord = "dd",
                            trdBottom =  -1,
                            trdLeft =  -1,
                            trdRight = -1,
                            trdTop = -1,
                            verticalText = -1,
                        }
                    );
            List<string> textFeatures = new List<string>() { "firstWord", "tagText", "FontName", "fontColor" };
            string[] numericFeatures = new string[] { "pageNo", "fontSize", "isBold", "isItalic", "isUnderLine",
                "containsDot", "containsQuestionMark","isAllCaps","trdBottom","trdLeft","trdRight","trdTop","verticalText" } ;
            var mlContext = new MLContext(seed: 1);
            IDataView trainingDataView =mlContext.Data.ReadFromEnumerable(noramalTagsTrainingData);
            var textFeaturesProcessPipeline = mlContext.Transforms.Text.FeaturizeText(DefaultColumnNames.Features, textFeatures,new Transforms.Text.TextFeaturizingEstimator.Options());
            var numericFeaturesProcessPipeline = mlContext.Transforms.Concatenate(DefaultColumnNames.Features, numericFeatures);
            var dataProcessPipeline = numericFeaturesProcessPipeline.Append(textFeaturesProcessPipeline).AppendCacheCheckpoint(mlContext);
            var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
            {
                FeatureColumn = DefaultColumnNames.Features,
                LabelColumn = DefaultColumnNames.Label,
                NumThreads = 1
            });
            trainner.AppendCacheCheckpoint(mlContext);
            var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey(DefaultColumnNames.Label)
           .Append(dataProcessPipeline)
           .Append(trainner)
           .Append(mlContext.Transforms.Conversion.MapKeyToValue(DefaultColumnNames.PredictedLabel));
            ITransformer trainedModel = trainingPipeline.Fit(trainingDataView);
            //TakeRows.Example();
        }
    }

    public class NormalTagsModelFeatures
    {
        //[Column(ordinal: "0", name: "Label")] public string Label;
        [LoadColumn(0)]
        public string Label;
        [LoadColumn(1)]
        public float fontSize;
        [LoadColumn(2)]
        public float isBold;
        [LoadColumn(3)]
        public float isItalic;
        [LoadColumn(4)]
        public float isUnderLine;
        [LoadColumn(5)]
        public float containsDot;
        [LoadColumn(6)]
        public float containsQuestionMark;
        [LoadColumn(7)]
        public string fontColor;
        [LoadColumn(8)]
        public float isAllCaps;
        [LoadColumn(9)]
        public string tagText;
        [LoadColumn(10)]
        public string firstWord;
        [LoadColumn(11)]
        public string FontName;
        [LoadColumn(12)]
        public float verticalText;
        [LoadColumn(13)]
        public float trdLeft;
        [LoadColumn(14)]
        public float trdRight;
        [LoadColumn(15)]
        public float trdTop;
        [LoadColumn(16)]
        public float trdBottom;
        [LoadColumn(17)]
        public float pageNo;

    }

    public class NormalTagsPrediction
    {
        [Column(ordinal: "0", name: "PredictedLabel")]
        public string Label;


        [ColumnName("Score")]
        public float[] Score { get; set; }

    }
}

@DevLob-zz
Author

DevLob-zz commented Feb 11, 2019

maxIterations = 1,500,000

What is the best value I can set for maxIterations without affecting accuracy?

image

@DevLob-zz
Author

var trainner = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(new SdcaMultiClassTrainer.Options
{
FeatureColumn = DefaultColumnNames.Features,
LabelColumn = DefaultColumnNames.Label,
NumThreads = 1,
MaxIterations= 1000
});

So this solves my issue, but

which MaxIterations value would be recommended based on your

@shauheen shauheen added the question Further information is requested label Feb 11, 2019
@wschin
Member

wschin commented Feb 14, 2019

Number of iterations is data-dependent. You can try 10, 20, 40, 80, 160, ..., 640 to find the value leading to the best test accuracy. In addition, a small regularization coefficient may lead to overfitting so you need to terminate the training very early (e.g., just 1, 2, 4, 8, 16 iterations are enough).
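A minimal sketch of the sweep described above, in plain C# with no ML.NET dependency. The names `IterationSweep`, `DoublingSchedule`, `PickBest`, and the `trainAndScore` delegate are hypothetical and not part of any library. The delegate stands in for "train with this many iterations and return the test accuracy"; how that is wired to a trainer is left to the caller.

```csharp
using System;
using System.Collections.Generic;

public static class IterationSweep
{
    // Yields start, 2*start, 4*start, ... while the value does not exceed max.
    public static IEnumerable<int> DoublingSchedule(int start, int max)
    {
        if (start <= 0) throw new ArgumentOutOfRangeException("start");
        for (int n = start; n <= max; n *= 2)
        {
            yield return n;
            if (n > max / 2) yield break; // the next doubling would pass max (or overflow)
        }
    }

    // Calls trainAndScore once per candidate and returns the candidate with the
    // highest score. Returns -1 if there were no candidates.
    public static int PickBest(IEnumerable<int> candidates,
                               Func<int, double> trainAndScore,
                               out double bestAccuracy)
    {
        int bestIterations = -1;
        bestAccuracy = double.NegativeInfinity;
        foreach (int n in candidates)
        {
            double accuracy = trainAndScore(n);
            if (accuracy > bestAccuracy)
            {
                bestAccuracy = accuracy;
                bestIterations = n;
            }
        }
        return bestIterations;
    }
}
```

With `DoublingSchedule(10, 640)` the candidates are 10, 20, 40, 80, 160, 320, 640. In an ML.NET program, `trainAndScore` would presumably build the trainer with `MaxIterations = n` (the option used earlier in this thread), fit it, and evaluate on a held-out set. That part is an assumption and is not shown here.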

@Ivanidzo4ka
Contributor

That doesn't justify our auto-selection code choosing 1,500,000 iterations over the dataset.

@rogancarr rogancarr removed their assignment Feb 14, 2019
@DevLob-zz DevLob-zz reopened this Feb 15, 2019
@DevLob-zz
Author

image

The code seems to wait inside this while loop and never exits when testing on 500K rows.

I waited on this iteration for more than an hour and it never moved to the next one.

Can we add a timeout or something that guarantees it will exit the while loop?
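Until the trainer itself has a fix, one caller-side workaround for the timeout request is to run the blocking call on a worker task and stop waiting after a deadline. This is a minimal sketch using only `System.Threading.Tasks`. The helper name `TimedCall.TryRun` is hypothetical and not an ML.NET API. It only stops the caller from waiting: it does not cancel or unblock the stuck work, which keeps running in the background.

```csharp
using System;
using System.Threading.Tasks;

public static class TimedCall
{
    // Runs work on a thread-pool thread and waits up to timeout for it.
    // Returns true and sets result if the work finished in time; returns false
    // otherwise. If the work throws, Wait rethrows it as an AggregateException.
    public static bool TryRun<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        Task<T> task = Task.Run(work);
        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }

        // Timed out: the task is abandoned, not stopped.
        result = default(T);
        return false;
    }
}
```

A call such as `TimedCall.TryRun(() => trainingPipeline.Fit(trainingDataView), TimeSpan.FromMinutes(30), out trainedModel)` would let the program report a hang instead of blocking forever. Because the abandoned task may still hold threads and memory, exiting the process after a timeout is probably the safest follow-up.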

@eerhardt
Member

@rogancarr @Ivanidzo4ka @TomFinley - the above deadlock looks to be the same as #1095.

@artidoro
Contributor

Hello @DevLob-zz thank you for sharing all this information.
I have a few questions for you:

  1. Does it just take a long time but eventually finish training, or is it stuck training?
  2. Is the issue showing only on v0.10 or did you have the same issue on v0.9? Are you building off of master? Or using the nuget for v0.10?
  3. Is this reproducible 100% of the time?

The above questions can be very helpful in identifying what's wrong.

Also, I am trying to reproduce this issue locally to fix it, and if you are willing to share the dataset I can take a closer look, no problem of course otherwise.

@DevLob-zz
Author

> 1. Does it just take a long time but eventually finish training, or is it stuck training?

It never finishes once it gets stuck.

> 2. Is the issue showing only on v0.10 or did you have the same issue on v0.9? Are you building off of master? Or using the nuget for v0.10?

On 0.9 everything is OK; I only hit this on 0.10.
I tried both the NuGet package and a build from master; that is what showed me this deadlock.

> 3. Is this reproducible 100% of the time?

No.

> Also, I am trying to reproduce this issue locally to fix it, and if you are willing to share the dataset I can take a closer look, no problem of course otherwise.

Here is the Program.cs that I used to generate more than 400K rows and try to train the model.

Hint: it works fine with a small dataset.

Program.zip

@harishsk harishsk added bug Something isn't working P1 Priority of the issue for triage purpose: Needs to be fixed soon. labels Jan 10, 2020
@harishsk harishsk added image Bugs related image datatype tasks classification Bugs related classification tasks ModelBuilder Bugs related model builder labels Apr 29, 2020
@luisquintanilla
Contributor

Closing this issue assuming it's fixed due to lack of activity and no new reported issues on the matter. Thanks for the discussion / feedback.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2022