How to get around the exception caused because of #5506 fix? #5612

Closed

Description

@aforoughi1

Using AutoML versions 0.17.2 and 0.17.4, I get a few exceptions during SdcaRegression (similar to #4363).
However, 0.17.4 shows a new behaviour: I also get an AggregateException (a change caused by #5445).

Exception during AutoML iteration: System.InvalidOperationException: The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
at Microsoft.ML.Trainers.OnlineLinearTrainer`2.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)

System.AggregateException
HResult=0x80131500
Message=One or more errors occurred. (Operation was canceled.) (Operation was canceled.) (Operation was canceled.) (Operation was canceled.)
Source=System.Private.CoreLib
StackTrace:
at System.ThrowHelper.ThrowAggregateException(List`1 exceptions)
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.FastTree.ThreadTaskManager.ThreadTask.RunTask()
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FindBestSplitOfRoot(Double[] targets)
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] targets)
at Microsoft.ML.Trainers.FastTree.RandomForestLeastSquaresTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] weightedtargets, Double[] targets, Double[] weights)
at Microsoft.ML.Trainers.FastTree.RandomForestOptimizer.TrainingIteration(IChannel ch, Boolean[] activeFeatures)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.Train(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.TrainCore(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric, IChannel logger)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IChannel logger, IEnumerable`1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment`2.Execute()
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, String labelColumnName, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at AutoMLApp.Experiment2Template.Train() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 446
at AutoMLApp.MlModelTemplate.BuildModel() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 363
at AutoMLApp.MlExperimentsFactory.Experiment2Tasks(Kind kind, OutputLabels op, BinaryClassificationMetric optimizingMetric, List`1 trainers) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 177
at AutoMLApp.MlExperimentsFactory.<>c__DisplayClass30_1.b__1() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 135
at AutoMLApp.MlExperimentsFactory.StartNew(String ticker, ExperimentElementCollection expColl, PredictionTestDataElement testData) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 109
at AutoMLApp.Program.Main(String[] args) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Program.cs:line 31

This exception was originally thrown at this call stack:
[External Code]

Inner Exception 1:
OperationCanceledException: Operation was canceled.

Activity

Labels added on Feb 11, 2021: AutoML.NET (Automating various steps of the machine learning process), P2 (Priority of the issue for triage purpose: Needs to be fixed at some point.)

michaelgsharp (Contributor) commented on Feb 11, 2021

@JakeRadMSFT any ideas on this? I haven't changed anything in ML.NET itself that would cause any issues.

aforoughi1 (Author) commented on Feb 12, 2021

TestFor5506Issue.zip
Please find the attached sample to reproduce it.

aforoughi1 (Author) commented on Feb 15, 2021

Closed by mistake. Please reopen.

michaelgsharp (Contributor) commented on Feb 18, 2021

Hi @aforoughi1, would you be able to provide a sample project/data to help reproduce this?

Thanks!

aforoughi1 (Author) commented on Feb 18, 2021

michaelgsharp (Contributor) commented on Feb 18, 2021

Actually, I just saw you already uploaded your sample. Let me take a look at it. My bad for missing it, sorry about that.

michaelgsharp (Contributor) commented on Feb 18, 2021

Have you tested this code in prior versions of AutoML? Did it work before version 0.17.2?

aforoughi1 (Author) commented on Feb 19, 2021

aforoughi1 (Author) commented on Feb 19, 2021

The #5445 fix changes the behaviour. It seems to stop the experiment from running to the end. I have been using AutoML since the preview phases, and the last working combination was AutoML 0.17.2 with ML.NET 1.5.2. It also runs to the end with 0.17.2 and 1.5.4. However, it terminates with 0.17.4.

michaelgsharp (Contributor) commented on Feb 22, 2021

So after looking into this, I think I have found the cause. Are you building from source, or taking a NuGet dependency on this? If you are building from source, there is a workaround. If not, I'll see if I can get this change in for the next release.

If you look here, you will see that we are checking for an OperationCanceledException; if that is what was thrown, we just catch it and return the results. In this case, some training happens in parallel, so instead of a single OperationCanceledException there are several. They get wrapped in an AggregateException, which is then not handled the way it should be. The fix will be to add another catch for AggregateException: if all of its inner exceptions are OperationCanceledException, we will make it behave the same way it does for a single OperationCanceledException.
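
For illustration, the catch logic described above could look roughly like the sketch below. This is not the actual ML.NET change: `CancellationHandling`, `IsCancellationOnly`, `RunUntilCanceled`, and the `runExperiment` delegate are hypothetical names that stand in for the AutoML experiment loop. It only shows how an AggregateException made up entirely of OperationCanceledException instances can be treated like a single cancellation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET or AutoML.
public static class CancellationHandling
{
    // True when the aggregate, after flattening nested aggregates, contains
    // at least one inner exception and all of them are cancellations.
    public static bool IsCancellationOnly(AggregateException ex)
    {
        var inner = ex.Flatten().InnerExceptions;
        return inner.Count > 0 && inner.All(e => e is OperationCanceledException);
    }

    // Runs the delegate. Returns true if it ran to completion and false if it
    // was canceled, whether by one cancellation or several parallel ones.
    public static bool RunUntilCanceled(Action runExperiment)
    {
        try
        {
            runExperiment();
            return true;
        }
        catch (OperationCanceledException)
        {
            // Single cancellation: stop quietly and keep the results so far.
            return false;
        }
        catch (AggregateException ex) when (IsCancellationOnly(ex))
        {
            // Parallel cancellations: handled the same way as a single one.
            return false;
        }
    }
}
```

An AggregateException that contains any other exception type does not match the filter, so it still propagates to the caller as before.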

The next release is currently set for March 2nd, so I'll see if I can have this fix in by then. You are also free to make the changes and submit a PR if you would like.

aforoughi1 (Author) commented on Feb 22, 2021

michaelgsharp (Contributor) commented on Feb 23, 2021

So I have spent more time today looking into this. It seems like the exception is printed out, but you still get the final results back from AutoML, right? Even though you see this error printed to the console, you are able to get a model back and use it, is that correct?

aforoughi1 (Author) commented on Feb 23, 2021

I get a null reference for the model.
Sample code:

var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
ExperimentResult<RegressionMetrics> experimentResult = null;
try
{
    experimentResult = experiment.Execute(trainData: data, labelColumnName: "Target", progressHandler: new ProgressHandler());
}
catch (AggregateException exception)
{
    foreach (Exception ex in exception.InnerExceptions)
    {
        Console.WriteLine(ex.ToString());
    }
}
finally
{
    ITransformer model = experimentResult.BestRun.Model;

    IDataView predictions = model.Transform(data);

    var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Target", scoreColumnName: "Score");
}

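
In the sample above, if `Execute` throws, `experimentResult` is still null when the `finally` block runs, so dereferencing it fails there. A minimal sketch of one way to guard against that is shown below. `SafeRun.TryExecute` is a hypothetical helper, not an AutoML API, and the `Func<T>` stands in for the call to `experiment.Execute(...)`. It does not recover a model from a canceled experiment; it only avoids using a result that was never assigned.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ML.NET or AutoML.
public static class SafeRun
{
    // Invokes execute and reports whether a usable (non-null) result came back.
    // If an AggregateException is thrown, its inner exceptions are written to
    // the console and result stays null.
    public static bool TryExecute<T>(Func<T> execute, out T result) where T : class
    {
        result = null;
        try
        {
            result = execute();
        }
        catch (AggregateException exception)
        {
            foreach (Exception ex in exception.InnerExceptions)
            {
                Console.WriteLine(ex.ToString());
            }
        }
        return result != null;
    }
}
```

With a helper like this, the model would be used only inside `if (SafeRun.TryExecute(() => experiment.Execute(trainData: data, labelColumnName: "Target", progressHandler: new ProgressHandler()), out var experimentResult)) { ... }`, instead of in a `finally` block that runs regardless of the outcome.
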
ghost locked as resolved and limited conversation to collaborators on Mar 17, 2022
        How to get around the exception caused because of #5506 fix? · Issue #5612 · dotnet/machinelearning