Description
Using AutoML versions 0.17.2 and 0.17.4, I get a few exceptions during SdcaRegression (similar to #4363).
However, 0.17.4 shows a new behaviour: I also get an AggregateException (a change caused by #5445).
Exception during AutoML iteration: System.InvalidOperationException: The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
at Microsoft.ML.Trainers.OnlineLinearTrainer`2.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
System.AggregateException
HResult=0x80131500
Message=One or more errors occurred. (Operation was canceled.) (Operation was canceled.) (Operation was canceled.) (Operation was canceled.)
Source=System.Private.CoreLib
StackTrace:
at System.ThrowHelper.ThrowAggregateException(List`1 exceptions)
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.FastTree.ThreadTaskManager.ThreadTask.RunTask()
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FindBestSplitOfRoot(Double[] targets)
at Microsoft.ML.Trainers.FastTree.LeastSquaresRegressionTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] targets)
at Microsoft.ML.Trainers.FastTree.RandomForestLeastSquaresTreeLearner.FitTargets(IChannel ch, Boolean[] activeFeatures, Double[] weightedtargets, Double[] targets, Double[] weights)
at Microsoft.ML.Trainers.FastTree.RandomForestOptimizer.TrainingIteration(IChannel ch, Boolean[] activeFeatures)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.Train(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.TrainCore(IChannel ch)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric, IChannel logger)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IChannel logger, IEnumerable`1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment`2.Execute()
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, IDataView validationData, String labelColumnName, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at AutoMLApp.Experiment2Template.Train() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 446
at AutoMLApp.MlModelTemplate.BuildModel() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 363
at AutoMLApp.MlExperimentsFactory.Experiment2Tasks(Kind kind, OutputLabels op, BinaryClassificationMetric optimizingMetric, List`1 trainers) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 177
at AutoMLApp.MlExperimentsFactory.<>c__DisplayClass30_1.b__1() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 135
at AutoMLApp.MlExperimentsFactory.StartNew(String ticker, ExperimentElementCollection expColl, PredictionTestDataElement testData) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Class1.cs:line 109
at AutoMLApp.Program.Main(String[] args) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\AutoMLApp\Program.cs:line 31
This exception was originally thrown at this call stack:
[External Code]
Inner Exception 1:
OperationCanceledException: Operation was canceled.
Activity
michaelgsharp commented on Feb 11, 2021
@JakeRadMSFT any ideas on this? I haven't changed anything in ML.NET itself that would cause any issues.
aforoughi1 commented on Feb 12, 2021
TestFor5506Issue.zip
Please find attached a sample to reproduce it.
aforoughi1 commented on Feb 15, 2021
Closed by mistake. Please reopen.
michaelgsharp commented on Feb 18, 2021
Hi @aforoughi1, would you be able to provide a sample project/data to help reproduce this?
Thanks!
aforoughi1 commented on Feb 18, 2021
michaelgsharp commented on Feb 18, 2021
Actually, I just saw you already uploaded your sample. Let me take a look at it. My bad for missing it, sorry about that.
michaelgsharp commented on Feb 18, 2021
Have you tested this code in prior versions of AutoML? Did it work before version 0.17.2?
aforoughi1 commented on Feb 19, 2021
aforoughi1 commented on Feb 19, 2021
The #5445 fix changes the behaviour: it seems to stop the experiment from running to the end. I have been using AutoML since the preview phases, and the last working combination was AutoML 0.17.2 with ML.NET 1.5.2. It also runs to the end with 0.17.2 and 1.5.4. However, it terminates with 0.17.4.
michaelgsharp commented on Feb 22, 2021
So after looking into this, I think I have found the cause. Are you building from source, or taking a NuGet dependency on this? If you are building from source, there is a workaround. If not, I'll see if I can get this change in for the next release.
If you look here you will see we are checking for an OperationCanceledException, and if that is the issue we just catch it and return the results. In this case, there is some parallel training happening, so instead of a single OperationCanceledException, there are multiple of them. This causes them to be wrapped in an AggregateException, which then is not handled the way it should be. The fix will be to add another catch for the AggregateException, and if all the inner exceptions are OperationCanceledException, we will make it behave the same way it does for a single OperationCanceledException.
The next release is currently set for March 2nd, so I'll see if I can have this fix in by then. You are also free to make the changes and submit a PR if you would like.
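The handling described in this comment could look roughly like the sketch below. It is not the actual ML.NET change: `CancellationSketch`, `IsCancellationOnly`, and `RunOrFallback` are hypothetical names used only for illustration, and the real fix would live inside AutoML's own experiment loop. The idea is simply that an AggregateException whose inner exceptions are all OperationCanceledException is treated like a single cancellation, while anything else still propagates.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only; not part of ML.NET.
public static class CancellationSketch
{
    // True when the aggregate is non-empty and every inner exception
    // (after flattening nested aggregates) is a cancellation.
    public static bool IsCancellationOnly(AggregateException ex)
    {
        var inner = ex.Flatten().InnerExceptions;
        return inner.Count > 0 && inner.All(e => e is OperationCanceledException);
    }

    // Runs `work`; on cancellation (single or aggregated) returns `fallback`
    // instead of throwing. Any other exception is left to propagate.
    public static T RunOrFallback<T>(Func<T> work, T fallback)
    {
        try
        {
            return work();
        }
        catch (OperationCanceledException)
        {
            return fallback;
        }
        catch (AggregateException ex) when (IsCancellationOnly(ex))
        {
            return fallback;
        }
    }
}
```

The exception filter (`when`) keeps mixed failures intact: if even one inner exception is not a cancellation, the AggregateException is not caught here and surfaces to the caller unchanged.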
aforoughi1 commented on Feb 22, 2021
michaelgsharp commented on Feb 23, 2021
So I have spent more time today looking into this. It seems the exception is printed out, but you still get the final results back from AutoML, right? That is, even though you see this error printed to the console, you are able to get a model back and use it. Is that correct?
aforoughi1 commented on Feb 23, 2021
I get a null reference for the model.
sample code:
var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
ExperimentResult<RegressionMetrics> experimentResult = null;
try
{
    experimentResult = experiment.Execute(trainData: data, labelColumnName: "Target", progressHandler: new ProgressHandler());
}
catch (AggregateException exception)
{
    // Print every inner exception carried by the aggregate.
    foreach (Exception ex in exception.InnerExceptions)
    {
        Console.WriteLine(ex.ToString());
    }
}
finally
{
    // experimentResult is still null here when Execute throws, hence the null reference.
    ITransformer model = experimentResult.BestRun.Model;