Re-using the same Dataview with Bitmaps in memory, breaks when fitting different models or run cross validation on it #4126

Closed
@ssaporito

System information

  • OS version/distro: Windows 10
  • .NET Version (e.g., dotnet --info): .NET Core 2.2

Issue

  • What did you do?
    I had a working pipeline for training an image classifier with cross-validation on the previous ML.NET version, using file paths as input. Now that Bitmaps can be loaded directly, I am trying to set up a similar pipeline that trains and predicts from in-memory Bitmaps.
  • What happened?
    The training works if I just Fit the data,
    ITransformer mlModel = pipeline.Fit(trainData);
    but it fails if I try to use CrossValidate
    var cvResults = _mlContext.MulticlassClassification.CrossValidate(trainData, pipeline, numberOfFolds);
  • What did you expect?
    I expected a pipeline that works with Fit to also work with CrossValidate, but the multiple internal passes over the data seem to alter the Bitmaps (they lose data).

Source code / logs

My current pipeline, based on this sample, is:

var pipeline = _mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(_mlContext.Transforms.ResizeImages(
        outputColumnName: TensorFlowModelSettings.inputTensorName,
        imageWidth: ImageSettings.imageWidth, imageHeight: ImageSettings.imageHeight,
        inputColumnName: nameof(ImageInputData.Image)))
    .Append(_mlContext.Transforms.ExtractPixels(
        outputColumnName: TensorFlowModelSettings.inputTensorName,
        interleavePixelColors: ImageSettings.channelsLast,
        offsetImage: ImageSettings.mean/*, inputColumnName: nameof(ImageInputData.Image)*/))
    .Append(_mlContext.Model.LoadTensorFlowModel(tensorFlowModelFilePath)
        .ScoreTensorFlowModel(
            outputColumnNames: new[] { TensorFlowModelSettings.outputTensorName },
            inputColumnNames: new[] { TensorFlowModelSettings.inputTensorName },
            addBatchDimensionInput: false))
    .Append(_mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
        labelColumnName: "Label", featureColumnName: TensorFlowModelSettings.outputTensorName))
    .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"))
    .AppendCacheCheckpoint(_mlContext);

The error log includes the following exceptions:

System.ArgumentException: Parameter is not valid.
   at System.Drawing.Image.get_Height()
   at Microsoft.ML.Transforms.Image.ImageResizingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__1(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Data.DataViewUtils.Splitter.InPipe.Impl`1.Fill()
   at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass9_0.<SplitCore>b__1()
System.InvalidOperationException: Splitter/consolidator worker encountered exception while consuming source data ---> System.ArgumentException: Parameter is not valid.
   at System.Drawing.Image.get_Height()
   at Microsoft.ML.Transforms.Image.ImageResizingTransformer.Mapper.<>c__DisplayClass3_0.<MakeGetter>b__1(Bitmap& dst)
   at Microsoft.ML.Transforms.Image.ImagePixelExtractingTransformer.Mapper.<>c__DisplayClass5_0`1.<GetGetterCore>b__1(VBuffer`1& dst)
   at Microsoft.ML.Data.DataViewUtils.Splitter.InPipe.Impl`1.Fill()
   at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass9_0.<SplitCore>b__1()
   --- End of inner exception stack trace ---
   at Microsoft.ML.Data.DataViewUtils.Splitter.Batch.SetAll(OutPipe[] pipes)
   at Microsoft.ML.Data.DataViewUtils.Splitter.Cursor.MoveNextCore()
   at Microsoft.ML.Data.RootCursorBase.MoveNext()
   at Microsoft.ML.Data.DataViewUtils.Splitter.<>c__DisplayClass5_1.<ConsolidateCore>b__2()

This is my first issue here, and I apologize if I overlooked something. I found no posts about this error anywhere.

Labels

  • P0: Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away.
  • bug: Something isn't working