Description
Since the new (early preview) 'ImageClassification - Transfer Learning' feature aims to provide a high-level API per scenario (image classification in this case), we should simplify the API even further.
Here's an example of a current training pipeline:
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.LoadImages("ImageObject", null, "ImagePath"))
    .Append(mlContext.Transforms.ResizeImages("Image",
        inputColumnName: "ImageObject", imageWidth: 299, imageHeight: 299))
    .Append(mlContext.Transforms.ExtractPixels("Image",
        interleavePixelColors: true))
    .Append(mlContext.Model.ImageClassification("Image", "Label",
        arch: DnnEstimator.Architecture.InceptionV3,
        epoch: 1, // An epoch is one learning cycle in which the learner sees the whole training data set.
        batchSize: 20)); // batchSize sets the number of images to feed the model at a time.
Good candidates for simplification are the image transformations and the "technical" hyper-parameters:
1. Hide the image resize step: Since the image size is determined by the internal DNN model (currently a TensorFlow model), why don't we simply do it within the ImageClassification code?
2. Hide the extract pixels step: Same thing here. The extract pixels step should be hidden inside the ImageClassification code; exposing it surfaces too much detail. I expect interleavePixelColors could also be determined from the info in the TF model .pb file or, in the worst case, from metadata that we could keep per architecture (InceptionV3, etc.).
3. Hide EPOCH and BATCHSIZE: These are also configuration details. Even though they currently have default values (epoch=10 and batchSize=20), those values are not right for every image set and architecture; they should vary with the size of the specific image set and the chosen architecture. It'd be good to have them initialized dynamically within the ImageClassification code depending on the context, instead of using fixed default values and surfacing parameters to users who might not know what an EPOCH or a BATCH is.
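To make points 1 to 3 more concrete, here is a rough sketch of how the hidden values could be supplied internally. Everything in it is hypothetical: the types `Architecture`, `ArchitectureInfo` and `ImageClassificationDefaults` do not exist in ML.NET, the only metadata entry (299x299, interleaved) is taken from the InceptionV3 pipeline above, and the `targetSteps` budget and the epoch cap are arbitrary placeholders rather than tuned values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; none of this is ML.NET API.
public enum Architecture { InceptionV3, ResnetV2101 }

public sealed class ArchitectureInfo
{
    public int ImageWidth { get; }
    public int ImageHeight { get; }
    public bool InterleavePixelColors { get; }

    public ArchitectureInfo(int imageWidth, int imageHeight, bool interleavePixelColors)
    {
        ImageWidth = imageWidth;
        ImageHeight = imageHeight;
        InterleavePixelColors = interleavePixelColors;
    }
}

public static class ImageClassificationDefaults
{
    // Points 1 and 2: per-architecture metadata, so callers never pass
    // image size or pixel layout. Only InceptionV3 is filled in, using the
    // values from the pipeline in this issue; other entries are left out
    // on purpose instead of being guessed.
    private static readonly Dictionary<Architecture, ArchitectureInfo> Metadata =
        new Dictionary<Architecture, ArchitectureInfo>
        {
            { Architecture.InceptionV3, new ArchitectureInfo(299, 299, true) },
        };

    public static ArchitectureInfo GetInfo(Architecture arch)
    {
        if (Metadata.TryGetValue(arch, out var info))
            return info;
        throw new NotSupportedException("No metadata registered for " + arch + ".");
    }

    // Point 3: derive epoch and batchSize from the image-set size.
    // Assumed heuristic: cap the batch at the current default (20) and aim
    // for a roughly constant number of training steps overall.
    public static (int Epoch, int BatchSize) ChooseTrainingSettings(int imageCount)
    {
        if (imageCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(imageCount));

        const int maxBatchSize = 20;  // current default batchSize
        const int targetSteps = 500;  // arbitrary placeholder budget
        const int maxEpoch = 100;     // arbitrary placeholder cap

        int batchSize = Math.Min(maxBatchSize, imageCount);
        int stepsPerEpoch = (imageCount + batchSize - 1) / batchSize; // ceiling division
        int epoch = (targetSteps + stepsPerEpoch - 1) / stepsPerEpoch; // ceiling division
        epoch = Math.Max(1, Math.Min(maxEpoch, epoch));

        return (epoch, batchSize);
    }
}
```

With these placeholder constants, an image set of 1,000 images ends up with the current defaults (epoch=10, batchSize=20), while smaller sets get more epochs and larger sets get fewer. A real implementation would presumably also take the architecture into account.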
After simplifying the above points, the example code could look like the following, which is a lot simpler and cleaner :)
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.LoadImages("ImageObject", null, "ImagePath"))
    .Append(mlContext.Model.ImageClassification("ImageObject", "Label",
        arch: DnnEstimator.Architecture.InceptionV3));
Even the architecture choice should be simplified, perhaps by letting the user select the type of images to be trained on, such as photos vs. numbers/digits, or other characteristics that make one architecture or another advisable. A regular .NET developer usually won't know whether to use InceptionV3 or ResnetV2101, etc.
Activity
time888 commented on Aug 11, 2019
I tested DnnEstimator.Architecture.InceptionV3.
I used DrawString to draw "0" onto a Graphics object and saved it to the file 0.gif, and did the same for 1.gif, 2.gif, and 3.gif.
But the classification is incorrect.
How should I configure the parameters?
Could you give me a sample?
time888 commented on Aug 11, 2019
With tensorflow_inception_graph.pb, the classification is correct.
antoniovs1029 commented on Jan 9, 2020
Seems this was closed in #4151