```csharp
public class IrisData
{
    [Column("0")]
    public float SepalLength;

    [Column("1")]
    public float SepalWidth;

    [Column("2")]
    public float PetalLength;

    [Column("3")]
    public float PetalWidth;

    [Column("4")]
    [ColumnName("Label")]
    public string Label;
}

public class IrisPrediction
{
    [ColumnName("PredictedLabel")]
    [KeyType]
    public uint PredictedLabels;
}

static void Main(string[] args)
{
    using (var env = new TlcEnvironment(seed: 0))
    {
        string dataPath = "iris-data.txt";
        var loader = new TextLoader(env,
            new TextLoader.Arguments()
            {
                HasHeader = false,
                SeparatorChars = new char[] { ',' },
                Column = new[]
                {
                    ScalarCol("SepalLength", 0),
                    ScalarCol("SepalWidth", 1),
                    ScalarCol("PetalLength", 2),
                    ScalarCol("PetalWidth", 3),
                    ScalarCol("Label", 4, DataKind.Text),
                }
            },
            new MultiFileSource(dataPath));

        IDataTransform trans = new TermTransform(env, loader, "Label");
        trans = new ConcatTransform(env, trans, "Features",
            "SepalLength", "SepalWidth", "PetalLength", "PetalWidth");

        var trainer = new SdcaMultiClassTrainer(env, new SdcaMultiClassTrainer.Arguments());
        var cached = new CacheDataView(env, trans, prefetch: null);
        var trainRoles = new RoleMappedData(cached, label: "Label", feature: "Features");
        var pred = trainer.Train(trainRoles);

        // Score.
        IDataView scoredData = ScoreUtils.GetScorer(pred, trainRoles, env, trainRoles.Schema);

        // Do a simple prediction.
        var engine = env.CreatePredictionEngine<IrisData, IrisPrediction>(scoredData);
        var prediction = engine.Predict(new IrisData()
        {
            SepalLength = 3.3f,
            SepalWidth = 1.6f,
            PetalLength = 0.2f,
            PetalWidth = 5.1f,
        });
        Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
    }
}
```
What happened?
```
Unhandled Exception: System.ArgumentOutOfRangeException: Feature column 'Features' not found
Parameter name: name
   at Microsoft.ML.Runtime.Data.ColumnInfo.CreateFromName(ISchema schema, String name, String descName)
   at Microsoft.ML.Runtime.Data.RoleMappedSchema.MapFromNames(ISchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Runtime.Data.RoleMappedSchema..ctor(ISchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Runtime.Data.PredictedLabelScorerBase.BindingsImpl.ApplyToSchema(ISchema input, ISchemaBindableMapper bindable, IHostEnvironment env)
   at Microsoft.ML.Runtime.Data.PredictedLabelScorerBase..ctor(IHostEnvironment env, PredictedLabelScorerBase transform, IDataView newSource, String registrationName)
   at Microsoft.ML.Runtime.Data.MultiClassClassifierScorer..ctor(IHostEnvironment env, MultiClassClassifierScorer transform, IDataView newSource)
   at Microsoft.ML.Runtime.Data.MultiClassClassifierScorer.ApplyToData(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Runtime.Data.ApplyTransformUtils.ApplyTransformToData(IHostEnvironment env, IDataTransform transform, IDataView newSource)
   at Microsoft.ML.Runtime.Data.ApplyTransformUtils.ApplyAllTransformsToData(IHostEnvironment env, IDataView chain, IDataView newSource, IDataView oldSource)
   at Microsoft.ML.Runtime.Api.BatchPredictionEngine`2..ctor(IHostEnvironment env, IDataView dataPipeline, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.Runtime.Api.PredictionEngine`2..ctor(IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.Runtime.Api.ComponentCreation.CreatePredictionEngine[TSrc,TDst](IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at myApp.Program.Main(String[] args) in C:\Users\eerhardt\source\repos\MLNetCore30Test\Program.cs:line 182
```
What did you expect?
I expected it to work.
Notes
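The cause (AFAICT) is the CacheDataView usage. When PredictionEngine tries to apply all the transforms to its own input data, it walks back through the chain of IDataTransforms, starting from the scorer.
It hits the CacheDataView, which isn't an IDataTransform, and stops there. As a result, the only transform that gets re-applied is the scorer transform, and none of the transforms before it (like the ConcatTransform that adds the “Features” column). That is why the scorer fails with "Feature column 'Features' not found".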
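A toy model can make this failure mode concrete. The sketch below is C#, but every type in it (`IView`, `ITransformView`, `SourceView`, `CacheView`, `AddColumn`, `Scorer`, `Replay`, `Demo`) is hypothetical and only loosely mirrors `IDataView`, `IDataTransform`, `CacheDataView`, `ConcatTransform` and the multiclass scorer. The replay loop is a simplification of `ApplyTransformUtils.ApplyAllTransformsToData`, assuming only that the walk stops at the first view that is not a transform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy stand-ins, NOT ML.NET types: a view only knows its column names.
interface IView
{
    IReadOnlyList<string> Columns { get; }
}

// Toy analogue of IDataTransform: knows its upstream view and can be rebuilt on a new one.
interface ITransformView : IView
{
    IView Source { get; }
    ITransformView ApplyTo(IView newSource);
}

sealed class SourceView : IView
{
    public IReadOnlyList<string> Columns { get; }
    public SourceView(params string[] columns) => Columns = columns;
}

// Toy analogue of CacheDataView: wraps its input but is not a transform, so it exposes no Source.
sealed class CacheView : IView
{
    private readonly IView _input;
    public CacheView(IView input) => _input = input;
    public IReadOnlyList<string> Columns => _input.Columns;
}

// Toy analogue of ConcatTransform: appends one output column such as "Features".
sealed class AddColumn : ITransformView
{
    private readonly string _name;
    public IView Source { get; }
    public AddColumn(IView source, string name) { Source = source; _name = name; }
    public IReadOnlyList<string> Columns => Source.Columns.Append(_name).ToList();
    public ITransformView ApplyTo(IView newSource) => new AddColumn(newSource, _name);
}

// Toy analogue of the multiclass scorer: its input must already contain "Features".
sealed class Scorer : ITransformView
{
    public IView Source { get; }
    public Scorer(IView source)
    {
        if (!source.Columns.Contains("Features"))
            throw new ArgumentOutOfRangeException("name", "Feature column 'Features' not found");
        Source = source;
    }
    public IReadOnlyList<string> Columns => Source.Columns.Append("PredictedLabel").ToList();
    public ITransformView ApplyTo(IView newSource) => new Scorer(newSource);
}

static class Replay
{
    // Simplified replay: collect transforms while walking back from the end of the chain,
    // stop at the first view that is not a transform, then rebuild them oldest-first on newSource.
    public static IView ApplyAllTransforms(IView chain, IView newSource)
    {
        var steps = new Stack<ITransformView>();
        IView current = chain;
        while (current is ITransformView t)
        {
            steps.Push(t);
            current = t.Source;
        }
        IView result = newSource;
        while (steps.Count > 0)
            result = steps.Pop().ApplyTo(result);
        return result;
    }
}

static class Demo
{
    public static readonly string[] IrisColumns =
        { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Label" };

    // Mirrors the repro's shape: Concat -> Cache -> Scorer, then replay on a fresh input.
    public static IView ScoreThroughCache()
    {
        IView trans = new AddColumn(new SourceView(IrisColumns), "Features");
        IView scored = new Scorer(new CacheView(trans)); // succeeds: the cache still exposes "Features"
        return Replay.ApplyAllTransforms(scored, new SourceView(IrisColumns)); // throws
    }
}
```

`Demo.ScoreThroughCache()` throws `ArgumentOutOfRangeException`: the walk back from `Scorer` stops at `CacheView`, so only the scorer is rebuilt on the new input, and that input has no `Features` column.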
We work around this in the tests by serializing the IDataView pipeline out and then reading it back in:
```csharp
private IDataScorerTransform GetScorer(IHostEnvironment env, IDataView transforms, IPredictor pred, string testDataPath = null)
{
    using (var ch = env.Start("Saving model"))
    using (var memoryStream = new MemoryStream())
    {
        var trainRoles = new RoleMappedData(transforms, label: "Label", feature: "Features");

        // Model cannot be saved with CacheDataView
        TrainUtils.SaveModel(env, ch, memoryStream, pred, trainRoles);
        memoryStream.Position = 0;
        using (var rep = RepositoryReader.Open(memoryStream, ch))
        {
            IDataLoader testPipe = ModelFileUtils.LoadLoader(env, rep, new MultiFileSource(testDataPath), true);
            RoleMappedData testRoles = new RoleMappedData(testPipe, label: "Label", feature: "Features");
            return ScoreUtils.GetScorer(pred, testRoles, env, testRoles.Schema);
        }
    }
}
```
I would not expect a user to have to do this. Any thoughts on how to make this better?
I removed the CacheDataView from my pipeline, which made the code work, but training got super slow. So that seems to be a non-starter.
Building the scorer on the pipeline before the cache, rather than on the cached view, will 'short-circuit' our CacheDataView so that it is only used for training and not as part of the scoring pipeline.
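The arrangement can be sketched with a toy model in C#. All types here (`IView`, `ITransformView`, `SourceView`, `CacheView`, `AddColumn`, `Scorer`, `Replay`, `Demo`) are hypothetical stand-ins, not ML.NET APIs, and the replay loop assumes only that the walk back through the chain stops at the first view that is not a transform. The cached view is what a trainer would read, while the scorer sits directly on `trans`, so replaying the chain on fresh input rebuilds `AddColumn` ("Features") before the scorer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy stand-ins, NOT ML.NET types: a view only knows its column names.
interface IView
{
    IReadOnlyList<string> Columns { get; }
}

// Toy analogue of IDataTransform: knows its upstream view and can be rebuilt on a new one.
interface ITransformView : IView
{
    IView Source { get; }
    ITransformView ApplyTo(IView newSource);
}

sealed class SourceView : IView
{
    public IReadOnlyList<string> Columns { get; }
    public SourceView(params string[] columns) => Columns = columns;
}

// Toy analogue of CacheDataView: not a transform, so a replay walk cannot see past it.
sealed class CacheView : IView
{
    private readonly IView _input;
    public CacheView(IView input) => _input = input;
    public IReadOnlyList<string> Columns => _input.Columns;
}

// Toy analogue of ConcatTransform: appends one output column such as "Features".
sealed class AddColumn : ITransformView
{
    private readonly string _name;
    public IView Source { get; }
    public AddColumn(IView source, string name) { Source = source; _name = name; }
    public IReadOnlyList<string> Columns => Source.Columns.Append(_name).ToList();
    public ITransformView ApplyTo(IView newSource) => new AddColumn(newSource, _name);
}

// Toy analogue of the multiclass scorer: its input must already contain "Features".
sealed class Scorer : ITransformView
{
    public IView Source { get; }
    public Scorer(IView source)
    {
        if (!source.Columns.Contains("Features"))
            throw new ArgumentOutOfRangeException("name", "Feature column 'Features' not found");
        Source = source;
    }
    public IReadOnlyList<string> Columns => Source.Columns.Append("PredictedLabel").ToList();
    public ITransformView ApplyTo(IView newSource) => new Scorer(newSource);
}

static class Replay
{
    // Walk back while views are transforms, then rebuild them oldest-first on newSource.
    public static IView ApplyAllTransforms(IView chain, IView newSource)
    {
        var steps = new Stack<ITransformView>();
        IView current = chain;
        while (current is ITransformView t)
        {
            steps.Push(t);
            current = t.Source;
        }
        IView result = newSource;
        while (steps.Count > 0)
            result = steps.Pop().ApplyTo(result);
        return result;
    }
}

static class Demo
{
    public static readonly string[] IrisColumns =
        { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Label" };

    // The cache feeds training only; the scorer is built on `trans`, so replay sees Concat -> Scorer.
    public static (IView TrainingView, IView Replayed) CacheForTrainingOnly()
    {
        IView trans = new AddColumn(new SourceView(IrisColumns), "Features");
        IView trainingView = new CacheView(trans); // what a trainer would iterate over
        IView scored = new Scorer(trans);          // scoring chain bypasses the cache
        return (trainingView, Replay.ApplyAllTransforms(scored, new SourceView(IrisColumns)));
    }
}
```

Here `Replayed.Columns` ends with `Features` and then `PredictedLabel`, because `AddColumn` is rebuilt before `Scorer`. In the repro, the analogous change would presumably be to keep training on `cached` but pass a `RoleMappedData` built over `trans` to `ScoreUtils.GetScorer`; that mapping to the real API is an assumption, not something spelled out in the thread.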
Obviously, we should think about how to make this arrangement less of a 'trap of failure' for new users. I think it goes back to the idea of 'smart' training: a training process that would cache if needed, normalize if needed, and calibrate if needed.
System information
Issue
I'm trying to port https://www.microsoft.com/net/learn/machine-learning-and-ai/get-started-with-ml-dotnet-tutorial to the “direct access” API.
For reference, the transform-replay code that PredictionEngine goes through is in `machinelearning/src/Microsoft.ML.Data/Utilities/ApplyTransformUtils.cs`, line 84 (commit c023727).
/cc @TomFinley @Zruty0