It is not possible to use PermutationFeatureImportance from a model loaded from disk in F# #3976

Closed
@fwaris

Description

I am trying to use PermutationFeatureImportance (PFI) with F# but the F# type system is not resolving ITransformer to ISingleFeaturePredictionTransformer - which is required by PFI.

I believe it is due to IPredictorProducing (and related interfaces) being marked as "internal".

F# supports explicit interfaces and maybe that is the reason for this issue.

Here is a snippet of code that shows what I am trying to do
(I am using the latest bits - v 1.2.0 at the time of this post)

let mutable schema = null
let mdl = ctx.Model.Load(@"F:\fwaris\data\t\analysis\model_cv_LightGbmBinary.bin", &schema) 
let mdlt =  mdl :?> TransformerChain<ITransformer>
let m1 =  mdlt.LastTransformer //debugger shows it is Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>
let scored = mdl.Transform(trainView)
scored.Preview()
ctx.BinaryClassification.PermutationFeatureImportance(m1 :?> _,scored)

@dsyme

Activity

fwaris (Author) commented on Jul 12, 2019

The workaround for now is to use the C# helper given below. But really, if an interface (IPredictorProducing) is going to be exposed via another public interface, it should not be marked internal.

    using System.Collections.Immutable;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    public static class MLHelper<T> where T : class
    {
        // Thin wrapper around PermutationFeatureImportance that performs the cast
        // to ISingleFeaturePredictionTransformer<T> on the C# side.
        public static ImmutableArray<BinaryClassificationMetricsStatistics> PFI_BinaryClassification(
            MLContext ctx,
            ITransformer model,
            IDataView data,
            string labelColumnName = "Label",
            bool useFeatureWeightFilter = false,
            int? numberOfExamplesToUse = null,
            int permutationCount = 1)
        {
            // Note: `as` yields null if model does not implement
            // ISingleFeaturePredictionTransformer<T> for the chosen T.
            var m = ctx.BinaryClassification.PermutationFeatureImportance(
                model as ISingleFeaturePredictionTransformer<T>,
                data,
                labelColumnName: labelColumnName,
                useFeatureWeightFilter: useFeatureWeightFilter,
                numberOfExamplesToUse: numberOfExamplesToUse,
                permutationCount: permutationCount);
            return m;
        }
    }

eerhardt (Member) commented on Aug 2, 2019

@fwaris - I just ran into this issue as well. I don't understand how your workaround works. What T is getting passed into MLHelper<T>?

@codemzs - this is the same issue as we were discussing today. I don't think it is possible to use PermutationFeatureImportance once a model is saved to disk.

This is an issue because if you use AutoML, it always saves the model to disk in order to save on memory.

The problem is this code:

internal static class BinaryPredictionTransformer
{
    public const string LoaderSignature = "BinaryPredXfer";

    public static BinaryPredictionTransformer<IPredictorProducing<float>> Create(IHostEnvironment env, ModelLoadContext ctx)
        => new BinaryPredictionTransformer<IPredictorProducing<float>>(env, ctx);
}

Whenever you load a prediction transformer from a model stream, it always creates a new instance of BinaryPredictionTransformer<IPredictorProducing<float>>. This object cannot be cast to the ISingleFeaturePredictionTransformer<TModel> that is necessary for calling PermutationFeatureImportance, because the type argument in this case (IPredictorProducing<float>) is internal.

We need to change the above code to save off the right type into the model, and create an instance of BinaryPredictionTransformer<TModel>, where TModel is the type that was originally used when training the pipeline before saving to disk - for example, BinaryPredictionTransformer<CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>> when using LightGbm.
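The cast failure described above can be reproduced without ML.NET. The following is only a minimal sketch: every type in it (IPredictor, ConcreteModel, ISinglePredictionTransformer, PredictionTransformer) is a hypothetical stand-in invented for illustration, and declaring the interface with `out TModel` is an assumption about how the real interface is shaped. It shows that an object built with the base interface as its type argument cannot be cast to the interface closed over the concrete model type, even though the wrapped model really is of that concrete type.

```csharp
using System;

// Hypothetical stand-ins for illustration only; none of these are ML.NET types.
interface IPredictor { }                      // plays the role of IPredictorProducing<float>
sealed class ConcreteModel : IPredictor { }   // plays the role of the trained model parameters

interface ISinglePredictionTransformer<out TModel> where TModel : class
{
    TModel Model { get; }
}

sealed class PredictionTransformer<TModel> : ISinglePredictionTransformer<TModel>
    where TModel : class
{
    public PredictionTransformer(TModel model) { Model = model; }
    public TModel Model { get; }
}

static class Program
{
    static void Main()
    {
        // What the loader in the snippet above does: it always closes the
        // generic type over the base interface, never over the concrete model type.
        object loaded = new PredictionTransformer<IPredictor>(new ConcreteModel());

        // The runtime type argument is IPredictor, so this cast yields null
        // even though the wrapped object is a ConcreteModel.
        var typed = loaded as ISinglePredictionTransformer<ConcreteModel>;
        Console.WriteLine(typed == null);                  // True

        // Casting to the interface closed over the base type does succeed.
        var untyped = loaded as ISinglePredictionTransformer<IPredictor>;
        Console.WriteLine(untyped != null);                // True
        Console.WriteLine(untyped.Model is ConcreteModel); // True
    }
}
```

In the real library the base type argument is internal, so user code cannot even name the one closed interface type that the cast would accept. That is why the proposal above is to record the original TModel when saving and recreate the transformer with it when loading.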

/cc @Dmitry-A @justinormont

changed the title from "IPredictorProducing 'internal' is causing issues with F# type resolution" to "It is not possible to use PermutationFeatureImportance from a model loaded from disk" on Aug 2, 2019

fwaris (Author) commented on Aug 3, 2019

@eerhardt, it seems you can punt on the type resolution in F# by using an underscore; i.e. the following trick seems to work (I tested again just to make sure):

let metrics = MLHelper<_>.PFI_BinaryClassification(mlctx, model, labelColumnName="Label")

The 'model' variable is of the concrete type (from debugger):
Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>

However I agree with you that this area requires re-work to make it easier to use.

added a commit that references this issue on Oct 2, 2019

Addresses #3976 about using PFI with a model loaded from disk (#4262)

4088620

antoniovs1029 (Member) commented on Dec 20, 2019

Hi. So PRs #4262 and #4306 fixed the problem Eric pointed out in his comment in this thread.

So please let us know if this has been fixed for you. In particular, those PRs were only tested for ML.NET on C#, so I would appreciate feedback from the F# side. I will rename and tag this issue as F# specific then, since that was your original problem.

changed the title from "It is not possible to use PermutationFeatureImportance from a model loaded from disk" to "It is not possible to use PermutationFeatureImportance from a model loaded from disk in F#" on Dec 20, 2019

artemiusgreat commented on Dec 26, 2019

This is not fixed yet.
There are 2 ways to save the model.

1. As a pipeline + estimator
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/save-load-machine-learning-models-ml-net

var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X").Concatenate("Features", "X1", "X2");
var estimator = context.MulticlassClassification.Trainers.LightGbm();
var model = pipeline.Append(estimator).Fit(dataView);
context.Model.Save(model, dataView.Schema, "C:/model.zip");

2. As an estimator, without pipeline
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#train-the-model

var estimator = context.MulticlassClassification.Trainers.LightGbm();
var transformedData = pipeline.Fit(dataView).Transform(dataView);
var model = estimator.Fit(transformedData);
context.Model.Save(model, dataView.Schema, "C:/model.zip");

Then loading from the disk.

var model = context.Model.Load("C:/model.zip", out var schema);
var engine = context.Model.CreatePredictionEngine<InputModel, OutputModel>(model);

1. As a pipeline + estimator: the model contains only the pipeline transformers, including MapValueToKey and Concatenate, and there is no way to get the actual trainer / estimator and use it for PFI. The LastTransformer property returns the Concatenate transformer, but PFI requires an estimator, e.g. LightGbm or Regression.

2. As an estimator without pipeline: now I see the LightGbm trainer in the TransformerChain, but CreatePredictionEngine raises an exception saying the "Features" column is not defined, because in this case the model was saved as a pure estimator, without the pipeline.

4 remaining items

artemiusgreat commented on Dec 27, 2019

The only thing I needed is to run PFI using a model loaded from a file. As long as it works, I'm happy.

antoniovs1029 (Member) commented on Dec 31, 2019

Hi, @artemiusgreat . So I am not sure: is your problem solved or not?

I believe it should be possible to access the lastTransformer directly from the model you saved to disk on the "1. As a pipeline + estimator" point by simply using:

var predictor = (loadedModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<OneVersusAllModelParameters>;

I am not sure why you would need to use the .SelectMany(...) method you mentioned.

but PFI requires an estimator, e.g. LightGbm or Regression

PFI doesn't require an estimator, but a Prediction Transformer. So, in your example, the LightGbm trainer is also an estimator, and once it is trained (with .Fit()) it returns a Prediction Transformer of type MulticlassPredictionTransformer<OneVersusAllModelParameters>. You should pass this last transformer to PFI, and not the trainer or estimator:

pfi = ML.MulticlassClassification.PermutationFeatureImportance(predictor, data);

If you are still facing problems, please share with us the complete code and dataset you're using, so that I can take a closer look. Thanks.
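Putting the steps from this comment together, the load-then-PFI flow might look roughly like the sketch below. It is not a verified program: the class PfiFromDisk, the method Run, and its parameters are hypothetical names, and it assumes an ML.NET version that includes the fixes mentioned in this thread, a model saved as "pipeline + estimator" with a multiclass LightGbm trainer as the last step, and the Microsoft.ML and Microsoft.ML.LightGbm packages being referenced.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical helper for illustration; the names are not from the thread.
public static class PfiFromDisk
{
    public static void Run(MLContext mlContext, string modelPath, IDataView data)
    {
        // Load the whole saved chain (data preparation + trained predictor).
        ITransformer loadedModel = mlContext.Model.Load(modelPath, out _);

        // Extract the prediction transformer at the end of the chain.
        // The model type is an assumption that holds for multiclass LightGbm.
        var chain = loadedModel as TransformerChain<ITransformer>;
        var predictor = chain?.LastTransformer
            as MulticlassPredictionTransformer<OneVersusAllModelParameters>;
        if (predictor == null)
            throw new InvalidOperationException(
                "The last transformer is not the expected prediction transformer.");

        // PFI needs data that already contains the feature column,
        // so run the input through the loaded chain first.
        IDataView transformed = loadedModel.Transform(data);

        var pfi = mlContext.MulticlassClassification
            .PermutationFeatureImportance(predictor, transformed);

        // One entry per feature slot; print the mean change in macro accuracy.
        for (int i = 0; i < pfi.Length; i++)
            Console.WriteLine($"Feature slot {i}: {pfi[i].MacroAccuracy.Mean}");
    }
}
```

The null check matters because the `as` cast silently produces null when the last transformer has a different model type, for example with a different trainer.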

self-assigned this on Jan 2, 2020
added the P1 label (Priority of the issue for triage purpose: Needs to be fixed soon.) on Jan 9, 2020

artemiusgreat commented on Mar 13, 2020

@antoniovs1029 Sorry, I missed your comment. Yes, it was fixed. Thanks.

added the loadsave (Bugs related loading and saving data or models), regression (Bugs related regression tasks), and lightgbm (Bugs related lightgbm) labels on Apr 29, 2020

antoniovs1029 (Member) commented on Jun 4, 2020

So I've just tested the original scenario of this issue, on F#, and now it works... so it was indeed fixed by PRs #4262 and #4306 .

fwaris (Author) commented on Jun 21, 2020

Also, confirming that it works.

See this issue comment for some tricks that help when working with AutoML outputs
dotnet/docs#19006 (comment)

Note: The fix works in a compiled F# project but not in F# interactive (fsi) because the current fsi is bound to older libraries. I expect that it will work in the new preview version of fsi but I have not tested that yet.

ghost locked as resolved and limited conversation to collaborators on Mar 21, 2022

Labels: F#, P1, lightgbm, loadsave, need info, regression

Participants: @fwaris, @codemzs, @harishsk, @artemiusgreat, @eerhardt