IRISClassification sample - MultiLabel classification: Getting exception while referring slotnames #2810

Closed
prathyusha12345 opened this issue Mar 1, 2019 · 12 comments · Fixed by #2804
@prathyusha12345

prathyusha12345 commented Mar 1, 2019

@Ivanidzo4ka
I am trying to do multilabel classification on IRISClassification. I am referring to this link https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/Estimators/PredictAndMetadata.cs#L41

While running the code, I get the exception 'Invalid call to 'GetGetter'' when accessing slotnames.

image

@Ivanidzo4ka Ivanidzo4ka added the bug Something isn't working label Mar 1, 2019
@Ivanidzo4ka
Contributor

Thank you for reporting this.
I'm working on this issue right now.
The problem is that we have an internal convention to treat all SlotNames as text (they are called Names for a reason), but deep inside, if the origin of the key type is something other than string, we don't do proper casting.
@TomFinley Am I right that at this point the only way to get the original label values is to access the KeyValue annotations on PredictedLabel?

@Ivanidzo4ka Ivanidzo4ka self-assigned this Mar 1, 2019
@Ivanidzo4ka
Contributor

To unblock yourself, you can change the definition of slotNames from
VBuffer<ReadOnlyMemory<char>> to
VBuffer<float>, and I would assume it would give you the original keys. But I will change that functionality in the next release.

@prathyusha12345
Author

@Ivanidzo4ka after using VBuffer<float>, I am getting a compile-time error, as below.

image

@Ivanidzo4ka
Contributor

Oh, right, this is one of our assumptions: that slotnames should be strings.
This one should do the trick:
predEngine.OutputSchema[""].Annotations.GetValue(AnnotationUtils.Kinds.SlotNames, ref slotNames);

@prathyusha12345
Author

@Ivanidzo4ka it shows that AnnotationUtils is inaccessible due to its protection level.

@Ivanidzo4ka
Contributor

We did a good job of hiding our internals.

VBuffer<float> keys = default;
engine.OutputSchema[nameof(IrisPrediction.PredictedLabel)].GetKeyValues(ref keys);

this one?

@prathyusha12345
Author

As discussed, I am now getting a new exception.
image

IRISPrediction class as below.

public class IrisPrediction
{
    [ColumnName("label")]
    public float Label;

    public float[] Score;
}

And I am doing the MapKeyToValue transformation in the training pipeline as below

var trainer = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(labelColumnName: DefaultColumnNames.Label, featureColumnName: DefaultColumnNames.Features);
var trainingPipeline = dataProcessPipeline.Append(trainer)
.Append(mlContext.Transforms.Conversion.MapKeyToValue("label", "PredictedLabel"));

@Ivanidzo4ka
Contributor

Ivanidzo4ka commented Mar 1, 2019

public class Iris
{
    [LoadColumn(0)]
    public float Label;

    [LoadColumn(1)]
    public float SepalLength;

    [LoadColumn(2)]
    public float SepalWidth;

    [LoadColumn(3)]
    public float PetalLength;

    [LoadColumn(4)]
    public float PetalWidth;
}

public class IrisPredictions
{
    [ColumnName("label")]
    public float Label;

    public float[] Score;
}

void Prediction()
{
    var dataPath = GetDataPath(TestDatasets.irisLoader.trainFilename);
    var ml = new MLContext();

    var data = ml.Data.LoadFromTextFile<Iris>(dataPath);

    var pipeline = ml.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
        .Append(ml.Transforms.Conversion.MapValueToKey(nameof(Iris.Label)))
        .Append(ml.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(
            new SdcaMultiClassTrainer.Options { MaxIterations = 100, Shuffle = true, NumThreads = 1, })
            .Append(ml.Transforms.Conversion.MapKeyToValue("label", "PredictedLabel")));

    var model = pipeline.Fit(data);
    var engine = model.CreatePredictionEngine<Iris, IrisPredictions>(ml);

    var testLoader = ml.Data.LoadFromTextFile<Iris>(dataPath);
    var testData = ml.Data.CreateEnumerable<Iris>(testLoader, false);

    // During prediction we will get Score column with 3 float values.

    // Let's look how we can convert key value for PredictedLabel to original labels.
    // We need to read KeyValues for "PredictedLabel" column.
    VBuffer<float> keys = default;
    engine.OutputSchema["PredictedLabel"].GetKeyValues(ref keys);

    var scoreValues = keys.DenseValues().ToArray();

    foreach (var input in testData.Take(20))
    {
        var prediction = engine.Predict(input);
        // Key values and scores share the same index, so print them pairwise.
        for (int i = 0; i < scoreValues.Length; i++)
            Console.WriteLine($"{scoreValues[i]}: {prediction.Score[i]}");
    }
}

@prathyusha12345
Author

@Ivanidzo4ka This code is working fine. But how do we map each score to a label? For example, the GitHub labeler sample has 'area' as the label, so the scoreValues array has more than 3 entries. How do we map all of them? Do we need to write that code manually, as I have done in the GITHUBLabeler sample here, or is there predefined code for this in ML.NET?

Because obviously the purpose of this classification is to find the scores and map them to labels accordingly.

@Ivanidzo4ka
Contributor

Console.WriteLine($"Predicted label: {scoreValues[i]}: {prediction.Score[i]}");
I probably should have named scoreValues originalLabels, or something like that.
You have two arrays, one with score values and one with original labels; they have the same number of elements and can be mapped to each other by index.
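
A minimal sketch of that index-based pairing, written in plain C# without any ML.NET calls. The class name LabelScoreMapper and the hard-coded arrays are hypothetical; in the sample above they would be replaced by keys.DenseValues().ToArray() and prediction.Score.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class LabelScoreMapper
{
    // Pairs each original label with the score at the same index.
    public static Dictionary<float, float> ToDictionary(float[] originalLabels, float[] scores)
    {
        if (originalLabels.Length != scores.Length)
            throw new ArgumentException("Labels and scores must have the same length.");

        return originalLabels
            .Zip(scores, (label, score) => new KeyValuePair<float, float>(label, score))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        // Assumed stand-ins for the key values and the Score column of one prediction.
        float[] originalLabels = { 0f, 1f, 2f };
        float[] scores = { 0.1f, 0.7f, 0.2f };

        // Print the pairs with the highest score first.
        foreach (var pair in ToDictionary(originalLabels, scores).OrderByDescending(p => p.Value))
            Console.WriteLine($"Label {pair.Key}: {pair.Value}");
    }
}
```

This assumes the labels are distinct, which holds for key values; with duplicate labels ToDictionary would throw.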

@Ivanidzo4ka
Contributor

Yes, if you can access slotnames. They are broken right now if you have a non-string label.

@prathyusha12345
Copy link
Author

@Ivanidzo4ka I got your point that once we get the scores and labels we need to zip them to map each label to its score. But from a machine learning beginner's/user's perspective, it's difficult to understand terms like slotnames and keys, and to zip labels and scores manually. It's confusing for learners/users.

A better way would be to get a dictionary of labels and scores, which the user can sort if needed.

@shauheen shauheen added this to the 0319 milestone Mar 12, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022