Warning messages when using types not supporting missing values as labels #1059

Closed
artidoro opened this issue Sep 26, 2018 · 1 comment

artidoro commented Sep 26, 2018

System information

  • OS version/distro: Windows 10
  • .NET Version:
.NET Core SDK (reflecting any global.json):
 Version:   2.1.402
 Commit:    3599f217f4

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.17134
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\2.1.402\

Host (useful for support):
  Version: 2.1.4
  Commit:  85255dde3e

Issue

  • What did you do?
    I updated this tutorial to the new API (see code below), and ran it.

  • What happened?
    I got the following output and warning:

Auto-tuning parameters: UseCat = False
Auto-tuning parameters: LearningRate = 0.2
Auto-tuning parameters: NumLeaves = 20
Auto-tuning parameters: MinDataPerLeaf = 5
Auto-tuning parameters: UseSoftmax = False
LightGBM objective=multiclassova
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Predicted flower type is: 2
  • What did you expect?

I did not expect any warning for using integers as labels. It is a warning from KeyToValueTransform. The same warning appears when using string labels instead of integer labels.

I think the warning exists because KeyToValue can be lossy: integers have no missing value, so a missing key is mapped to the default value of int, which is 0. As a result, we warn the user every time integer labels are used. If 0 is itself a label, which is a very reasonable choice with int labels, missing labels get mapped to an existing label.

We don't want this warning displayed every time, since integer labels are reasonable to have. A possible solution is to warn only when missing integer labels are actually mapped to an existing label, rather than whenever integer labels are used.
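
To make the ambiguity concrete, here is a minimal standalone sketch. It is not the ML.NET implementation: KeyToValueSketch and both of its methods are hypothetical names, and it assumes keys are 1-based indices into the label values, with key 0 standing for "missing".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the key-to-value step; not ML.NET code.
public static class KeyToValueSketch
{
    // Assumption: keys are 1-based indices into `values`, and key 0 means "missing".
    // A missing (or out-of-range) key falls back to default(int), which is 0.
    public static int MapKeyToValue(uint key, IReadOnlyList<int> values)
    {
        if (key == 0 || key > values.Count)
            return default(int);
        return values[(int)key - 1];
    }

    // True when the fallback value is also a real label, so a mapped 0
    // could mean either "label 0" or "missing". This is the only case
    // in which a warning would carry useful information.
    public static bool DefaultCollidesWithLabel(IReadOnlyList<int> values)
    {
        foreach (int v in values)
        {
            if (v == default(int))
                return true;
        }
        return false;
    }
}
```

With labels { 0, 1, 2 }, MapKeyToValue(0, labels) and MapKeyToValue(1, labels) both return 0, so the two cases cannot be told apart and DefaultCollidesWithLabel returns true. With labels { 1, 2, 3 } there is no collision, and under the proposal above no warning would be needed.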


Code that I ran:

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.LightGBM;
using System;

namespace myApp
{
    class Program
    {
        // STEP 1: Define your data structures

        // IrisData is used to provide training data, and as 
        // input for prediction operations
        // - First 4 properties are inputs/features used to predict the label
        // - Label is what you are predicting, and is only set when training
        public class IrisData
        {
            [Column("0")]
            public float SepalLength;

            [Column("1")]
            public float SepalWidth;

            [Column("2")]
            public float PetalLength;

            [Column("3")]
            public float PetalWidth;

            [Column("4")]
            [ColumnName("Label")]
            public int Label;
        }

        // IrisPrediction is the result returned from prediction operations
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public int PredictedLabels;
        }

        static TextLoader.Arguments GetIrisLoaderArgs()
        {
            return new TextLoader.Arguments()
            {
                Separator = "comma",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("SepalLength", DataKind.R4, 0),
                    new TextLoader.Column("SepalWidth", DataKind.R4, 1),
                    new TextLoader.Column("PetalLength", DataKind.R4, 2),
                    new TextLoader.Column("PetalWidth", DataKind.R4, 3),
                    new TextLoader.Column("Label", DataKind.I4, 4)
                }
            };
        }

        static void Main(string[] args)
        {
            // STEP 2: Create a pipeline and load your data
            //var pipeline = new LearningPipeline();
            var env = new ConsoleEnvironment();

            // If working in Visual Studio, make sure the 'Copy to Output Directory' 
            // property of iris-data.txt is set to 'Copy always'
            string dataPath = "iris-data.txt";
            var data = new TextLoader(env, GetIrisLoaderArgs()).Read(new MultiFileSource(dataPath));

            // STEP 3: Transform your data
            // Assign numeric values to text in the "Label" column, because only
            // numbers can be processed during model training
            var pipeline = new TermEstimator(env, "Label")
                // Puts all features into a vector
                .Append(new ConcatEstimator(env, "Features", new string[] { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth" }))
                // STEP 4: Add learner
                // Add a learning algorithm to the pipeline. 
                // This is a classification scenario (What type of iris is this?)
                .Append(new LightGbmMulticlassTrainer(env, "Label", "Features"))
                // Convert the Label back into original text (after converting to number in step 3)
                .Append(new KeyToValueEstimator(env, "PredictedLabel"));

            // STEP 5: Train your model based on the data set
            var model = pipeline.Fit(data);
            var engine = model.MakePredictionFunction<IrisData, IrisPrediction>(env);

            // STEP 6: Use your model to make a prediction
            // You can change these numbers to test different predictions
            var prediction = engine.Predict(new IrisData()
            {
                SepalLength = 3.3f,
                SepalWidth = 1.6f,
                PetalLength = 0.2f,
                PetalWidth = 5.1f,
            });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
            Console.ReadLine();
        }
    }
}
@artidoro changed the title from "Warninig messages when using integer labels" to "Warninig messages when using types not supporting missing values as labels" on Sep 26, 2018
@shauheen assigned artidoro and codemzs and unassigned artidoro on Sep 26, 2018

TomFinley commented Sep 27, 2018

Hi @artidoro, thanks for writing this up. Since #863, very few of the value types we are likely to apply ToKey to (and, in reverse, ToValue) support missing values. I suppose we had forgotten about this little warning: previously every type of interest had a missing value, but now almost none of them do.

I propose the following remediation, in order of my confidence in its correctness.

  1. At the very least, let's get rid of the warning -- it is clearly pointless and counterproductive in the new world, since it applies to most keys of interest. (Keys over floating point types I consider less interesting than those over string and integer.) I might even say that nuking those few lines really quickly is something we should consider cherry-picking into 0.6. @shauheen can comment on the wisdom or not of that.

  2. Let's consider being able to specify a replacement value for any missing keys. This is unimportant in the case of string key-values (the ToKey does not map empty strings anyway, so there is no potential for ambiguity or confusion), but is more serious in the case of int key-values (where 0 is a quite reasonable input value).

  3. If we encounter a missing key when mapping to a type that does not have a missing value, and that type's default value does appear among the output key-values, we might consider throwing. We have previously been against throwing in cursors, but I believe the idea that we can throw in cursors has "won out."
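
As a rough illustration of items 2 and 3 only, here is a standalone sketch. MissingAwareKeyMapper and its replacement parameter are hypothetical and do not exist in ML.NET; the sketch again assumes 1-based keys, with key 0 standing for a missing key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of items 2 and 3; none of these names exist in ML.NET.
public sealed class MissingAwareKeyMapper
{
    private readonly IReadOnlyList<int> _values;
    private readonly int? _replacement;
    private readonly bool _defaultIsLabel;

    // `replacement` (item 2) is an optional caller-chosen value for missing keys.
    public MissingAwareKeyMapper(IReadOnlyList<int> values, int? replacement = null)
    {
        _values = values ?? throw new ArgumentNullException(nameof(values));
        _replacement = replacement;
        foreach (int v in values)
        {
            if (v == default(int))
            {
                _defaultIsLabel = true;
                break;
            }
        }
    }

    // Assumption: keys are 1-based; anything outside 1..Count is treated as missing.
    public int Map(uint key)
    {
        if (key >= 1 && key <= _values.Count)
            return _values[(int)key - 1];

        // Item 2: use the caller-specified replacement when one was given.
        if (_replacement.HasValue)
            return _replacement.Value;

        // Item 3: throw rather than silently alias a missing key to a real label.
        if (_defaultIsLabel)
            throw new InvalidOperationException(
                "Missing key would map to default(int), which is also a label value.");

        // The default value is not a label, so falling back to it is unambiguous.
        return default(int);
    }
}
```

With labels { 0, 1, 2 } and replacement: -1, Map(0) returns -1. With the same labels and no replacement, Map(0) throws. With labels { 1, 2, 3 } and no replacement, Map(0) falls back to 0 without ambiguity.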

@justinormont changed the title from "Warninig messages when using types not supporting missing values as labels" to "Warning messages when using types not supporting missing values as labels" on Sep 28, 2018
@shauheen closed this as completed on Oct 5, 2018
@ghost locked as resolved and limited conversation to collaborators on Mar 28, 2022