Skip to content

API scenarios implemented with low-level functions #653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 17, 2018

Conversation

Ivanidzo4ka
Copy link
Contributor

examples for #584

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@Ivanidzo4ka Ivanidzo4ka requested a review from Zruty0 August 6, 2018 22:06
@Ivanidzo4ka Ivanidzo4ka changed the title add 10 examples of api scenarios add examples of API scenarios Aug 6, 2018

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@Zruty0
Copy link
Contributor

Zruty0 commented Aug 7, 2018

using Microsoft.ML.Runtime.Api;

headers! #Closed


Refers to: test/Microsoft.ML.Tests/Scenarios/Api/ApiScenariosTests.cs:1 in e3db0e7. [](commit_id = e3db0e7, deletion_comment = False)

Complement = false
}, trainData);
// Auto-normalization.
var testRoles = new RoleMappedData(testFilter, label: "Label", feature: "Features");
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testRoles [](start = 24, length = 9)

I think this is wrong here. We should not re-normalize test data, we should apply existing normalizer (line 56) to test data #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still incorrect. Now we normalize on the entire dataset, which means that we also use test data for training a normalizer. This is incorrect.
You should 'normalize if needed' on the train set, as before, and then 'apply all transforms' to the test set.


In reply to: 208314510 [](ancestors = 208314510)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this one is resolved, I'm fine


In reply to: 208616551 [](ancestors = 208616551,208314510)

var dataEval = new RoleMappedData(scorer, testRoles.Schema.GetColumnRoleNames(), opt: true);
var dict = eval.Evaluate(dataEval);
var foldMetrics = BinaryClassificationMetrics.FromMetrics(env, dict["OverallMetrics"], dict["ConfusionMatrix"]);
metrics.AddRange(foldMetrics);
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddRange [](start = 28, length = 8)

which ones will be added? Why AddRange? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of them.
AddRange, because dict[""] is IDataView which can have more than one row (not the case for this one)


In reply to: 208314718 [](ancestors = 208314718)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I assume this particular 'foldMetrics' will contain only one item? Maybe then turn it into .Add(foldMetrics.Single()) for self-documentability?


In reply to: 208355368 [](ancestors = 208355368,208314718)

var scoreRoles = new RoleMappedData(concat, label: "Label", feature: "Features");
IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema);

// Cut of term transform from pipeline.
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of [](start = 23, length = 2)

of -> out #Closed

IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema);

// Cut of term transform from pipeline.
var new_scorer = ApplyTransformUtils.ApplyAllTransformsToData(env, scorer, loader, term);
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_scorer [](start = 20, length = 10)

newScorer #Closed

// Cut of term transform from pipeline.
var new_scorer = ApplyTransformUtils.ApplyAllTransformsToData(env, scorer, loader, term);
var keyToValue = new KeyToValueTransform(env, new_scorer, "PredictedLabel");
var model = env.CreatePredictionEngine<IrisData, IrisPrediction>(keyToValue);
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreatePredictionEngine [](start = 32, length = 22)

can we make sure that it works in case there's no label? One way to do so is to create a SchemaDefinition out of typeof(IrisData) and remove the column associated with Label. This way Label column will not be created, and it should not cause a crash #Closed

var saver = new BinarySaver(env, new BinarySaver.Arguments());
using (var ch = env.Start("SaveData"))
using (var file = env.CreateOutputFile("i.idv"))
{
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 16, length = 1)

delete #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete what?


In reply to: 208315619 [](ancestors = 208315619)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing.


In reply to: 208356915 [](ancestors = 208356915,208315619)

var trainRoles = new RoleMappedData(binData, label: "Label", feature: "Features");
var trainer = new LinearClassificationTrainer(env, new LinearClassificationTrainer.Arguments
{
NumThreads = 1
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumThreads [](start = 20, length = 10)

Are you sure this is fast enough? I seem to recall in other cases we raised ConvergenceThreshold as well #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 sec, is it fast enough?


In reply to: 208315814 [](ancestors = 208315814)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


In reply to: 208356966 [](ancestors = 208356966,208315814)

NgramLength = 1,
Column = new[] { new WordBagTransform.Column() { Name = "Tokenize", Source = new[] { "SentimentText" } } }
},
loader
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spacing #Closed

{
var lteChild = tree.LteChild;
var gteChild = tree.GtChild;
//get Nodes;
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get Nodes; [](start = 22, length = 10)

// Get nodes. #Closed

var threshold = GetValue<string>(node.KeyValues, "Threshold").Split(new[] { ' ' }, 2)[1];
var nodeIndex = i;
}
// get leafs
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get leafs [](start = 23, length = 9)

// Get leaves. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not 'leafs', but 'leaves'


In reply to: 208316461 [](ancestors = 208316461)




}
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove #Closed

dataEval = new RoleMappedData(newScorer, label: "Label", feature: "Features", opt: true);
var new_evaluator = new BinaryClassifierMamlEvaluator(env, new BinaryClassifierMamlEvaluator.Arguments() { Threshold = 0.01f, UseRawScoreThreshold = false });
metricsDict = new_evaluator.Evaluate(dataEval);
var new_metrics = BinaryClassificationMetrics.FromMetrics(env, metricsDict["OverallMetrics"], metricsDict["ConfusionMatrix"])[0];
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_metrics [](start = 20, length = 11)

should not use under_scores, use camelCase instead #Closed

Zruty0
Zruty0 previously requested changes Aug 7, 2018
Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
}


public class IrisPrediction
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line :) #Closed

foreach (var input in testData.Take(20))
{
var prediction = model.Predict(input);
Assert.True(prediction.PredictedLabel == "Iris-setosa");
Copy link
Contributor

@Zruty0 Zruty0 Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iris-setosa [](start = 62, length = 11)

What, all of them? :) #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data file sorted by label, so first 50 is iris-setosa, then next 50 is iris-versicolor, and then last 50 is iris-virginica


In reply to: 208363615 [](ancestors = 208363615)

@Zruty0 Zruty0 changed the title add examples of API scenarios API scenarios implemented with low-level functions Aug 7, 2018
Ivan Matantsev added 2 commits August 7, 2018 16:09

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
some other PR comments

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@Zruty0 Zruty0 dismissed their stale review August 8, 2018 15:02

revoking review

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

var scoreRoles = new RoleMappedData(trans, label: "Label", feature: "Features");
IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema);

var dataEval = new RoleMappedData(scorer, label: "Label", feature: "Features", opt: true);
Copy link
Contributor

@Zruty0 Zruty0 Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var dataEval = new RoleMappedData(scorer, label: "Label", feature: "Features", opt: true); [](start = 16, length = 90)

It looks a bit more professional to evaluate on test data, and not on train :) #Closed

Ivan Matantsev added 3 commits August 8, 2018 14:30

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
foreach (var input in testData.Take(20))
{
var prediction = model.Predict(input);
Assert.True(prediction.PredictedLabel == "Iris-setosa");
Copy link
Contributor

@Zruty0 Zruty0 Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PredictedLabel [](start = 43, length = 14)

and here you can also verify that prediction.PredictedLabel == input.Label #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look on him, first he want test data without label, and then compare it label with predicted one!


In reply to: 208767421 [](ancestors = 208767421)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only wanted test data without label in ONE test, not in ALL tests. The ONE test that specifically checks that label is not required :)


In reply to: 208767752 [](ancestors = 208767752,208767421)

using (var file = env.CreateOutputFile("model.zip"))
TrainUtils.SaveModel(env, env.Start("saveChannel"), file, predictor, trainRoles);
new SaveOnnxCommand(env, new SaveOnnxCommand.Arguments { InputModelFile = "model.zip",
Json = "model.json", Onnx = "model.onnx" , Domain="dunno" }).Run();
Copy link
Contributor

@Zruty0 Zruty0 Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run [](start = 81, length = 3)

This is cheating, can't run commands here.
Let's take this example out of the PR with the goal to merge the others #Closed

@@ -15,6 +15,11 @@

namespace Microsoft.ML.Runtime.FastTree
{
[TlcModule.ComponentKind("FastTreeBinaryTrainer")]
Copy link
Contributor

@Zruty0 Zruty0 Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComponentKind [](start = 15, length = 13)

Are you sure you need this as a component?.. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should it be instead?


In reply to: 208784667 [](ancestors = 208784667)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka Ivanidzo4ka requested a review from yaeldekel August 10, 2018 20:25
var lockEngine = new LockBasedPredictionEngine(env, model);
Parallel.ForEach(testData, (input) => lockEngine.Predict(input));
int numThreads = 2;
PredictionEngine<SentimentData, SentimentPrediction>[] models = new PredictionEngine<SentimentData, SentimentPrediction>[numThreads];
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless #Resolved

}
}


Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces #Resolved



/// <summary>
/// This is a trivial implementation of a thread-safe prediction engine, where the underlying <see cref="SimplePredictionEngine"/>
Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<see cref="SimplePredictionEngine"/ [](start = 102, length = 35)

remove #Resolved

using (var ch = env.Start("saving"))
TrainUtils.SaveModel(env, ch, file, predictor, scoreRoles);
var threadEngine = new ThreadLocalBasedPredictionEngine(env, file);
var poolEngine = new PoolBasedPredictionEngine(env, file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poolEngine [](start = 24, length = 10)

let's get rid of this guy, and only keep the thread-local

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, they provide examples of how it can be handled in a different ways.


In reply to: 209756860 [](ancestors = 209756860)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to not expose MadeObjectPool


In reply to: 209768067 [](ancestors = 209768067,209756860)

Copy link
Contributor

@Zruty0 Zruty0 Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't want any additional classes, just use the existing PredictionEngine. Which means that you just do Parallel.ForEach(data, x=> lock(lockObject) { engine.Predict(x); } );


In reply to: 209770093 [](ancestors = 209770093,209768067,209756860)

/// *) The tree ensemble learners, I should be able to inspect the trees.
/// *) The LDA transform, I should be able to inspect the topics.
/// I view it as essential from a usability perspective that this be discoverable to someone without
/// having to read documentation.E.g.: if I have var lda = new LdaTransform().Fit(data)(I don't insist on that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fit [](start = 87, length = 3)

Do we already have Fit() implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is old Api examples, Fit is part of new Api

var previousLeafValue = GetValue<double>(node.KeyValues, "PreviousLeafValue");
var threshold = GetValue<string>(node.KeyValues, "Threshold").Split(new[] { ' ' }, 2)[1];
var nodeIndex = i;
}
Copy link
Member

@codemzs codemzs Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing categorical split nodes. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this code from Tree Visualizer, which I expect contains all required code. Also I'm not sure how to get them, I think I call all exposed methods in this example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tree visualizer contains code for categorical splits. Let me know if you need help.


In reply to: 209790640 [](ancestors = 209790640)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any word "categ" in that project, are you sure it's not part of GetNode or GetLeaf logic?

Copy link
Member

@codemzs codemzs Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastTree.cs, line 3192.

    public INode GetNode(int nodeId, bool isLeaf, IEnumerable<string> featuresNames = null)
        {
            var keyValues = new Dictionary<string, object>();
            if (isLeaf)
            {
                keyValues.Add(NodeKeys.LeafValue, _regTree.LeafValue(nodeId));
            }
            else
            {
                if (featuresNames != null)
                {
                    if (featuresNames is FeatureNameCollection features)
                    {
                        if (_regTree.CategoricalSplit[nodeId])
                        {
                            string featureList = string.Join(" OR \n",
                                _regTree.CategoricalSplitFeatures[nodeId].Select(feature => features[feature]));

                            keyValues.Add(NodeKeys.SplitName, featureList);
                        }
                        else
                            keyValues.Add(NodeKeys.SplitName, features[_regTree.SplitFeature(nodeId)]);
                    }
                }
                keyValues.Add(NodeKeys.Threshold, string.Format("<= {0}", _regTree.RawThreshold(nodeId)));
                if (_regTree.SplitGains != null)
                    keyValues.Add(NodeKeys.SplitGain, _regTree.SplitGains[nodeId]);
                if (_regTree.GainPValues != null)
                    keyValues.Add(NodeKeys.GainValue, _regTree.GainPValues[nodeId]);
                if (_regTree.PreviousLeafValues != null)
                    keyValues.Add(NodeKeys.PreviousLeafValue, _regTree.PreviousLeafValues[nodeId]);
            }


            return new TreeNode(keyValues);
        }

In reply to: 209791675 [](ancestors = 209791675)

Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and I'm getting SplitName from node (node, not leaf) in my code, which is represent categorical split features or just this exact feature.
Are you want me to do something with this information? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a comprehensive guide to FastTree internal structure. I think the code looked OK as it is: it just demonstrates that it is possible to inspect the model after training.


In reply to: 209795941 [](ancestors = 209795941)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.


In reply to: 209990650 [](ancestors = 209990650,209795941)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

using (var cursor = trans.GetRowCursor(x => true))
{
Assert.True(cursor.Schema.TryGetColumnIndex("SentimentText", out int textColumn));
Assert.True(cursor.Schema.TryGetColumnIndex("Features_TransformedText", out int transformedTextColumn));
Copy link
Member

@codemzs codemzs Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features_TransformedText [](start = 65, length = 24)

I'm just trying to speak from a user who is new to ML.NET and I think they will have questions about this column name, how is one supposed to know such intermediate columns generated by the text transform? what are the other intermediate columns generated? what if I already had a column with this name before text transform was applied, will this hide my column or a suffix will be added. What if I had one column and I transformed it 'n' times and kept the dst column name the same then how will you look through all the intermediate transformation?

Please document all this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See code with comment above

Copy link
Member

@codemzs codemzs Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, you are showing how to get column names, slight improvement in the example but lets focus on what you are advertising in the summary, you are saying "possibly through the debugger, be not such a pain to actually see what is happening to your data when you apply this or that transform" is this really possible with your example for cases where I have a column with name "FOO" and I apply two or more transforms to it that don't produce intermediate columns(and hide the src column) then how will I see what the first transform did? Either you show how this is done or change the documentation to reflect limitations with this example.


In reply to: 210415330 [](ancestors = 210415330)

{
var columnName = trans.Schema.GetColumnName(i);
var columnType = trans.Schema.GetColumnType(i).RawType;
}
Copy link
Member

@codemzs codemzs Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little weird to just get column names and not to do anything with them, right? may be reuse them somehow at line 39-41? Make it look as if you are looking for a column name, then store it and document why you expect this column to be there in the first place, like Features_TransformedText is produced as intermediate column by the text transform. May be talk about suffix being added(or column hidding) if such a column name existed or even better add to that example.

Such examples and documentation will be very useful for people who are getting started with ML.NET.

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Zruty0 Zruty0 merged commit 759dafb into dotnet:master Aug 17, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants