-
Notifications
You must be signed in to change notification settings - Fork 1.9k
API scenarios implemented with low-level functions #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Complement = false | ||
}, trainData); | ||
// Auto-normalization. | ||
var testRoles = new RoleMappedData(testFilter, label: "Label", feature: "Features"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testRoles [](start = 24, length = 9)
I think this is wrong here. We should not re-normalize test data, we should apply existing normalizer (line 56) to test data #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still incorrect. Now we normalize on the entire dataset, which means that we also use test data for training a normalizer. This is incorrect.
You should 'normalize if needed' on the train set, as before, and then 'apply all transforms' to the test set.
In reply to: 208314510 [](ancestors = 208314510)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var dataEval = new RoleMappedData(scorer, testRoles.Schema.GetColumnRoleNames(), opt: true); | ||
var dict = eval.Evaluate(dataEval); | ||
var foldMetrics = BinaryClassificationMetrics.FromMetrics(env, dict["OverallMetrics"], dict["ConfusionMatrix"]); | ||
metrics.AddRange(foldMetrics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddRange [](start = 28, length = 8)
which ones will be added? Why AddRange? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of them.
AddRange, because dict[""] is IDataView which can have more than one row (not the case for this one)
In reply to: 208314718 [](ancestors = 208314718)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I assume this particular 'foldMetrics' will contain only one item? Maybe then turn it into .Add(foldMetrics.Single()) for self-documentability?
In reply to: 208355368 [](ancestors = 208355368,208314718)
var scoreRoles = new RoleMappedData(concat, label: "Label", feature: "Features"); | ||
IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema); | ||
|
||
// Cut of term transform from pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of [](start = 23, length = 2)
of -> out #Closed
IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema); | ||
|
||
// Cut of term transform from pipeline. | ||
var new_scorer = ApplyTransformUtils.ApplyAllTransformsToData(env, scorer, loader, term); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_scorer [](start = 20, length = 10)
newScorer #Closed
// Cut of term transform from pipeline. | ||
var new_scorer = ApplyTransformUtils.ApplyAllTransformsToData(env, scorer, loader, term); | ||
var keyToValue = new KeyToValueTransform(env, new_scorer, "PredictedLabel"); | ||
var model = env.CreatePredictionEngine<IrisData, IrisPrediction>(keyToValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CreatePredictionEngine [](start = 32, length = 22)
can we make sure that it works in case there's no label? One way to do so is to create a SchemaDefinition out of typeof(IrisData) and remove the column associated with Label. This way Label column will not be created, and it should not cause a crash #Closed
var saver = new BinarySaver(env, new BinarySaver.Arguments()); | ||
using (var ch = env.Start("SaveData")) | ||
using (var file = env.CreateOutputFile("i.idv")) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ [](start = 16, length = 1)
delete #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var trainRoles = new RoleMappedData(binData, label: "Label", feature: "Features"); | ||
var trainer = new LinearClassificationTrainer(env, new LinearClassificationTrainer.Arguments | ||
{ | ||
NumThreads = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumThreads [](start = 20, length = 10)
Are you sure this is fast enough? I seem to recall in other cases we raised ConvergenceThreshold as well #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NgramLength = 1, | ||
Column = new[] { new WordBagTransform.Column() { Name = "Tokenize", Source = new[] { "SentimentText" } } } | ||
}, | ||
loader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix spacing #Closed
{ | ||
var lteChild = tree.LteChild; | ||
var gteChild = tree.GtChild; | ||
//get Nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get Nodes; [](start = 22, length = 10)
// Get nodes. #Closed
var threshold = GetValue<string>(node.KeyValues, "Threshold").Split(new[] { ' ' }, 2)[1]; | ||
var nodeIndex = i; | ||
} | ||
// get leafs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get leafs [](start = 23, length = 9)
// Get leaves. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove #Closed
dataEval = new RoleMappedData(newScorer, label: "Label", feature: "Features", opt: true); | ||
var new_evaluator = new BinaryClassifierMamlEvaluator(env, new BinaryClassifierMamlEvaluator.Arguments() { Threshold = 0.01f, UseRawScoreThreshold = false }); | ||
metricsDict = new_evaluator.Evaluate(dataEval); | ||
var new_metrics = BinaryClassificationMetrics.FromMetrics(env, metricsDict["OverallMetrics"], metricsDict["ConfusionMatrix"])[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_metrics [](start = 20, length = 11)
should not use under_scores, use camelCase instead #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
} | ||
|
||
|
||
public class IrisPrediction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra line :) #Closed
foreach (var input in testData.Take(20)) | ||
{ | ||
var prediction = model.Predict(input); | ||
Assert.True(prediction.PredictedLabel == "Iris-setosa"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iris-setosa [](start = 62, length = 11)
What, all of them? :) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data file sorted by label, so first 50 is iris-setosa, then next 50 is iris-versicolor, and then last 50 is iris-virginica
In reply to: 208363615 [](ancestors = 208363615)
some other PR comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
var scoreRoles = new RoleMappedData(trans, label: "Label", feature: "Features"); | ||
IDataScorerTransform scorer = ScoreUtils.GetScorer(predictor, scoreRoles, env, trainRoles.Schema); | ||
|
||
var dataEval = new RoleMappedData(scorer, label: "Label", feature: "Features", opt: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var dataEval = new RoleMappedData(scorer, label: "Label", feature: "Features", opt: true); [](start = 16, length = 90)
It looks a bit more professional to evaluate on test data, and not on train :) #Closed
foreach (var input in testData.Take(20)) | ||
{ | ||
var prediction = model.Predict(input); | ||
Assert.True(prediction.PredictedLabel == "Iris-setosa"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PredictedLabel [](start = 43, length = 14)
and here you can also verify that prediction.PredictedLabel == input.Label
#Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look on him, first he want test data without label, and then compare it label with predicted one!
In reply to: 208767421 [](ancestors = 208767421)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only wanted test data without label in ONE test, not in ALL tests. The ONE test that specifically checks that label is not required :)
In reply to: 208767752 [](ancestors = 208767752,208767421)
using (var file = env.CreateOutputFile("model.zip")) | ||
TrainUtils.SaveModel(env, env.Start("saveChannel"), file, predictor, trainRoles); | ||
new SaveOnnxCommand(env, new SaveOnnxCommand.Arguments { InputModelFile = "model.zip", | ||
Json = "model.json", Onnx = "model.onnx" , Domain="dunno" }).Run(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run [](start = 81, length = 3)
This is cheating, can't run commands here.
Let's take this example out of the PR with the goal to merge the others #Closed
@@ -15,6 +15,11 @@ | |||
|
|||
namespace Microsoft.ML.Runtime.FastTree | |||
{ | |||
[TlcModule.ComponentKind("FastTreeBinaryTrainer")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComponentKind [](start = 15, length = 13)
Are you sure you need this as a component?.. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var lockEngine = new LockBasedPredictionEngine(env, model); | ||
Parallel.ForEach(testData, (input) => lockEngine.Predict(input)); | ||
int numThreads = 2; | ||
PredictionEngine<SentimentData, SentimentPrediction>[] models = new PredictionEngine<SentimentData, SentimentPrediction>[numThreads]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
useless #Resolved
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces #Resolved
|
||
|
||
/// <summary> | ||
/// This is a trivial implementation of a thread-safe prediction engine, where the underlying <see cref="SimplePredictionEngine"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<see cref="SimplePredictionEngine"/ [](start = 102, length = 35)
remove #Resolved
using (var ch = env.Start("saving")) | ||
TrainUtils.SaveModel(env, ch, file, predictor, scoreRoles); | ||
var threadEngine = new ThreadLocalBasedPredictionEngine(env, file); | ||
var poolEngine = new PoolBasedPredictionEngine(env, file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
poolEngine [](start = 24, length = 10)
let's get rid of this guy, and only keep the thread-local
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, they provide examples of how it can be handled in a different ways.
In reply to: 209756860 [](ancestors = 209756860)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't want any additional classes, just use the existing PredictionEngine. Which means that you just do Parallel.ForEach(data, x=> lock(lockObject) { engine.Predict(x); } );
In reply to: 209770093 [](ancestors = 209770093,209768067,209756860)
/// *) The tree ensemble learners, I should be able to inspect the trees. | ||
/// *) The LDA transform, I should be able to inspect the topics. | ||
/// I view it as essential from a usability perspective that this be discoverable to someone without | ||
/// having to read documentation.E.g.: if I have var lda = new LdaTransform().Fit(data)(I don't insist on that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fit [](start = 87, length = 3)
Do we already have Fit() implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is old Api examples, Fit is part of new Api
var previousLeafValue = GetValue<double>(node.KeyValues, "PreviousLeafValue"); | ||
var threshold = GetValue<string>(node.KeyValues, "Threshold").Split(new[] { ' ' }, 2)[1]; | ||
var nodeIndex = i; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing categorical split nodes. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this code from Tree Visualizer, which I expect contains all required code. Also I'm not sure how to get them, I think I call all exposed methods in this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tree visualizer contains code for categorical splits. Let me know if you need help.
In reply to: 209790640 [](ancestors = 209790640)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any word "categ" in that project, are you sure it's not part of GetNode or GetLeaf logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastTree.cs, line 3192.
public INode GetNode(int nodeId, bool isLeaf, IEnumerable<string> featuresNames = null)
{
var keyValues = new Dictionary<string, object>();
if (isLeaf)
{
keyValues.Add(NodeKeys.LeafValue, _regTree.LeafValue(nodeId));
}
else
{
if (featuresNames != null)
{
if (featuresNames is FeatureNameCollection features)
{
if (_regTree.CategoricalSplit[nodeId])
{
string featureList = string.Join(" OR \n",
_regTree.CategoricalSplitFeatures[nodeId].Select(feature => features[feature]));
keyValues.Add(NodeKeys.SplitName, featureList);
}
else
keyValues.Add(NodeKeys.SplitName, features[_regTree.SplitFeature(nodeId)]);
}
}
keyValues.Add(NodeKeys.Threshold, string.Format("<= {0}", _regTree.RawThreshold(nodeId)));
if (_regTree.SplitGains != null)
keyValues.Add(NodeKeys.SplitGain, _regTree.SplitGains[nodeId]);
if (_regTree.GainPValues != null)
keyValues.Add(NodeKeys.GainValue, _regTree.GainPValues[nodeId]);
if (_regTree.PreviousLeafValues != null)
keyValues.Add(NodeKeys.PreviousLeafValue, _regTree.PreviousLeafValues[nodeId]);
}
return new TreeNode(keyValues);
}
In reply to: 209791675 [](ancestors = 209791675)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and I'm getting SplitName from node (node, not leaf) in my code, which is represent categorical split features or just this exact feature.
Are you want me to do something with this information? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a comprehensive guide to FastTree internal structure. I think the code looked OK as it is: it just demonstrates that it is possible to inspect the model after training.
In reply to: 209795941 [](ancestors = 209795941)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using (var cursor = trans.GetRowCursor(x => true)) | ||
{ | ||
Assert.True(cursor.Schema.TryGetColumnIndex("SentimentText", out int textColumn)); | ||
Assert.True(cursor.Schema.TryGetColumnIndex("Features_TransformedText", out int transformedTextColumn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Features_TransformedText [](start = 65, length = 24)
I'm just trying to speak from a user who is new to ML.NET and I think they will have questions about this column name, how is one supposed to know such intermediate columns generated by the text transform? what are the other intermediate columns generated? what if I already had a column with this name before text transform was applied, will this hide my column or a suffix will be added. What if I had one column and I transformed it 'n' times and kept the dst column name the same then how will you look through all the intermediate transformation?
Please document all this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See code with comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, you are showing how to get column names, slight improvement in the example but lets focus on what you are advertising in the summary, you are saying "possibly through the debugger, be not such a pain to actually see what is happening to your data when you apply this or that transform" is this really possible with your example for cases where I have a column with name "FOO" and I apply two or more transforms to it that don't produce intermediate columns(and hide the src column) then how will I see what the first transform did? Either you show how this is done or change the documentation to reflect limitations with this example.
In reply to: 210415330 [](ancestors = 210415330)
{ | ||
var columnName = trans.Schema.GetColumnName(i); | ||
var columnType = trans.Schema.GetColumnType(i).RawType; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little weird to just get column names and not to do anything with them, right? may be reuse them somehow at line 39-41? Make it look as if you are looking for a column name, then store it and document why you expect this column to be there in the first place, like Features_TransformedText is produced as intermediate column by the text transform. May be talk about suffix being added(or column hidding) if such a column name existed or even better add to that example.
Such examples and documentation will be very useful for people who are getting started with ML.NET.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
examples for #584