Prior trainer should accept label column type of boolean ONLY. #3291

Merged
merged 1 commit into from
Apr 11, 2019
@@ -230,7 +230,7 @@ private PriorModelParameters Train(TrainContext context)
 data.CheckBinaryLabel();
 _host.CheckParam(data.Schema.Label.HasValue, nameof(data), "Missing Label column");
 var labelCol = data.Schema.Label.Value;
-_host.CheckParam(labelCol.Type == NumberDataViewType.Single, nameof(data), "Invalid type for Label column");
+_host.CheckParam(labelCol.Type == BooleanDataViewType.Instance, nameof(data), "Invalid type for Label column");

 double pos = 0;
 double neg = 0;
@@ -243,9 +243,9 @@ private PriorModelParameters Train(TrainContext context)

 using (var cursor = data.Data.GetRowCursor(cols))
 {
-    var getLab = cursor.GetLabelFloatGetter(data);
+    var getLab = cursor.GetGetter<bool>(data.Schema.Label.Value);
     var getWeight = colWeight >= 0 ? cursor.GetGetter<float>(data.Schema.Weight.Value) : null;
-    float lab = default;
+    bool lab = default;
     float weight = 1;
     while (cursor.MoveNext())
     {
@@ -258,9 +258,9 @@ private PriorModelParameters Train(TrainContext context)
         }

         // Testing both directions effectively ignores NaNs.
-        if (lab > 0)
+        if (lab)
             pos += weight;
-        else if (lab <= 0)
+        else
             neg += weight;
     }
 }
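
The counting loop changed in the hunks above can be sketched in isolation. This is a minimal standalone sketch, not the ML.NET implementation: `PriorSketch`, `PositiveRate`, and the tuple row shape are hypothetical names introduced for illustration, and returning pos / (pos + neg) as the prior is an assumption about what the trainer does with the two sums. The 241/458 counts are borrowed from the TEST POSITIVE RATIO line in the baseline output purely as sample data. It shows why a plain `else` is enough once the label is a `bool`: a `bool` has no NaN value, so the retained comment about ignoring NaNs describes the earlier float comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PriorSketch
{
    // Hypothetical helper: weighted fraction of positive labels,
    // or NaN when the total weight is zero.
    public static double PositiveRate(IEnumerable<(bool Label, float Weight)> rows)
    {
        double pos = 0;
        double neg = 0;
        foreach (var (label, weight) in rows)
        {
            // A bool label is either true or false, so if/else is exhaustive.
            // The float version needed "> 0" and "<= 0" so that NaN labels
            // fell through both branches and were skipped.
            if (label)
                pos += weight;
            else
                neg += weight;
        }

        double total = pos + neg;
        return total > 0 ? pos / total : double.NaN;
    }

    public static void Main()
    {
        // Sample data: 241 positive and 458 negative rows, all with weight 1.
        var rows = Enumerable.Repeat((Label: true, Weight: 1f), 241)
            .Concat(Enumerable.Repeat((Label: false, Weight: 1f), 458));

        // Prints 0.3448
        Console.WriteLine(PositiveRate(rows).ToString("F4", CultureInfo.InvariantCulture));
    }
}
```

Callers that previously fed a float label would need to load or convert it to a Boolean column first, which is what the loader changes in the remaining files of this diff do.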
@@ -1,4 +1,4 @@
-maml.exe CV tr=PriorPredictor threads=- dout=%Output% data=%Data% seed=1
+maml.exe CV tr=PriorPredictor threads=- dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% seed=1
 Not adding a normalizer.
 Not training a calibrator because it is not needed.
 Not adding a normalizer.
@@ -1,4 +1,4 @@
 PriorPredictor
 AUC Accuracy Positive precision Positive recall Negative precision Negative recall Log-loss Log-loss reduction F1 Score AUPRC Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-0.5 0.656163 0 0 0.656163 1 0.935104 -0.00959 NaN 0.418968 PriorPredictor %Data% %Output% 99 0 0 maml.exe CV tr=PriorPredictor threads=- dout=%Output% data=%Data% seed=1
+0.5 0.656163 0 0 0.656163 1 0.935104 -0.00959 NaN 0.418968 PriorPredictor %Data% %Output% 99 0 0 maml.exe CV tr=PriorPredictor threads=- dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% seed=1

@@ -1,4 +1,4 @@
-maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% data=%Data% out=%Output% seed=1
+maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% out=%Output% seed=1
 Not adding a normalizer.
 Not training a calibrator because it is not needed.
 TEST POSITIVE RATIO: 0.3448 (241.0/(241.0+458.0))
@@ -1,4 +1,4 @@
 PriorPredictor
 AUC Accuracy Positive precision Positive recall Negative precision Negative recall Log-loss Log-loss reduction F1 Score AUPRC Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-0.5 0.655222 0 0 0.655222 1 0.929318 0 NaN 0.415719 PriorPredictor %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% data=%Data% out=%Output% seed=1
+0.5 0.655222 0 0 0.655222 1 0.929318 0 NaN 0.415719 PriorPredictor %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% out=%Output% seed=1

@@ -1,4 +1,4 @@
-maml.exe CV tr=PriorPredictor threads=- dout=%Output% data=%Data% seed=1
+maml.exe CV tr=PriorPredictor threads=- dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% seed=1
 Not adding a normalizer.
 Not training a calibrator because it is not needed.
 Not adding a normalizer.
@@ -1,4 +1,4 @@
 PriorPredictor
 AUC Accuracy Positive precision Positive recall Negative precision Negative recall Log-loss Log-loss reduction F1 Score AUPRC Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-0.5 0.656163 0 0 0.656163 1 0.935104 -0.00959 NaN 0.418968 PriorPredictor %Data% %Output% 99 0 0 maml.exe CV tr=PriorPredictor threads=- dout=%Output% data=%Data% seed=1
+0.5 0.656163 0 0 0.656163 1 0.935104 -0.00959 NaN 0.418968 PriorPredictor %Data% %Output% 99 0 0 maml.exe CV tr=PriorPredictor threads=- dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% seed=1

@@ -1,4 +1,4 @@
-maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% data=%Data% out=%Output% seed=1
+maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% out=%Output% seed=1
 Not adding a normalizer.
 Not training a calibrator because it is not needed.
 TEST POSITIVE RATIO: 0.3448 (241.0/(241.0+458.0))
@@ -1,4 +1,4 @@
 PriorPredictor
 AUC Accuracy Positive precision Positive recall Negative precision Negative recall Log-loss Log-loss reduction F1 Score AUPRC Learner Name Train Dataset Test Dataset Results File Run Time Physical Memory Virtual Memory Command Line Settings
-0.5 0.655222 0 0 0.655222 1 0.929318 0 NaN 0.415719 PriorPredictor %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% data=%Data% out=%Output% seed=1
+0.5 0.655222 0 0 0.655222 1 0.929318 0 NaN 0.415719 PriorPredictor %Data% %Data% %Output% 99 0 0 maml.exe TrainTest test=%Data% tr=PriorPredictor dout=%Output% loader=Text{col=Label:BL:0 col=Features:~} data=%Data% out=%Output% seed=1

3 changes: 1 addition & 2 deletions test/Microsoft.ML.Predictor.Tests/TestPredictors.cs
@@ -120,8 +120,7 @@ public void BinaryPriorTest()
 {
     var predictors = new[] {
         TestLearners.binaryPrior};
-    var datasets = GetDatasetsForBinaryClassifierBaseTest();
-    RunAllTests(predictors, datasets);
+    RunAllTests(predictors, new[] { TestDatasets.breastCancerBoolLabel });
     Done();
 }

@@ -18,7 +18,7 @@ private IDataView GetBreastCancerDataviewWithTextColumns()
 HasHeader = true,
 Columns = new[]
 {
-    new TextLoader.Column("Label", DataKind.Single, 0),
+    new TextLoader.Column("Label", DataKind.Boolean, 0),
     new TextLoader.Column("F1", DataKind.String, 1),
     new TextLoader.Column("F2", DataKind.Int32, 2),
     new TextLoader.Column("Rest", DataKind.Single, new [] { new TextLoader.Range(3, 9) })