-
Notifications
You must be signed in to change notification settings - Fork 1.9k
More internalization of non-public Microsoft.ML.Data #2453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5 4 4 5 7 10 3 2 0 0.566368 -0.112074189 0.169326738 -0.5519618 0.517083168 -0.2568235 | ||
3 1 1 1 2 2 3 1 0 0.497458071 -0.293033749 0.419352561 -0.39683342 0.572824 -0.07215268 | ||
6 8 8 1 3 4 3 7 0 -0.382303447 -0.432640046 0.289225727 -0.4996817 0.571184337 -0.08415316 | ||
5 1 1 1 2 1 3 1 0 -0.157029659 -0.555585265 0.490177631 -0.305056125 0.35670203 -0.453979075 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that because RFF as its name suggests uses PRNG extensively, and the random initialization behavior differs between ConsoleEnvironment
and MLContext
, that this is why this baseline changed, in case anyone was curious.
@@ -62,13 +63,13 @@ ColumnType GetType(Schema schema, string name) | |||
t = GetType(data.Schema, "f4"); | |||
Assert.True(t is VectorType vt4 && vt4.ItemType == NumberType.R4 && vt4.Size == 0); | |||
|
|||
data = ColumnSelectingTransformer.CreateKeep(Env, data, new[] { "f1", "f2", "f3", "f4" }); | |||
data = ML.Transforms.SelectColumns("f1", "f2", "f3", "f4").Fit(data).Transform(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want users to manipulate evaluators? Refers to: src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs:871 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
I would think that the same comment about internalization of the class or of the fields applies here. #Resolved Refers to: src/Microsoft.ML.Data/Evaluators/ClusteringEvaluator.cs:565 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
And here as well, same comment about internalization of the class or of the fields. #Resolved Refers to: src/Microsoft.ML.Data/Evaluators/MultiClassClassifierEvaluator.cs:538 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
@@ -25,7 +23,7 @@ public WordEmbeddingsTests(ITestOutputHelper helper) | |||
public void TestWordEmbeddings() | |||
{ | |||
var dataPath = GetDataPath(TestDatasets.Sentiment.trainFilename); | |||
var data = new TextLoader(Env, | |||
var data = new TextLoader(ML, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new TextLoader [](start = 23, length = 14)
you don't seem to like ML.Data.CreateTextLoader :) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not sure. I haven't tried to use it much yet. So I might like it. I'm not trying to make the tests so that they all call the public surface yet (don't have time to do that all right now), just the things relevant to the stuff that I was deleting. Then of course I saw this DataSaverUtils.SaveDataView
and that just ... well, that had to go. :) Of course I couldn't quite make it all go due to a flaw our API being incomplete (see #2452), but I could at least make a dent in it. Plus I get to start using MLContext
more than ConsoleEnvironment
, which strikes me as good.
In reply to: 254556100 [](ancestors = 254556100)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well, if not the class, we could internalize some of the fields/constructors. #Resolved Refers to: src/Microsoft.ML.Data/Evaluators/MultiOutputRegressionEvaluator.cs:374 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
And here as well, is it possible to internalize the class? Otherwise the fields should be cleaned up a bit. #Resolved Refers to: src/Microsoft.ML.Data/Evaluators/QuantileRegressionEvaluator.cs:261 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
@@ -164,10 +161,8 @@ public abstract class SingleFeaturePredictionTransformerBase<TModel, TScorer> : | |||
/// </summary> | |||
public ColumnType FeatureColumnType { get; } | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not making this internal? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the class? I think this has to remain public, but I could be wrong. This is publishing additional information about what feature columns it is consuming, and it seems like people find that sort of information valueable on transformers (see, e.g., #2454, where the input column name is considered valuable information). Of course, there's a question of whether we actually want our transformers to publish that for v1. That I'm less sure about. So maybe we do want to internalize this. I don't know. But I wasn't planning on rocking the boat on that yet, just internalizing the stuff I know must be hidden.
In reply to: 254560344 [](ancestors = 254560344,254558228)
@@ -55,7 +53,8 @@ public abstract class PredictionTransformerBase<TModel, TScorer> : IPredictionTr | |||
|
|||
public bool IsRowToRowMapper => true; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a line of XML? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this internal too? I think these should never be instantiated directly from the constructors right? #Resolved Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:327 in 26089f2. [](commit_id = 26089f2, deletion_comment = False) |
Well, In reply to: 461298134 [](ancestors = 461298134,461298043) Refers to: src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs:871 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
Right, yes we should ... I think I did in some other places but somehow wasn't consistent with it. While doing this I also realized In reply to: 461301953 [](ancestors = 461301953) Refers to: src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs:327 in 26089f2. [](commit_id = 26089f2, deletion_comment = False) |
Rankers are a little different in that they work collectively over multiple examples per query. So the convenience class could not be used in this case. In reply to: 461299201 [](ancestors = 461299201) Refers to: src/Microsoft.ML.Data/Evaluators/RankerEvaluator.cs:543 in 5f44e9b. [](commit_id = 5f44e9b, deletion_comment = False) |
Hi @artidoro I think I resolved your comments, so if you could take a look to confirm that would be great, thanks! |
Codecov Report
@@ Coverage Diff @@
## master #2453 +/- ##
==========================================
- Coverage 71.22% 71.22% -0.01%
==========================================
Files 785 785
Lines 141030 140977 -53
Branches 16116 16116
==========================================
- Hits 100455 100408 -47
+ Misses 36106 36098 -8
- Partials 4469 4471 +2
|
Thank you for taking care of them Tom! |
Yet another step along #1602, where much miscellaneous internal infrastructure in Microsoft.ML.Data gets hidden, or even in some cases removed. I think this round removed about 600 or so public members or thereabouts. (Though TBH about 500 of that was just due to
IEvaluator
implementations alone).Some of the commits are very small (e.g., internalize vector utils is more or less what it sounds like), whereas others are more involved.
Crucially, prediction transformers are no longer generic on the scorers, and all scorers are internal. This is related to #1995, since scorers were
IDataTransformer
implementors, so this is another place where we're removing things related to this deprecated interface from our public surface.I also took the opportunity to do some opportunistic changing of the tests to use more of the public surface area, as opposed to the non-public APIs, especially around text writing. Incidentally filed #2452 as a side effect of that work.