-
Notifications
You must be signed in to change notification settings - Fork 1.9k
No loader on APIs for ValueToKey/OneHotEncoding #2245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2245 +/- ##
==========================================
+ Coverage 69.78% 69.78% +<.01%
==========================================
Files 783 783
Lines 143438 143490 +52
Branches 16588 16590 +2
==========================================
+ Hits 100093 100137 +44
- Misses 38814 38818 +4
- Partials 4531 4535 +4
|
/// <param name="file">The path of the file containing the terms.</param> | ||
/// <param name="termsColumn"></param> | ||
/// <param name="loaderFactory"></param> | ||
/// <param name="termData">The data view containing the terms. If unspecified, they will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to specify what that dataview should have only one column and content of that column would be used to to specify allowed keys?
Can we also call it keyData?
I don't think we define Terms anywhere other than in maxNumTerms and even there in description we fallback to term keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I also did not like the name, but didn't quite know exactly to call it. Fixed this here and I hope everywhere, and also made another few opportunistic name changes that I realized no longer applied. Also enhanced the documentation as requested.
/// <param name="termsColumn"></param> | ||
/// <param name="loaderFactory"></param> | ||
/// <param name="termData">The data view containing the terms. If unspecified, they will be | ||
/// determined from the input data upon fitting.</param> | ||
public static ValueToKeyMappingEstimator MapValueToKey(this TransformsCatalog.ConversionTransforms catalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueToKeyMappingEstimator [](start = 22, length = 26)
I've tried to look it up, but github doesn't allow you to smoothly explore file changes if file get renamed.
Why we even have this public constructor?
We already have ValueMapping which accepts IEnumerable for keys and ZeeshanA adding another constructor which accepts IDataView.
Considering all this restriction required on IDataView, don't you think it's easier to use ValueMapping transform?
It just a question, not a PR blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ivan, I don't understand this question, could you clarify?
Are you asking why there is a public constructor on the estimators? I did not change the public/non-public nature of this constructor of the estimator. If you are pointing out that the estimator constructors should be non-public since we prefer things be created through MLContext
, I agree, and I believe there is already an issue tracking this, #2100. However, the issue I am addressing here is orthogonal to that. I agree it should be done. I see no one is assigned to it.
Also I am not sure what you are talking about w.r.t. another constructor. Certainly I see no other constructor here. I am trying to address the issue I see that whoever did this original estimator API decided, for whatever reason, that using component factories and IDataLoader
was a great idea, as discussed in the issue. The most obvious way to maintain the same sort of functionality (loading something from a file) is to provide the constructor I have provided. Are you disagreeing with that?
Also I did not rename the file?
But maybe I am misunderstanding your question. Perhaps you could clarify.
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, | ||
string file = null, string termsColumn = null, | ||
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null) | ||
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, IDataView termData = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueToKeyMappingTransformer.ColumnInfo[] columns [](start = 64, length = 49)
Can we put it termData before columns and then do params [] columns
?
Or maybe I can convince you to make all constructors with termData internal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2100 already tracks the need to make these internal. I don't view that as relevant to the issue I am attempting to solve here. They both have to do with constructors, but I might prefer that we keep it clean, if at all possible.
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, | ||
string file = null, string termsColumn = null, | ||
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null) | ||
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, IDataView termData = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
not on you, but this public constructor lucks documentation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that these constructors will be made internal, per #2100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #2231. We should work to scour this more thoroughly once the changes to allow non-public settings objects on
Arguments
/Settings
classes goes in.Also: I had thought another estimator (stopwords) did not have loader as part of its name as I had thought, but I still fixed the name.