Skip to content

No loader on APIs for ValueToKey/OneHotEncoding #2245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 27, 2019

Conversation

TomFinley
Copy link
Contributor

@TomFinley TomFinley commented Jan 25, 2019

Fixes #2231. We should work to scour this more thoroughly once the changes to allow non-public settings objects on Arguments/Settings classes goes in.

Also: I had thought another estimator (stopwords) did not have loader as part of its name as I had thought, but I still fixed the name.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Jan 25, 2019
@TomFinley TomFinley self-assigned this Jan 25, 2019
@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #2245 into master will increase coverage by <.01%.
The diff coverage is 82.4%.

@@            Coverage Diff             @@
##           master    #2245      +/-   ##
==========================================
+ Coverage   69.78%   69.78%   +<.01%     
==========================================
  Files         783      783              
  Lines      143438   143490      +52     
  Branches    16588    16590       +2     
==========================================
+ Hits       100093   100137      +44     
- Misses      38814    38818       +4     
- Partials     4531     4535       +4
Flag Coverage Δ
#Debug 69.78% <82.4%> (ø) ⬆️
#production 66.11% <73.61%> (-0.01%) ⬇️
#test 84.94% <100%> (+0.01%) ⬆️

/// <param name="file">The path of the file containing the terms.</param>
/// <param name="termsColumn"></param>
/// <param name="loaderFactory"></param>
/// <param name="termData">The data view containing the terms. If unspecified, they will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to specify what that dataview should have only one column and content of that column would be used to to specify allowed keys?
Can we also call it keyData?
I don't think we define Terms anywhere other than in maxNumTerms and even there in description we fallback to term keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I also did not like the name, but didn't quite know exactly to call it. Fixed this here and I hope everywhere, and also made another few opportunistic name changes that I realized no longer applied. Also enhanced the documentation as requested.

/// <param name="termsColumn"></param>
/// <param name="loaderFactory"></param>
/// <param name="termData">The data view containing the terms. If unspecified, they will be
/// determined from the input data upon fitting.</param>
public static ValueToKeyMappingEstimator MapValueToKey(this TransformsCatalog.ConversionTransforms catalog,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueToKeyMappingEstimator [](start = 22, length = 26)

I've tried to look it up, but github doesn't allow you to smoothly explore file changes if file get renamed.
Why we even have this public constructor?
We already have ValueMapping which accepts IEnumerable for keys and ZeeshanA adding another constructor which accepts IDataView.
Considering all this restriction required on IDataView, don't you think it's easier to use ValueMapping transform?

It just a question, not a PR blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ivan, I don't understand this question, could you clarify?

Are you asking why there is a public constructor on the estimators? I did not change the public/non-public nature of this constructor of the estimator. If you are pointing out that the estimator constructors should be non-public since we prefer things be created through MLContext, I agree, and I believe there is already an issue tracking this, #2100. However, the issue I am addressing here is orthogonal to that. I agree it should be done. I see no one is assigned to it.

Also I am not sure what you are talking about w.r.t. another constructor. Certainly I see no other constructor here. I am trying to address the issue I see that whoever did this original estimator API decided, for whatever reason, that using component factories and IDataLoader was a great idea, as discussed in the issue. The most obvious way to maintain the same sort of functionality (loading something from a file) is to provide the constructor I have provided. Are you disagreeing with that?

Also I did not rename the file?

But maybe I am misunderstanding your question. Perhaps you could clarify.

public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns,
string file = null, string termsColumn = null,
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null)
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, IDataView termData = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueToKeyMappingTransformer.ColumnInfo[] columns [](start = 64, length = 49)

Can we put it termData before columns and then do params [] columns?
Or maybe I can convince you to make all constructors with termData internal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2100 already tracks the need to make these internal. I don't view that as relevant to the issue I am attempting to solve here. They both have to do with constructors, but I might prefer that we keep it clean, if at all possible.

public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns,
string file = null, string termsColumn = null,
IComponentFactory<IMultiStreamSource, IDataLoader> loaderFactory = null)
public ValueToKeyMappingEstimator(IHostEnvironment env, ValueToKeyMappingTransformer.ColumnInfo[] columns, IDataView termData = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

not on you, but this public constructor lucks documentation :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is that these constructors will be made internal, per #2100.

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley TomFinley merged commit 2f0c475 into dotnet:master Jan 27, 2019
@TomFinley TomFinley deleted the NoLoad branch February 11, 2019 20:21
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove IDataLoader from public API surfaces for specifying data
3 participants