Skip to content

Creation of components through MLContext and cleanup (OneHotHash, Hash, CopyCol, KeyToVector) #2364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 5, 2019

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Feb 1, 2019

This PR is part of the work outlined in #1798, and focuses on the OneHotHash, Hash, CopyCol, KeyToVector transformers/estimators:

  1. Internalize constructors of estimators and transformers
  2. Keep ColumnInfo and move it to to the estimators The ColumnInfo structure should live in the estimators, rather than transformers #1760
  3. Rename Arguments -> Options
  4. Internalize Options only when they are not used by public constructor. For all other cases, retain Options as public Arguments class should be made internal when possible #1758

@artidoro artidoro self-assigned this Feb 1, 2019
@artidoro artidoro added the API Issues pertaining the friendly API label Feb 1, 2019
@artidoro artidoro changed the title WIP: Creation of components through MLContext and cleanup (OneHotHash, Hash, CopyCol, KeyToVector) Creation of components through MLContext and cleanup (OneHotHash, Hash, CopyCol, KeyToVector) Feb 2, 2019
@artidoro
Copy link
Contributor Author

artidoro commented Feb 3, 2019

I have updated this PR with the feedback I received on the other PRs and have fixed it, so now it should be ready for review. #Resolved

@codecov
Copy link

codecov bot commented Feb 3, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@c544d51). Click here to learn what that means.
The diff coverage is 87.85%.

@@            Coverage Diff            @@
##             master    #2364   +/-   ##
=========================================
  Coverage          ?   71.25%           
=========================================
  Files             ?      785           
  Lines             ?   140785           
  Branches          ?    16088           
=========================================
  Hits              ?   100320           
  Misses            ?    36011           
  Partials          ?     4454
Flag Coverage Δ
#Debug 71.25% <87.85%> (?)
#production 67.61% <83.67%> (?)
#test 85.3% <97.61%> (?)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 4, 2019

    /// </summary>

Change it to same description as you have for class #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/Hashing.cs:1232 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 4, 2019

    public sealed class ColumnInfo

I'm surprised you haven't shuffle this class around, and didn't take care of public constructors. #Resolved


Refers to: src/Microsoft.ML.Transforms/KeyToVectorMapping.cs:44 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 4, 2019

    public sealed class ColumnInfo

It's coming in #2367 :) I only did KeyToVector in this PR, not KeyToBinaryVector.


In reply to: 460432337 [](ancestors = 460432337)


Refers to: src/Microsoft.ML.Transforms/KeyToVectorMapping.cs:44 in 5d8d49e. [](commit_id = 5d8d49e, deletion_comment = False)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 4, 2019

    /// </summary>

Will do!


In reply to: 460430445 [](ancestors = 460430445)


Refers to: src/Microsoft.ML.Data/Transforms/Hashing.cs:1232 in 5d8d49e. [](commit_id = 5d8d49e, deletion_comment = False)

public readonly int HashBits;
/// <summary>
/// Hashing seed.
/// </summary>
Copy link
Member

@sfilipi sfilipi Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can keep those in one line. ///

Hashing seed. #Resolved

@sfilipi
Copy link
Member

sfilipi commented Feb 5, 2019

    /// Initializes a new instance of <see cref="HashingEstimator"/>.

i actually think this is fine for ctor comment; it seems like most .net ctor comments. For more info, you have already linked to the HashingEstimator, where there is more information available. #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/Hashing.cs:1231 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

/// </summary>
public readonly string Name;
/// <summary>
/// Name of column to transform. If set to <see langword="null"/>, the value of the <cref see="Name"/> will be used as source.
Copy link
Member

@sfilipi sfilipi Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If set to , the value of the will be used as source. [](start = 45, length = 93)

this does not apply here. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point! Thank you :)


In reply to: 253750601 [](ancestors = 253750601)

public IRowToRowMapper GetRowToRowMapper(Schema inputSchema) => _transformer.GetRowToRowMapper(inputSchema);
}

/// <summary>
/// Estimator which takes set of columns and produce for each column indicator array. Use hashing to determine indicator position.
/// Estimator that produces a column of indicator vectors. The mapping between a value and a corresponding index is done through hashing.
Copy link
Member

@sfilipi sfilipi Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimator [](start = 8, length = 9)

#Resolved

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka
Copy link
Contributor

    /// </summary>

What a lie....


Refers to: src/Microsoft.ML.Transforms/OneHotHashEncoding.cs:259 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

Copy link
Member

@abgoswam abgoswam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abgoswam
Copy link
Member

abgoswam commented Feb 5, 2019

    public override SchemaShape GetOutputSchema(SchemaShape inputSchema)

nit: summary comment #Resolved


Refers to: src/Microsoft.ML.Transforms/KeyToVectorMapping.cs:477 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 5, 2019

    public override SchemaShape GetOutputSchema(SchemaShape inputSchema)

Will take care in my other PR. Here I don't go over KeyToVectorMapping closely. But thanks for looking at this.


In reply to: 460774208 [](ancestors = 460774208)


Refers to: src/Microsoft.ML.Transforms/KeyToVectorMapping.cs:477 in 3a752e3. [](commit_id = 3a752e3, deletion_comment = False)

@artidoro artidoro merged commit 617f7f6 into dotnet:master Feb 5, 2019
@artidoro artidoro deleted the hashcopyvec branch March 13, 2019 17:55
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants