Skip to content

Creation of components through MLContext and cleanup (Convert, DropSlots, FeatureSelection) #2365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 7, 2019

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Feb 1, 2019

This PR is part of the work outlined in #1798, and focuses on the Convert, DropSlots, CountFeatureSelection, MutualInformationFeatureSelection, FeatureContributionCalculationTransform, PermutationFeatureImportanceTransform, MissingValueDropping, TreeEnsembleFeaturizationTransform transformers/estimators:

  1. Internalize constructors of estimators and transformers
  2. Keep ColumnInfo and move it to to the estimators The ColumnInfo structure should live in the estimators, rather than transformers #1760
  3. Rename Arguments -> Options
  4. Internalize Options only when they are not used by public constructor. For all other cases, retain Options as public Arguments class should be made internal when possible #1758

@codecov
Copy link

codecov bot commented Feb 1, 2019

Codecov Report

Merging #2365 into master will decrease coverage by <.01%.
The diff coverage is 73.68%.

@@            Coverage Diff             @@
##           master    #2365      +/-   ##
==========================================
- Coverage   71.23%   71.22%   -0.01%     
==========================================
  Files         785      785              
  Lines      141034   141030       -4     
  Branches    16114    16116       +2     
==========================================
- Hits       100468   100454      -14     
- Misses      36095    36106      +11     
+ Partials     4471     4470       -1
Flag Coverage Δ
#Debug 71.22% <73.68%> (-0.01%) ⬇️
#production 67.57% <62.12%> (-0.02%) ⬇️
#test 85.31% <100%> (ø) ⬆️

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 1, 2019

    /// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>

can you fix it? #Closed


Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:25 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 1, 2019

    public readonly bool Normalize;

can we comment them or hide from user? #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/FeatureContributionCalculationTransform.cs:104 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 1, 2019

/// ConvertTransform allow to change underlying column type as long as we know how to convert types.

can you use #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/TypeConverting.cs:56 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

/// <summary>
/// Returns the <see cref="SchemaShape"/> of the schema which will be produced by the transformer.
/// Used for schema propagation and verification in a pipeline.
/// </summary>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// ? #Closed

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please disregard my comments, apparently it doesn't work for intellisense. dotnet/csharplang#313


In reply to: 253150851 [](ancestors = 253150851)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we were talking about this with senja yesterday! it would be super useful!


In reply to: 253154622 [](ancestors = 253154622,253150851)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -78,8 +78,8 @@ public void TestConvertWorkout()
var data = new[] { new TestClass() { A = 1, B = new int[2] { 1,4 } },
new TestClass() { A = 2, B = new int[2] { 3,4 } }};
var dataView = ML.Data.ReadFromEnumerable(data);
var pipe = new TypeConvertingEstimator(Env, columns: new[] {new TypeConvertingTransformer.ColumnInfo("ConvA", DataKind.R4, "A"),
new TypeConvertingTransformer.ColumnInfo("ConvB", DataKind.R4, "B")});
var pipe = new TypeConvertingEstimator(Env, columns: new[] {new TypeConvertingEstimator.ColumnInfo("ConvA", DataKind.R4, "A"),
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeConvertingEstimator [](start = 27, length = 23)

TypeConvertingEstimator [](start = 27, length = 23)

why not ml.Catalog.ConvertType? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Let's do it :)


In reply to: 253204389 [](ancestors = 253204389)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 1, 2019

    /// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>

Sorry I don't see what's wrong.


In reply to: 459817377 [](ancestors = 459817377)


Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:25 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

    /// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>

It's a two <param> in one line. I'm just asking to press enter :)


In reply to: 459884561 [](ancestors = 459884561,459817377)


Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:25 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 1, 2019

    /// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>        /// <param name="hashBits">Number of bits to hash into. Must be between 1 and 31, inclusive.</param>

Ah I see! I did not see the rest of the line! :)


In reply to: 459885033 [](ancestors = 459885033,459884561,459817377)


Refers to: src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs:25 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@artidoro
Copy link
Contributor Author

artidoro commented Feb 1, 2019

    public readonly bool Normalize;

Ok, I hid them.


In reply to: 459818761 [](ancestors = 459818761)


Refers to: src/Microsoft.ML.Data/Transforms/FeatureContributionCalculationTransform.cs:104 in bd98b16. [](commit_id = bd98b16, deletion_comment = False)

@@ -312,7 +312,7 @@ public static CommonOutputs.TransformOutput PrepareRegressionLabel(IHostEnvironm
}
}
};
Copy link
Member

@abgoswam abgoswam Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see this being used . can this be deleted ? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point!


In reply to: 254010705 [](ancestors = 254010705)

@@ -46,6 +46,9 @@ public sealed class Arguments : TransformInputBase

internal static string RegistrationName = "CountFeatureSelectionTransform";

/// <summary>
Copy link
Member

@abgoswam abgoswam Feb 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the corresponding Transformer ? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving them to the estimators right? I think it should stay here


In reply to: 254011201 [](ancestors = 254011201)

@@ -168,37 +171,11 @@ private static VersionInfo GetVersionInfo()

private const string RegistrationName = "Convert";

public IReadOnlyCollection<ColumnInfo> Columns => _columns.AsReadOnly();
public IReadOnlyCollection<TypeConvertingEstimator.ColumnInfo> Columns => _columns.AsReadOnly();
Copy link
Member

@sfilipi sfilipi Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns [](start = 71, length = 7)

xml #Resolved

@sfilipi
Copy link
Member

sfilipi commented Feb 6, 2019

        public readonly long MinCount;

xml #Resolved


Refers to: src/Microsoft.ML.Transforms/CountFeatureSelection.cs:56 in 72e4b49. [](commit_id = 72e4b49, deletion_comment = False)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@artidoro artidoro merged commit b2851e5 into dotnet:master Feb 7, 2019
@artidoro artidoro deleted the round3 branch March 13, 2019 17:55
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants