Creation of components through MLContext and cleanup (text transform) #2394
@@ -120,7 +120,7 @@ public sealed class Arguments : TransformInputBase
public TextNormKind VectorNormalizer = TextNormKind.L2;
}

- public sealed class Settings
+ public sealed class Options
Can you add a summary tag for each of these settings, and for the Options class? You can take them from the constructor I believe. #Resolved
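As a rough illustration of what this request asks for, here is a minimal sketch of an options class with a summary tag on the class and on each field. `SampleTextFeaturizer` is a hypothetical stand-in, not the ML.NET type; the field names echo the sample quoted later in this thread, while the defaults and descriptions are placeholder assumptions.

```csharp
// Hypothetical class for illustration only; not the ML.NET implementation.
// Defaults and descriptions below are placeholder assumptions.
public static class SampleTextFeaturizer
{
    /// <summary>
    /// Advanced options for the <see cref="SampleTextFeaturizer"/>.
    /// </summary>
    public sealed class Options
    {
        /// <summary>
        /// Whether to keep numbers in the text (placeholder description).
        /// </summary>
        public bool KeepNumbers = true;

        /// <summary>
        /// Whether to also output the tokens (placeholder description).
        /// </summary>
        public bool OutputTokens = false;
    }
}
```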
@@ -79,7 +79,7 @@ internal bool TryUnparse(StringBuilder sb)
/// <summary>
/// This class exposes <see cref="NgramExtractorTransform"/>/<see cref="NgramHashExtractingTransformer"/> arguments.
/// </summary>
- public sealed class Arguments : TransformInputBase
+ internal sealed class Arguments : TransformInputBase
Arguments
Can you rename to Options? #Resolved
Codecov Report
@@            Coverage Diff             @@
##           master    #2394      +/-   ##
==========================================
- Coverage   71.22%   71.22%   -0.01%
==========================================
  Files         785      785
  Lines      141030   140978      -52
  Branches    16116    16113       -3
==========================================
- Hits       100455   100412      -43
+ Misses      36106    36095      -11
- Partials     4469     4471       +2
Your PR is also related to #2026. I think you might fix the issue as part of this change.
@@ -136,10 +136,10 @@ public sealed class Settings
#pragma warning restore MSML_NoInstanceInitializers // No initializers on instance fields or properties
}
This whole thing requires proper cleanup and documentation. Do you want to do it in this PR, or would you prefer to leave it for a later cleanup? #Pending
The current PR is to fix the public API.
We are doing some cleaning/doc fixes on an opportunistic basis, but we should have a separate issue / PRs for "proper" cleaning and documentation.
In reply to: 254013725
/// <summary>
/// Transform several text columns into featurized float array that represents counts of ngrams and char-grams.
/// </summary>
/// <param name="catalog">The text-related transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the columns to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
- /// <param name="advancedSettings">Advanced transform settings</param>
+ /// <param name="options">Advanced transform settings.</param>
/// Advanced transform settings.
@sfilipi any chance we have a consensus regarding what to put here? I can see:
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Algorithm advanced options.</param>
/// <param name="options">Algorithm advanced settings.</param>
/// <param name="options">Advanced options to the algorithm.</param>
#Resolved
Since the object's name is now Options, let's go with:
/// Advanced options to the algorithm.
If you can search/replace this in one of the PRs, that would be awesome!
In reply to: 254018114
…s://github.com/abgoswam/machinelearning into abgoswam/transform_estimator_api_texttransform
/// <summary>
/// Transform several text columns into featurized float array that represents counts of ngrams and char-grams.
/// </summary>
/// <param name="catalog">The text-related transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the columns to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
- /// <param name="advancedSettings">Advanced transform settings</param>
+ /// <param name="options">Advanced transform settings.</param>
settings
Let's not call the Options "settings" :) #Resolved
/// <summary>
/// Transform several text columns into featurized float array that represents counts of ngrams and char-grams.
/// </summary>
/// <param name="catalog">The text-related transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the columns to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
- /// <param name="advancedSettings">Advanced transform settings</param>
+ /// <param name="options">Advanced transform settings.</param>
public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog,
string outputColumnName,
IEnumerable<string> inputColumnNames,
IEnumerable<string> inputColumnNames
I know it is not your change, but can we go with params, rather than an IEnumerable?
Looking at the sample, having to construct a List is overkill. #Pending
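To make the call-site difference concrete, here is a minimal sketch comparing the two parameter shapes. `ColumnNameDemo` and its methods are hypothetical stand-ins, not the ML.NET catalog API; they only format the column names they receive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a catalog method; not the ML.NET API.
public static class ColumnNameDemo
{
    // Shape under review: callers must build a collection first.
    public static string FromEnumerable(string outputColumnName, IEnumerable<string> inputColumnNames)
        => outputColumnName + " <- " + string.Join(", ", inputColumnNames);

    // Suggested shape: 'params' lets callers list the names inline.
    public static string FromParams(string outputColumnName, params string[] inputColumnNames)
        => outputColumnName + " <- " + string.Join(", ", inputColumnNames);

    public static void Main()
    {
        Console.WriteLine(FromEnumerable("Features", new List<string> { "SentimentText" }));
        Console.WriteLine(FromParams("Features", "SentimentText"));
        Console.WriteLine(FromParams("Features", "Title", "Body"));
    }
}
```

One trade-off worth keeping in mind: in C# a `params` parameter must be the last one in the signature, so an options argument could not follow the column names positionally.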
/// <summary>
/// Transform several text columns into featurized float array that represents counts of ngrams and char-grams.
/// </summary>
/// <param name="catalog">The text-related transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the columns to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
- /// <param name="advancedSettings">Advanced transform settings</param>
+ /// <param name="options">Advanced transform settings.</param>
public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog,
FeaturizeText
Link the sample to this one, since it illustrates the usage of Options. #Resolved
s.KeepNumbers = false;
s.OutputTokens = true;
s.TextLanguage = TextFeaturizingEstimator.Language.English; // supports English, French, German, Dutch, Italian, Spanish, Japanese
var customized_pipeline = ml.Transforms.Text.FeaturizeText(customizedColumnName, new List<string> { "SentimentText" },
new List<string> { "SentimentText" }
umm :) #Resolved
/// <summary>
/// Advanced settings for the <see cref="TextFeaturizingEstimator"/>.
/// </summary>
public sealed class Options
Options
Can you remove this, and use the Arguments above, like for everything else?
The class below is redundant.
Make the Factories internal, so they are not visible. #Resolved
I think we don't want to do that.
The main reason it's bad is the bunch of Factories we don't want to expose.
In reply to: 254358363
Perhaps we should do some of this cleanup outside the purview of this PR.
I need to properly understand how the factories are used, etc.
In reply to: 254461580
UseWordExtractor = false,
}).Fit(loader).Transform(loader);

var trans = mlContext.Transforms.Text.ExtractWordEmbeddings("Features", "WordEmbeddings_TransformedText",
WordEmbeddingsExtractingTransformer.PretrainedModelKind.Sswe).Fit(text).Transform(text);
WordEmbeddingsExtractingTransformer
I thought you moved it to the estimator in your other PR; can you merge with master? #Closed
…s://github.com/abgoswam/machinelearning into abgoswam/transform_estimator_api_texttransform
/// <example>
/// <format type="text/markdown">
/// <]
TextTransform
Are you removing this example because it doesn't use this exact method? #Resolved
Yep. Removed it from here and added it to the API below (this was one of Senja's comments).
In reply to: 254802756
@@ -120,26 +120,59 @@ internal sealed class Arguments : TransformInputBase
public TextNormKind VectorNormalizer = TextNormKind.L2;
}

- public sealed class Settings
+ /// <summary>
+ /// Advanced settings for the <see cref="TextFeaturizingEstimator"/>.
settings
Can we call it options? #Resolved
Fixed.
I am planning another PR that fixes this for all the learners/featurizers we converted to Options.
In reply to: 254802978
I left two small comments, can you address them?
Otherwise looks good!
Towards #1798, #1758
The following transform estimators are being addressed:
NOTE:
The changes are as follows:
public extension methods, one for simple arguments and the other for advanced options
Options objects as arguments instead of Action delegate
Settings to Options
Options objects as options (instead of args or advancedSettings used so far)
Arguments since the public constructor uses Options. Also a few other fields have been made internal
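The shape described above (one simple overload, one overload taking an options object, replacing a delegate that mutates settings) can be sketched roughly as follows. `SketchOptions` and `FeaturizerSketch` are hypothetical names that return a descriptive string instead of building an estimator; they are not the ML.NET types, and the defaults are assumptions.

```csharp
using System;

// Hypothetical stand-ins for illustration; these are not the ML.NET types.
public sealed class SketchOptions
{
    public bool KeepNumbers = true;    // assumed default
    public bool OutputTokens = false;  // assumed default
}

public static class FeaturizerSketch
{
    // Simple overload: common arguments only, defaults for everything else.
    public static string FeaturizeText(string outputColumnName, string inputColumnName = null)
        => FeaturizeText(outputColumnName, inputColumnName, new SketchOptions());

    // Advanced overload: the caller passes a fully built options object.
    public static string FeaturizeText(string outputColumnName, string inputColumnName, SketchOptions options)
        => $"{outputColumnName} <- {inputColumnName ?? outputColumnName} " +
           $"(KeepNumbers={options.KeepNumbers}, OutputTokens={options.OutputTokens})";

    // Older style, for contrast: the caller mutates settings inside a delegate.
    public static string FeaturizeTextWithDelegate(string outputColumnName, string inputColumnName,
        Action<SketchOptions> advancedSettings)
    {
        var options = new SketchOptions();
        advancedSettings?.Invoke(options);
        return FeaturizeText(outputColumnName, inputColumnName, options);
    }

    public static void Main()
    {
        Console.WriteLine(FeaturizeText("Features", "SentimentText"));
        Console.WriteLine(FeaturizeText("Features", "SentimentText",
            new SketchOptions { KeepNumbers = false, OutputTokens = true }));
        Console.WriteLine(FeaturizeTextWithDelegate("Features", "SentimentText", s => s.KeepNumbers = false));
    }
}
```

With an options object, the configuration is an ordinary value that can be built once, inspected, and reused across calls, whereas the delegate form only exists as code that runs inside the method.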