-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Creation of components through MLContext and cleanup (text related transforms) #2393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creation of components through MLContext and cleanup (text related transforms) #2393
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2393 +/- ##
==========================================
+ Coverage 71.26% 71.26% +<.01%
==========================================
Files 785 785
Lines 140946 140939 -7
Branches 16108 16108
==========================================
Hits 100440 100440
+ Misses 36039 36031 -8
- Partials 4467 4468 +1
|
{ | ||
Contracts.CheckValue(env, nameof(env)); | ||
_host = env.Register(nameof(LatentDirichletAllocationEstimator)); | ||
_columns = columns.ToImmutableArray(); | ||
} | ||
|
||
public sealed class ColumnInfo | ||
{ | ||
public readonly string Name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 12, length = 6)
Could you add a summary tag for each one of these public entries?
And could you add a summary for the ColumnInfo object? #Resolved
…malizingEstimator
this one is currently used by Static API via the OnFit() delegate, to provide details about the topics discovered by LightLDA. making it internal would break the Static API In reply to: 460088862 [](ancestors = 460088862) Refers to: src/Microsoft.ML.Transforms/Text/LdaTransform.cs:167 in 92e6f2d. [](commit_id = 92e6f2d, deletion_comment = False) |
|
@@ -94,7 +94,7 @@ internal bool TryUnparse(StringBuilder sb) | |||
/// </summary> | |||
public sealed class TokenizeColumn : OneToOneColumn { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
also internal #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -220,7 +220,7 @@ internal bool TryUnparse(StringBuilder sb) | |||
|
|||
/// <summary> | |||
/// This class is a merger of <see cref="ValueToKeyMappingTransformer.Options"/> and | |||
/// <see cref="NgramExtractingTransformer.Arguments"/>, with the allLength option removed. | |||
/// <see cref="NgramExtractingTransformer.Options"/>, with the allLength option removed. | |||
/// </summary> | |||
public abstract class ArgumentsBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
internal? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you merge with master, I think I tackle it recently. #Resolved Refers to: src/Microsoft.ML.Transforms/Text/WordBagTransform.cs:369 in a81c3d2. [](commit_id = a81c3d2, deletion_comment = False) |
public readonly string Name; | ||
public readonly string InputColumnName; | ||
|
||
public ColumnInfo(string name, string inputColumnName = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary to constructor. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InputColumnName = inputColumnName ?? name; | ||
Separators = separators ?? new[] { ' ' }; | ||
} | ||
} | ||
|
||
public override SchemaShape GetOutputSchema(SchemaShape inputSchema) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
///
/// Returns the of the schema which will be produced by the transformer.
/// Used for schema propagation and verification in a pipeline.
///
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a separate issue to scrub all the entrypoint APIs and fix their namings. If you see they derive from In reply to: 460829905 [](ancestors = 460829905) Refers to: src/Microsoft.ML.Transforms/Text/WordBagTransform.cs:249 in aae84eb. [](commit_id = aae84eb, deletion_comment = False) |
fixed. looks like u had missed it. ;) In reply to: 460788737 [](ancestors = 460788737) Refers to: src/Microsoft.ML.Transforms/Text/WordBagTransform.cs:369 in a81c3d2. [](commit_id = a81c3d2, deletion_comment = False) |
@@ -42,7 +42,7 @@ internal sealed class ExtractorColumn : ManyToOneColumn | |||
|
|||
internal static class WordBagBuildingTransformer | |||
{ | |||
public sealed class Column : ManyToOneColumn | |||
internal sealed class Column : ManyToOneColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
it's already part of internal class, you no longer need to make it subclass internal :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// </summary> | ||
public readonly string Name; | ||
/// <summary> | ||
/// Name of column to transform. If set to <see langword="null"/>, the value of the <cref see="Name"/> will be used as source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Name of column to transform. If set to , the value of the will be used as source. [](start = 11, length = 127)
as Zeeshan A pointed in other PRs, you need to left only first sentence. It's always populated with value. (Only if it's property in ColumnInfo)
I would suggest to sweep whole PR. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite sure what u mean .
scrub whole PR for what ?
In reply to: 254093004 [](ancestors = 254093004)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add anywhere else that comment to InputColumnName inside ColumnInfo, you need to change it.
If you didn't add it in other places, you just need to fix this one.
In reply to: 254094414 [](ancestors = 254094414,254093004)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it should just say
/// <summary>
/// Name of column to transform.
/// </summary>
?
In reply to: 254094755 [](ancestors = 254094755,254094414,254093004)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's how I fixed it! #Resolved
Well, it's part of internal class now, so I decide it's not worth time to change methods modifiers :) In reply to: 460846657 [](ancestors = 460846657,460788737) Refers to: src/Microsoft.ML.Transforms/Text/WordBagTransform.cs:369 in a81c3d2. [](commit_id = a81c3d2, deletion_comment = False) |
@@ -473,7 +473,7 @@ private void TextFeaturizationOn(string dataPath) | |||
BagOfTrichar: r.Message.TokenizeIntoCharacters().ToNgrams(ngramLength: 3, weighting: NgramExtractingEstimator.WeightingCriteria.TfIdf), | |||
|
|||
// NLP pipeline 4: word embeddings. | |||
Embeddings: r.Message.NormalizeText().TokenizeText().WordEmbeddings(WordEmbeddingsExtractingTransformer.PretrainedModelKind.GloVeTwitter25D) | |||
Embeddings: r.Message.NormalizeText().TokenizeText().WordEmbeddings(WordEmbeddingsExtractingEstimator.PretrainedModelKind.GloVeTwitter25D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NormalizeText [](start = 42, length = 13)
If you update any of cookbooks you also need to update https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md
I wish we had way to autogenerate it somehow... #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Towards #1798 , #1758, #1760
The following transform estimators are being addressed:
The changes are as follows :
Arguments
->Options
Creation of components through MLContext: advanced options and other feedback #1798Options
when they are not used by public constructor. Arguments class should be made internal when possible #1758Options
objects asoptions
(instead ofargs
oradvancedSettings
used so far) Creation of components through MLContext: advanced options and other feedback #1798