Lda snapping to template #3442
@@ -46,7 +46,9 @@ namespace Microsoft.ML.Transforms.Text
     //
     // See <a href="https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.TestFramework/DataPipe/TestDataPipe.cs"/>
     // for an example on how to use LatentDirichletAllocationTransformer.
-    /// <include file='doc.xml' path='doc/members/member[@name="LightLDA"]/*' />
+    /// <summary>
+    /// <see cref="ITransformer"/> resulting from fitting a <see cref="LatentDirichletAllocationEstimator"/>.
+    /// </summary>
     public sealed class LatentDirichletAllocationTransformer : OneToOneTransformerBase
     {
         internal sealed class Options : TransformInputBase
@@ -936,7 +938,56 @@ private protected override IRowMapper MakeRowMapper(DataViewSchema schema)
             => new Mapper(this, schema);
     }
-    /// <include file='doc.xml' path='doc/members/member[@name="LightLDA"]/*' />
+    /// <summary>
+    /// The LDA transform implements <a href="https://arxiv.org/abs/1412.1576">LightLDA</a>, a state-of-the-art implementation of Latent Dirichlet Allocation.
+    /// </summary>
+    /// <remarks>
+    /// <format type="text/markdown"><![CDATA[
+    /// [Latent Dirichlet Allocation](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) is a well-known topic modeling algorithm that infers semantic structure from text data,
+    /// and ultimately helps answer the question "what is this document about?".
+    /// It can be used to featurize any text field as low-dimensional topical vectors.
+    /// LightLDA is an extremely efficient implementation of LDA that incorporates a number of
+    /// optimization techniques.
+    /// With the LDA transform, ML.NET users can train a topic model to produce 1 million topics with a 1-million-word vocabulary
+    /// on a 1-billion-token document set on a single machine in a few hours (typically, LDA at this scale takes days and requires large clusters).
+    /// The most significant innovation is a super-efficient $O(1)$ [Metropolis-Hastings sampling algorithm](https://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm),
+    /// whose running cost is agnostic of model size, allowing it to converge nearly an order of magnitude faster than other [Gibbs samplers](https://en.wikipedia.org/wiki/Gibbs_sampling).
+    ///
+    /// In an ML.NET pipeline, this estimator requires the output of some preprocessing as its input.
+    /// A typical pipeline operating on text would require text normalization, tokenization and producing n-grams to supply to the LDA estimator.
+    /// See the See Also section for example usage.
+    ///
+    /// If we have the following three examples of text as data points, and use the LDA transform with the number of topics set to 3,
+    /// we would get the results displayed in the table below. Example documents:
+    /// * I like to eat bananas.
+    /// * I eat bananas everyday.
+    /// * First celebrated in 1970, Earth Day now includes events in more than 193 countries,
+    /// which are now coordinated globally by the Earth Day Network.
+    ///
Review comment: Context is missing.

Review comment: The illustration read fine to me. I got that the first two were related to the topic of bananas and the other wasn't. #Resolved
+    /// | Topic1 | Topic2 | Topic3 |
+    /// | ------ | ------ | ------ |
+    /// | 0.5714 | 0.0000 | 0.4286 |
+    /// | 0.5714 | 0.0000 | 0.4286 |
+    /// | 0.2400 | 0.3200 | 0.4400 |
+    ///
+    /// Notice the similarity in values of the first and second row, compared to the third,
+    /// and see how those values are indicative of similarities between those two (small) bodies of text.
+    ///
+    /// For more technical details you can consult the following resources:
+    /// * [LightLDA: Big Topic Models on Modest Computer Clusters](https://arxiv.org/abs/1412.1576)
+    /// * [LightLDA](https://github.com/Microsoft/LightLDA)
+    ///
+    /// ]]></format>
+    /// </remarks>
+    /// <seealso cref="TextCatalog.LatentDirichletAllocation(TransformsCatalog.TextTransforms, string, string, int, float, float, int, int, int, int, int, int, int, bool)"/>
     public sealed class LatentDirichletAllocationEstimator : IEstimator<LatentDirichletAllocationTransformer>
     {
         [BestFriend]
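The remarks above highlight LightLDA's $O(1)$ Metropolis-Hastings sampler. As a rough illustration of the general Metropolis-Hastings accept/reject step only (a sketch, not LightLDA's alias-table sampler; `TopicWeight` is a hypothetical unnormalized target density invented for this example):

```csharp
using System;

// Generic Metropolis-Hastings over a small discrete topic space.
// Illustrative only: LightLDA's actual sampler uses alias tables and
// factorized proposals to reach O(1) cost per token.
class MetropolisHastingsSketch
{
    static readonly Random Rng = new Random(42);

    // Hypothetical unnormalized target density over topics.
    static double TopicWeight(int topic) => 1.0 + topic;

    // One MH transition: propose a topic uniformly at random, then accept
    // with probability min(1, p(proposal) / p(current)). The uniform
    // proposal is symmetric, so the Hastings correction term cancels.
    static int Step(int current, int numTopics)
    {
        int proposal = Rng.Next(numTopics);
        double acceptance = Math.Min(1.0, TopicWeight(proposal) / TopicWeight(current));
        return Rng.NextDouble() < acceptance ? proposal : current;
    }

    static void Main()
    {
        var counts = new int[3];
        int topic = 0;
        for (int i = 0; i < 100_000; i++)
        {
            topic = Step(topic, numTopics: 3);
            counts[topic]++;
        }
        // Empirical frequencies should approach the normalized weights 1/6, 2/6, 3/6.
        Console.WriteLine(string.Join(", ", counts));
    }
}
```

Because the uniform proposal is symmetric, the acceptance ratio reduces to a single density ratio; LightLDA's contribution is making that evaluation cheap regardless of model size.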
Review comment: Are those sentences required? You provide some input to the very beginning of this transform and then switch to algorithm details. I feel there might be a missing bridge between them.
Also, the descriptions of the training algorithm should be put into one single place. This section is somehow repeating information described above.
Review comment: I used the sentence as an example text, but I am realizing it is confusing. Let me try to find something unrelated.

In reply to: 277141226
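For readers following this thread, here is a minimal sketch of the preprocessing chain the new remarks describe (normalization, tokenization, key mapping, n-grams, then LDA), applied to the three example documents. The class name, column names, and seed are illustrative, not part of the PR:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class LdaPipelineSketch
{
    // Illustrative input schema: one text column.
    private class TextData { public string Text { get; set; } }

    static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        var samples = new List<TextData>
        {
            new TextData { Text = "I like to eat bananas." },
            new TextData { Text = "I eat bananas everyday." },
            new TextData { Text = "First celebrated in 1970, Earth Day now includes " +
                                  "events in more than 193 countries." },
        };
        var data = mlContext.Data.LoadFromEnumerable(samples);

        // Preprocess text into n-grams, then fit a 3-topic LDA model,
        // mirroring the pipeline suggested in the remarks.
        var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText", "Text")
            .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "NormalizedText"))
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
            .Append(mlContext.Transforms.Text.ProduceNgrams("Ngrams", "Tokens"))
            .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                "Topics", "Ngrams", numberOfTopics: 3));

        var transformed = pipeline.Fit(data).Transform(data);

        // Print each document's topic vector, one row per document.
        foreach (var vector in transformed.GetColumn<float[]>("Topics"))
            Console.WriteLine(string.Join(", ", vector));
    }
}
```

The printed vectors correspond to rows like those in the remarks' table, though the exact values depend on seeding and the number of training iterations.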