Add LDA example to Microsoft.ML.Samples #1782
abgoswam commented on Nov 29, 2018
- Adds an LDA example to Microsoft.ML.Samples. This addresses a pending comment on "Convert LdaTransform to IEstimator/ITransformer API" (#1410).
var transformed_data = transformer.Transform(trainData);

// Small helper to print the text inside the columns, in the console.
Action<string, IEnumerable<VBuffer<float>>> printHelper = (columnName, column) =>
printHelper: you don't need a separate helper for this if you'll only use it once. #Resolved
// A pipeline for featurizing the "Review" column
string ldaFeatures = "LdaFeatures";
var pipeline = ml.Transforms.Text.ProduceWordBags("Review").
    Append(ml.Transforms.Text.LatentDirichletAllocation("Review", ldaFeatures, numTopic:3));
LatentDirichletAllocation: just asking, besides numTopic, are there other parameters that users might tune often? #Resolved
I doubt users tune other parameters besides numTopic.
In reply to: 237660759
Maybe add the same sample one more time, adapted to use this one, so that it gets its own example as well? #Pending
Refers to: src/Microsoft.ML.Transforms/Text/TextCatalog.cs:540 in cbd8ad3.
// A pipeline for featurizing the "Review" column
string ldaFeatures = "LdaFeatures";
var pipeline = ml.Transforms.Text.ProduceWordBags("Review").
"Review": use nameof(SamplesUtils.DatasetUtils.SampleTopicsData.Review) instead? #Resolved
None of the examples within Microsoft.ML.Samples showcase this paradigm. Is that to keep things simple? I am skipping this for now.
In reply to: 442996256
Refers to: src/Microsoft.ML.Transforms/Text/TextCatalog.cs:540 in cbd8ad3.
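For readers unfamiliar with the nameof suggestion in this thread, here is a minimal sketch of the difference it makes. The SampleTopicsData class below is a hypothetical stand-in that only mirrors the Review member named in the comment; it is not the real SamplesUtils.DatasetUtils type, and NameofDemo is an illustrative name.

```csharp
using System;

// Hypothetical stand-in for SamplesUtils.DatasetUtils.SampleTopicsData.
// Only the Review member is taken from the review comment; the rest is assumed.
public class SampleTopicsData
{
    public string Review { get; set; }
}

public static class NameofDemo
{
    // nameof is evaluated at compile time and yields the member's simple name,
    // so renaming the property updates the column name or fails the build.
    public static string ReviewColumn() => nameof(SampleTopicsData.Review);

    public static void Main()
    {
        string literal = "Review";          // silently goes stale if the property is renamed
        string viaNameof = ReviewColumn();  // tracks the property name

        Console.WriteLine(literal == viaNameof);
    }
}
```

The trade-off raised in the reply still applies: the string literal is shorter and arguably easier to read in a sample, while nameof ties the column name to the data class.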