Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise and OnnxTransformer #3387
@@ -0,0 +1,7 @@
### Input and Output Columns
The input features column data must be <xref:System.Single>. No label column is needed. This trainer outputs the following columns:

| Output Column Name | Column Type | Description |
| -- | -- | -- |
| `Score` | <xref:System.Single> | The unbounded score that was calculated by the trainer to determine the prediction. |
| `PredictedLabel` | <xref:System.Int32> | The cluster id predicted by the trainer. |
@@ -26,7 +26,38 @@
namespace Microsoft.ML.Trainers
{
    /// <include file='./doc.xml' path='doc/members/member[@name="KMeans++"]/*' />
    /// <summary>
    /// The <see cref="IEstimator{TTransformer}"/> for training a KMeans clusterer.
    /// </summary>
    /// <remarks>
    /// <format type="text/markdown"><![CDATA[
    ///
    /// [!include[io](~/../docs/samples/docs/api-reference/io-columns-clustering.md)]
    ///
    /// ### Trainer Characteristics
> **Review comment:** input/output table is missing #Resolved
    /// | | |
    /// | -- | -- |
    /// | Machine learning task | Clustering |
    /// | Is normalization required? | Yes |
> **Review comment:** Are you sure? #Resolved
>
> **Reply:** Yep. By default it uses the default TrainerInfo(), which has these default settings. (In reply to: 277113743)
    /// | Is caching required? | Yes |
    /// | Required NuGet in addition to Microsoft.ML | None |
> **Review comment:** Please check this. #Resolved
    ///
    /// ### Training Algorithm Details
    /// K-means is a popular clustering algorithm. With K-means, the data is clustered into a specified
    /// number of clusters in order to minimize the within-cluster sum of squares.
    /// This implementation follows the [Yinyang K-Means](https://research.microsoft.com/apps/pubs/default.aspx?id=252149)
    /// method, with the initial cluster centers chosen by K-means++ seeding.
    /// Yinyang K-Means accelerates K-means up to an order of magnitude while producing exactly the same clustering results (modulo floating-point precision issues).
    /// It observes that there is a lot of redundancy across iterations of the K-means algorithm, and that most points do not change their clusters during an iteration.
    /// It uses various bounding techniques to identify this redundancy, eliminate many distance computations, and optimize centroid computations.
    /// For more information on K-means and K-means++, see:
    /// [K-means](https://en.wikipedia.org/wiki/K-means_clustering)
    /// [K-means++](https://en.wikipedia.org/wiki/K-means%2b%2b)
    /// ]]>
    /// </format>
    /// </remarks>
    /// <seealso cref="Microsoft.ML.Trainers.KMeansTrainer" />
    public class KMeansTrainer : TrainerEstimatorBase<ClusteringPredictionTransformer<KMeansModelParameters>, KMeansModelParameters>
    {
        internal const string LoadNameValue = "KMeansPlusPlus";
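To make the algorithm description above concrete, here is a minimal, illustrative sketch of K-means with K-means++ seeding in plain Python. This is not the ML.NET implementation (which uses the Yinyang acceleration and operates on IDataView columns); the function names and the flat list-of-tuples input format are invented for this example.

```python
import random


def kmeans_plus_plus_seed(points, k, rng):
    """K-means++ seeding: pick the first center uniformly at random,
    then pick each further center with probability proportional to its
    squared distance from the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
              for pt in points]
        r = rng.uniform(0.0, sum(d2))
        acc = 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(pt))
                break
    return centers


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations (assign, then re-center) on top of
    K-means++ seeding; returns final centers and per-point labels."""
    rng = random.Random(seed)
    centers = kmeans_plus_plus_seed(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(pt, centers[j])))
                  for pt in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

The Yinyang optimization discussed in the doc comment would sit inside the assignment step, using distance bounds to skip points that provably cannot change cluster; this sketch omits it for clarity.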
@@ -50,6 +81,10 @@ internal static class Defaults
            public const int NumberOfClusters = 5;
        }
        /// <summary>
        /// Options for the <see cref="KMeansTrainer"/> as used in
        /// [KMeansTrainer(Options)](xref:Microsoft.ML.KMeansClusteringExtensions.KMeans(Microsoft.ML.ClusteringCatalog.ClusteringTrainers,Microsoft.ML.Trainers.KMeansTrainer.Options)).
        /// </summary>
        public sealed class Options : UnsupervisedTrainerInputBaseWithWeight
        {
            /// <summary>
This file was deleted.
@@ -27,11 +27,38 @@
namespace Microsoft.ML.Trainers
{
    /// <summary>
    /// Naive Bayes classifier is based on Bayes' theorem. It assumes independence among the presence of features
    /// in a class even though they may be dependent on each other. It is a multi-class trainer that accepts
    /// binary feature values of type float, i.e., feature values are either true or false, specifically a
    /// feature value greater than zero is treated as true.
    /// The <see cref="IEstimator{TTransformer}"/> for training a multiclass Naive Bayes predictor that supports binary feature values.
    /// </summary>
    /// <remarks>
    /// <format type="text/markdown"><![CDATA[
    ///
    /// [!include[io](~/../docs/samples/docs/api-reference/io-columns-multiclass-classification.md)]
    ///
    /// ### Trainer Characteristics
    /// | | |
    /// | -- | -- |
    /// | Machine learning task | Multiclass classification |
    /// | Is normalization required? | Yes |
    /// | Is caching required? | No |
    /// | Required NuGet in addition to Microsoft.ML | None |
    ///
    /// ### Training Algorithm Details
    /// [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
    /// is a probabilistic classifier that can be used for multiclass problems.
    /// Using Bayes' theorem, the conditional probability for a sample belonging to a class
    /// can be calculated based on the sample count for each feature combination group.
> **Review comment:** I'd recommend adding a math equation to describe the scoring function. It's not easy to figure out how this model does prediction given only a text description. #ByDesign
>
> **Reply:** I'll need to read a K-Means++ paper to figure that out. I don't think users care about the exact formula. What they would care about is that it is an improved version of K-Means and can be used as such (wherever K-Means is used). (In reply to: 276525896)
/// However, Naive Bayes Classifier is feasible only if the number of features and | ||
/// the values each feature can take is relatively small. | ||
/// It assumes independence among the presence of features in a class even though | ||
/// they may be dependent on each other. | ||
/// This multi-class trainer accepts "binary" feature values of type float: | ||
/// feature values that are greater than zero are treated as true and feature values | ||
/// that are less or equal to 0 are treated as false. | ||
/// ]]> | ||
/// </format> | ||
/// </remarks> | ||
/// <seealso cref="StandardTrainersCatalog.NaiveBayes(Microsoft.ML.MulticlassClassificationCatalog.MulticlassClassificationTrainers,System.String,System.String)"/> | ||
public sealed class NaiveBayesMulticlassTrainer : TrainerEstimatorBase<MulticlassPredictionTransformer<NaiveBayesMulticlassModelParameters>, NaiveBayesMulticlassModelParameters> | ||
{ | ||
internal const string LoadName = "MultiClassNaiveBayes"; | ||
|
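Since the review thread above asks how this model actually predicts, here is a hedged, minimal Python sketch of multiclass Naive Bayes over binarized float features (value > 0 means "on", matching the trainer's convention). It is not the NaiveBayesMulticlassTrainer code; the add-one (Laplace) smoothing and the helper names are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def train_naive_bayes(features, labels):
    """Per class, count samples and how often each feature is 'on'.
    Mirroring the trainer's convention, a float feature value > 0 is
    treated as true and anything <= 0 as false."""
    class_counts = defaultdict(int)
    feature_on = defaultdict(lambda: defaultdict(int))
    for x, y in zip(features, labels):
        class_counts[y] += 1
        for i, v in enumerate(x):
            if v > 0:
                feature_on[y][i] += 1
    return class_counts, feature_on


def predict(x, class_counts, feature_on, n_features):
    """Score each class as log P(c) + sum_i log P(x_i | c), using
    add-one (Laplace) smoothing, and return the argmax class."""
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i in range(n_features):
            p_on = (feature_on[c][i] + 1) / (n_c + 2)
            score += math.log(p_on if x[i] > 0 else 1.0 - p_on)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

The log-space sum is the standard way to avoid floating-point underflow when multiplying many small per-feature probabilities.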
> **Review comment:** Do you think the scoring function of the trained model should be explained? #WontFix
> **Reply:** You mean the Predict function? The one that predicts which cluster a data point goes to? (In reply to: 276525272)
> **Reply:** Or do you mean how distance is calculated when the algorithm converges to centroids? (In reply to: 277107516)
> **Reply:** In either case I don't think so. For the former, it's not specific to KMeans++. For the latter, first, how am I supposed to do that? Should I deduce it from the code? Secondly, I don't think this information is important for .NET devs to know. All they need to know is that this is an improved version of KMeans, plus the high-level details of the improvements described in the wiki links. (In reply to: 277107690)
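For readers following this thread: the prediction rule under discussion is just the generic K-means assignment step, not anything specific to this implementation or to K-Means++ seeding. Given trained centroids $\mu_1, \dots, \mu_K$, a point $x$ is assigned to the nearest centroid:

$$
\hat{k} \;=\; \underset{k \in \{1,\dots,K\}}{\arg\min} \; \lVert x - \mu_k \rVert^2
$$

The accelerated variants (such as Yinyang K-Means) change how this minimum is found during training, not the rule itself.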
> **Reply:** That's what I did. Debugging an ML model is debugging math equations; without them, how can users improve their models? We keep saying that .NET devs can ignore details, but that doesn't look entirely true in ML.NET's gitter channel. Anyway, I am not blocking your PR, just want to pass on some of my observations. :) #Resolved
> **Reply:** Sure. Once we have more time we could prioritize such tasks to the top; currently this is lower priority than the rest of the things that need to be done. (In reply to: 277148107)