Fixes #591: typos, adding the type attribute to lists, and moving the name attribute for some examples #592
@@ -28,7 +28,7 @@
       </summary>
       <remarks>
         The TextToKeyConverter transform builds up term vocabularies (dictionaries).
-        The TextToKey Converter and the <see cref="T:Microsoft.ML.Transforms.HashConverter"/> are the two one primary mechanisms by which raw input is transformed into keys.
+        The TextToKeyConverter and the <see cref="T:Microsoft.ML.Transforms.HashConverter"/> are the two one primary mechanisms by which raw input is transformed into keys.
         If multiple columns are used, each column builds/uses exactly one vocabulary.
         The output columns are KeyType-valued.
         The Key value is the one-based index of the item in the dictionary.
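The key mapping described in the remarks above (each distinct term gets a one-based index in a vocabulary, with keys used for lookup) can be illustrated with a small sketch. This is plain Python with hypothetical names, not the ML.NET TextToKeyConverter API:

```python
# Sketch: build a term vocabulary and map raw tokens to one-based key
# indices, in the spirit of the TextToKeyConverter remarks above.
def build_vocabulary(tokens):
    """Assign each distinct term a one-based index in order of first appearance."""
    vocab = {}
    for t in tokens:
        if t not in vocab:
            vocab[t] = len(vocab) + 1  # keys are one-based
    return vocab

def to_keys(tokens, vocab):
    """Map tokens to key values; terms missing from the vocabulary get 0."""
    return [vocab.get(t, 0) for t in tokens]

vocab = build_vocabulary(["red", "green", "red", "blue"])
print(vocab)                                      # {'red': 1, 'green': 2, 'blue': 3}
print(to_keys(["blue", "red", "yellow"], vocab))  # [3, 1, 0]
```

The one-based convention leaves 0 free to represent "not in the vocabulary", which is why the docs stress that the key value is the one-based index of the item in the dictionary.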
@@ -49,6 +49,52 @@
         </code>
       </example>
     </example>
+    <member name="NAHandle">
+      <summary>
+        Handle missing values by replacing them with either the default value or the indicated value.
+      </summary>
+      <remarks>
+        This transform handles missing values in the input columns. For each input column, it creates an output column
+        where the missing values are replaced by one of these specified values:
+        <list type='bullet'>
+          <item>
+            <description>The default value of the appropriate type.</description>
+          </item>
+          <item>
+            <description>The mean value of the appropriate type.</description>
+          </item>
+          <item>
+            <description>The max value of the appropriate type.</description>
+          </item>
+          <item>
+            <description>The min value of the appropriate type.</description>
+          </item>
+        </list>
+        <para>The last three work only for numeric/TimeSpan/DateTime kind columns.</para>
+        <para>
+          The output column can also optionally include an indicator vector for which slots were missing in the input column.
+          This can be done only when the indicator vector type can be converted to the input column type, i.e. only for numeric columns.
+        </para>
+        <para>
+          When computing the mean/max/min value, there is also an option to compute it over the whole column instead of per slot.
+          This option has a default value of true for variable length vectors, and false for known length vectors.
+          It can be changed to true for known length vectors, but it results in an error if changed to false for variable length vectors.
+        </para>
+      </remarks>
+      <seealso cref=" Microsoft.ML.Runtime.Data.MetadataUtils.Kinds.HasMissingValues"/>
+      <seealso cref="T:Microsoft.ML.Data.DataKind"/>
+    </member>
+    <example name="NAHandle">
+      <example>
+        <code language="csharp">
+          pipeline.Add(new MissingValueHandler("FeatureCol", "CleanFeatureCol")
> [review comment] ❔ Why is the
+          {
+          ReplaceWith = NAHandleTransformReplacementKind.Mean

> [review comment] 💡 Should likely be indented

+          });
+        </code>
+      </example>
+    </example>
  </members>
</doc>
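The per-slot versus whole-column distinction in the NAHandle remarks above can be made concrete with a short sketch. This is a toy illustration in plain Python for fixed-length numeric vectors, not the ML.NET transform itself; the function name is hypothetical:

```python
import math

# Sketch of mean imputation as described for NAHandle: replace NaN entries
# either per slot (one mean per vector position) or over the whole column
# (a single mean computed from every non-missing value).
def impute_mean(rows, per_slot=True):
    if per_slot:
        means = []
        for j in range(len(rows[0])):
            vals = [r[j] for r in rows if not math.isnan(r[j])]
            means.append(sum(vals) / len(vals) if vals else 0.0)
        return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
                for r in rows]
    vals = [v for r in rows for v in r if not math.isnan(v)]
    mean = sum(vals) / len(vals) if vals else 0.0
    return [[mean if math.isnan(v) else v for v in r] for r in rows]

rows = [[1.0, float("nan")], [3.0, 4.0]]
print(impute_mean(rows))                  # [[1.0, 4.0], [3.0, 4.0]]
print(impute_mean(rows, per_slot=False))  # [[1.0, 2.6666666666666665], [3.0, 4.0]]
```

For variable-length vectors the slots do not line up across rows, which is why the docs say whole-column computation defaults to true there and cannot be turned off.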
@@ -95,7 +95,7 @@
         <para>Generally, ensemble models provide better coverage and accuracy than single decision trees.
         Each tree in a decision forest outputs a Gaussian distribution.</para>
         <para>For more see: </para>
-        <list>
+        <list type='bullet'>
           <item><description><a href='http://en.wikipedia.org/wiki/Random_forest'>Wikipedia: Random forest</a></description></item>
           <item><description><a href='http://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf'>Quantile regression forest</a></description></item>
           <item><description><a href='https://blogs.technet.microsoft.com/machinelearning/2014/09/10/from-stumps-to-trees-to-forests/'>From Stumps to Trees to Forests</a></description></item>
@@ -146,7 +146,7 @@
       <summary>
         Trains a tree ensemble, or loads it from a file, then maps a numeric feature vector
         to three outputs:
-        <list>
+        <list type='number'>
           <item><description>A vector containing the individual tree outputs of the tree ensemble.</description></item>
           <item><description>A vector indicating the leaves that the feature vector falls on in the tree ensemble.</description></item>
           <item><description>A vector indicating the paths that the feature vector falls on in the tree ensemble.</description></item>
@@ -157,28 +157,28 @@
       </summary>
       <remarks>
         In machine learning it is a pretty common and powerful approach to utilize the already trained model in the process of defining features.
         <para>One such example would be the use of model's scores as features to downstream models. For example, we might run clustering on the original features,

> [review comment] ❔ Why is this required? It's awkward to read and seems like it would be easy to regress.

         and use the cluster distances as the new feature set.
         Instead of consuming the model's output, we could go deeper, and extract the 'intermediate outputs' that are used to produce the final score. </para>
         There are a number of famous or popular examples of this technique:
-        <list>
+        <list type='bullet'>
           <item><description>A deep neural net trained on the ImageNet dataset, with the last layer removed, is commonly used to compute the 'projection' of the image into the 'semantic feature space'.
           It is observed that the Euclidean distance in this space often correlates with the 'semantic similarity': that is, all pictures of pizza are located close together,
           and far away from pictures of kittens. </description></item>
           <item><description>A matrix factorization and/or LDA model is also often used to extract the 'latent topics' or 'latent features' associated with users and items.</description></item>
           <item><description>The weights of the linear model are often used as a crude indicator of 'feature importance'. At the very minimum, the 0-weight features are not needed by the model,
           and there's no reason to compute them. </description></item>
         </list>
         <para>Tree featurizer uses the decision tree ensembles for feature engineering in the same fashion as above.</para>
         <para>Let's assume that we've built a tree ensemble of 100 trees with 100 leaves each (it doesn't matter whether boosting was used or not in training).
         If we associate each leaf of each tree with a sequential integer, we can, for every incoming example x,
         produce an indicator vector L(x), where Li(x) = 1 if the example x 'falls' into the leaf #i, and 0 otherwise.</para>
         <para>Thus, for every example x, we produce a 10000-valued vector L, with exactly 100 1s and the rest zeroes.
         This 'leaf indicator' vector can be considered the ensemble-induced 'footprint' of the example.</para>
         <para>The 'distance' between two examples in the L-space is actually a Hamming distance, and is equal to the number of trees that do not distinguish the two examples.</para>
         <para>We could repeat the same thought process for the non-leaf, or internal, nodes of the trees (we know that each tree has exactly 99 of them in our 100-leaf example),
         and produce another indicator vector, N (size 9900), for each example, indicating the 'trajectory' of each example through each of the trees.</para>
         <para>The distance in the combined 19900-dimensional LN-space will be equal to the number of 'decisions' in all trees that 'agree' on the given pair of examples.</para>
         <para>The TreeLeafFeaturizer is also producing the third vector, T, which is defined as Ti(x) = output of tree #i on example x.</para>
       </remarks>
       <example>
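The leaf-indicator construction described in the remarks above can be sketched in a few lines. This is a toy illustration in plain Python with hypothetical names, not the TreeLeafFeaturizer itself; each "tree" is modeled as a function mapping an example to a local leaf index:

```python
# Sketch: the leaf indicator vector L(x) for an ensemble, and the Hamming
# distance between two examples in L-space, per the TreeLeafFeaturizer remarks.
def leaf_indicator(trees, leaves_per_tree, x):
    """One-hot 1 at the global index of the leaf x falls into, for each tree."""
    vec = [0] * (len(trees) * leaves_per_tree)
    for i, tree in enumerate(trees):
        vec[i * leaves_per_tree + tree(x)] = 1
    return vec

def hamming(u, v):
    """Number of positions where the two indicator vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# Toy ensemble of 2 "trees" with 2 leaves each, modeled as threshold stumps.
trees = [lambda x: int(x > 0), lambda x: int(x > 5)]
L_a = leaf_indicator(trees, 2, 3)  # x=3: leaf 1 of tree 0, leaf 0 of tree 1
L_b = leaf_indicator(trees, 2, 7)  # x=7: leaf 1 of tree 0, leaf 1 of tree 1
print(L_a)                # [0, 1, 1, 0]
print(L_b)                # [0, 1, 0, 1]
print(hamming(L_a, L_b))  # 2
```

Note that each tree that sends the two examples to different leaves contributes two differing positions (a 1→0 and a 0→1), so in this sketch the raw Hamming distance is twice the number of trees that distinguish the pair.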
@@ -13,8 +13,8 @@
     and an option to update the weight vector using the average of the vectors seen over time (averaged argument is set to True by default).
   </remarks>
 </member>
-<example>
+<example name="OGD">

> [review comment] is this not an error to have nested examples? #Pending
> [reply] Yea I see it's used everywhere. I wonder why though?
> [reply] It is legal in XML. I needed a named wrapper to get the node, and just called it example. This is how they are being included: I don't need to include them with the /* notation, I can just include the named node itself, but I wasn't sure if the docs tools would tolerate the name attribute on it. In my TODO list.
> [review comment] Out of curiosity, does the order of having name / not having name matter? In the AP example you have one ordering and here you have the other. #Pending
> [reply] It depends how you include the node in the code. See the long answer to Pete's comment. Should clarify it.

   <example>
     <code language="csharp">
       new OnlineGradientDescentRegressor
       {
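The OGD remarks above describe updating a weight vector online and optionally returning the average of the vectors seen over time. A minimal sketch of that idea, in plain Python with a single weight and squared loss (a toy illustration, not the ML.NET OnlineGradientDescentRegressor implementation):

```python
# Sketch of online gradient descent with optional weight averaging,
# as described in the OGD remarks: average the weights seen over time.
def ogd_fit(stream, lr=0.1, averaged=True):
    w = 0.0                 # single weight; no bias term, for simplicity
    w_sum, n = 0.0, 0
    for x, y in stream:
        err = w * x - y     # gradient of 0.5 * err**2 w.r.t. w is err * x
        w -= lr * err * x
        w_sum += w
        n += 1
    # Averaging returns the mean of all weight iterates, which smooths
    # out the noise of the individual online updates.
    return w_sum / n if averaged else w

# Learn y = 2x from a repeated stream of three points.
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)] * 50
w = ogd_fit(stream)
print(round(w, 1))  # 2.0
```

Averaging trades a small bias toward the early iterates for lower variance, which is why it is commonly enabled by default in online learners.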
> [review comment] ❔ Is this missing a T: prefix? It's unclear why the others are given in full form but I noticed this one is different.