-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Towards #3204 - Conversion's catalog #3394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3394 +/- ##
==========================================
+ Coverage 72.69% 72.73% +0.03%
==========================================
Files 807 807
Lines 145172 145206 +34
Branches 16225 16230 +5
==========================================
+ Hits 105533 105615 +82
+ Misses 35223 35174 -49
- Partials 4416 4417 +1
|
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.UInt32"/>, or a <see cref="System.UInt32"/> based on whether the input column data types | ||
/// are vectors or scalars.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scalars [](start = 27, length = 7)
scalars or just primitives? @eerhardt @TomFinley
/// <param name="columns">Description of dataset columns and how to process them.</param> | ||
/// <remarks>This transform can operate over several columns.</remarks> | ||
/// <param name="catalog">The transform's catalog.</param> | ||
/// <param name="columns">The pairs of input and output columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
airs of [](start = 39, length = 7)
remove #Resolved
/// <param name="columns">The pairs of input and output columns. | ||
/// This estimator operates over text, numeric, boolean and <see cref="DataViewRowId"/> data types. | ||
/// The new column's data type will be a vector of <see cref="System.UInt32"/>, or a <see cref="System.UInt32"/> based on whether the input column data types | ||
/// were vectors or scalars.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
were [](start = 12, length = 4)
are #Resolved
/// <param name="columns">Description of dataset columns and how to process them.</param> | ||
/// <remarks>This transform can operate over several columns.</remarks> | ||
/// <param name="catalog">The transform's catalog.</param> | ||
/// <param name="columns">The pairs of input and output columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pairs of input and output columns. [](start = 34, length = 38)
The input and output columns. #Resolved
[BestFriend] | ||
internal static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms catalog, params HashingEstimator.ColumnOptions[] columns) | ||
=> new HashingEstimator(CatalogUtils.GetEnvironment(catalog), columns); | ||
|
||
/// <summary> | ||
/// Changes column type of the input column. | ||
/// Create a <see cref="TypeConvertingEstimator"/>, which convertes the type of the data to the type specified in <paramref name="outputKind"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convertes [](start = 66, length = 9)
converts #Resolved
@@ -114,10 +134,15 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co | |||
=> new KeyToValueMappingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName); | |||
|
|||
/// <summary> | |||
/// Convert the key types back to their original values. | |||
/// Create a <see cref="KeyToValueMappingEstimator"/>, which convert the key types back to their original values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert [](start = 69, length = 7)
converts #Resolved
@@ -133,21 +158,24 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co | |||
} | |||
|
|||
/// <summary> | |||
/// Maps key types or key values into a floating point vector. | |||
/// Create a <see cref="KeyToVectorMappingEstimator"/>, which maps key types or key values into a floating point vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which maps key types or key values into a floating point vector. [](start = 64, length = 64)
which maps the value of into a floating point vector representing the value. #Resolved
@@ -326,15 +370,17 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co | |||
} | |||
|
|||
/// <summary> | |||
/// <see cref="ValueMappingEstimator"/> | |||
/// Create a <see cref="ValueMappingEstimator"/>, which onverts value types into <see cref="KeyDataViewType"/>, loading the keys to use from <paramref name="keyValuePairs"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
onverts [](start = 64, length = 7)
converts #Resolved
/// <remarks>This transform can operate over several columns.</remarks> | ||
/// <param name="catalog">The transform's catalog.</param> | ||
/// <param name="columns">The pairs of input and output columns. | ||
/// This estimator operates over text, numeric, boolean and <see cref="DataViewRowId"/> data types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
over text, numeric, boolean and data types [](start = 36, length = 70)
add Key #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | KeyDataViewType | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyDataViewType [](start = 35, length = 15)
xref #Closed
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes, if the mapping of the hashes to the values is required. | | ||
/// | Input column data type | Vector or primitive of text or [KeyDataViewType](xref:Microsoft.ML.Data.KeyDataViewType) data types.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or primitive of text or KeyDataViewType data typ [](start = 41, length = 91)
add all the other types #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.UInt32"/>, or a <see cref="System.UInt32"/> based on whether the input column data types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 56, length = 27)
Is it more useful if we say that these are keys? As in KeyDataViewType
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think it is more correct to leave it to ints. You're right on them being effectively keys though, but as far as the underlying data type, it is UInt32
In reply to: 276808872 [](ancestors = 276808872)
/// are vectors or scalars.</param> | ||
/// <param name="inputColumnName">Name of the column whose data will be hashed. | ||
/// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This estimator operates over text, numeric, boolean, <see cref="KeyDataViewType"/> or <see cref="DataViewRowId"/> data types. </param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 65, length = 29)
just 'key' or 'key type' #Pending
/// are vectors or scalars.</param> | ||
/// <param name="inputColumnName">Name of the column whose data will be hashed. | ||
/// If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This estimator operates over text, numeric, boolean, <see cref="KeyDataViewType"/> or <see cref="DataViewRowId"/> data types. </param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataViewRowId [](start = 109, length = 13)
do we need to mention DataViewRowId? is it group id? we should ask Tom about a phrase description for it.
#Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is group id, it is the idv row id, the Uint128
In reply to: 276842426 [](ancestors = 276842426)
[BestFriend] | ||
internal static HashingEstimator Hash(this TransformsCatalog.ConversionTransforms catalog, params HashingEstimator.ColumnOptions[] columns) | ||
=> new HashingEstimator(CatalogUtils.GetEnvironment(catalog), columns); | ||
|
||
/// <summary> | ||
/// Changes column type of the input column. | ||
/// Create a <see cref="TypeConvertingEstimator"/>, which converts the type of the data to the type specified in <paramref name="outputKind"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type of the data [](start = 75, length = 20)
the input column's data type #Resolved
/// </summary> | ||
/// <param name="catalog">The conversion transform's catalog.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source. | ||
/// This transform operates over numeric, boolean, text, <see cref="System.DateTime"/> and <see cref="KeyDataViewType"/> data types.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyDataViewType [](start = 110, length = 15)
key #Resolved
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | Yes, if the mapping of the hashes to the values is required. | | ||
/// | Input column data type | Vector or scalars of numeric, boolean, text, [DateTime](xref: System.DateTime) and [KeyDataViewType](xref:Microsoft.ML.Data.KeyDataViewType) data types.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[DateTime](xref: System.DateTime) [](start = 80, length = 33)
you can just use automatic links: <xref: System.DateTime> to save yourself from having to write the anchor text #Resolved
@@ -502,6 +499,24 @@ public override JToken SavePfa(BoundPfaContext ctx, JToken srcToken) | |||
} | |||
} | |||
|
|||
/// <summary> | |||
/// Utilizes KeyValues <see cref="AnnotationInfo"/> of the input column, to map keys to the corresponding values. | |||
/// Maps zero values of the <see cref="KeyDataViewType"/> are mapped to the <see langword="default"/> value of the output type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maps zero values of the are mapped [](start = 8, length = 64)
re-read: Maps .... are mapped to
#Closed
/// | | | | ||
/// | -- | -- | | ||
/// | Does this estimator need to look at the data to train its parameters? | No | | ||
/// | Input column data type | [KeyDataViewType](xref:Microsoft.ML.Data.KeyDataViewType) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeyDataViewType [](start = 36, length = 15)
so here we can use 'key type' as the anchor text. we're using raw types in the params requirement instead of IDV types. in this table we have a change to link to the actual class, which explains what we mean by 'key type'.
we're doing the same for text.
text
#Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about numeric.. if we keep it to that, we'd have to spell out: float, double, int, long, uint, short, ushort.. doesn't seem feasible. How about i leave it to no references for text and numeric?
In reply to: 276847010 [](ancestors = 276847010)
@@ -1104,9 +1102,23 @@ public override void Process() | |||
} | |||
|
|||
/// <summary> | |||
/// Estimator for <see cref="HashingTransformer"/> which can hash either single valued columns or vector columns. For vector columns, | |||
/// it hashes each slot separately. It can hash either text values or key values. | |||
/// Estimator for <see cref="HashingTransformer"/>. hash either single valued columns or vector columns. For vector columns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h -> H #Resolved
/// <include file='doc.xml' path='doc/members/member[@name="ValueMappingEstimator"]/*' /> | ||
|
||
/// <summary> | ||
/// Estimator for <see cref="ValueMappingTransformer"/>. It is a 1-1 mapping from one value to another value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest: "Create a key-value map using the pairs of values in the input data" or something like that? #Resolved
@@ -76,7 +102,7 @@ internal abstract class ColumnOptionsBase | |||
internal sealed class ColumnOptions : ColumnOptionsBase | |||
{ | |||
/// <summary> | |||
/// Describes how the transformer handles one column pair. | |||
/// Describes how the transformer handles onumn pair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo? #Resolved
@@ -1104,9 +1102,23 @@ public override void Process() | |||
} | |||
|
|||
/// <summary> | |||
/// Estimator for <see cref="HashingTransformer"/> which can hash either single valued columns or vector columns. For vector columns, | |||
/// it hashes each slot separately. It can hash either text values or key values. | |||
/// Estimator for <see cref="HashingTransformer"/>. hash either single valued columns or vector columns. For vector columns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h [](start = 56, length = 1)
uppercase #Resolved
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param> | ||
/// <param name="inputColumnName">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>. | ||
/// This column's data type will be a vector of <see cref="System.UInt32"/>, or a <see cref="System.UInt32"/> based on whether the input column data types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, or a [](start = 56, length = 61)
Can you see difference between <see cref="System.UInt32"/>
and <see cref="System.UInt32"/>
? Because I don't. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually that is correct: a vector of ints or an int.
I can add scalar.
In reply to: 277099542 [](ancestors = 277099542,277079450)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polishes the documentation for the conversions catalog, adhering to the template in #3204
Marking it as not final until I do a second pass at the changes through CodeFlow.