-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Key to binary samples for documentation #3211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// This example demonstrates the use of MapKeyToVector by mapping keys to floats[] for multiple columns at once. | ||
/// Because the ML.NET KeyType maps the missing value to zero, counting of uints starts at 1, so the values | ||
/// converted to KeyTypes will appear skewed by one. See https://github.com/dotnet/machinelearning/issues/3072 | ||
public static void Example() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have tried to add explanations, because of issue 3072.
I think we should change the mapping of the missing value to 0... #WontFix
Codecov Report
@@ Coverage Diff @@
## master #3211 +/- ##
==========================================
+ Coverage 72.62% 72.62% +<.01%
==========================================
Files 807 807
Lines 145080 145080
Branches 16213 16213
==========================================
+ Hits 105369 105370 +1
Misses 35294 35294
+ Partials 4417 4416 -1
|
Referencing #3072 since the previews on those samples illustrate what might be a source of confusion. #WontFix |
// TransformedData obtained post-transformation. | ||
// | ||
// Timeframe TimeframeVector Category CategoryVector | ||
// 10 0,0,0,0,0,0,0,0,0,1 6 0,0,0,0,0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? The first Timeframe
was 9
in your declaration. If no transformer is applied to Timeframe
, ML.NET should not touch its value? Is my understanding correct? Or I am missing something? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.... this is why i logged 3072, and had a discussion on Friday post scrum. Gets easier to think of them as strings, not numbers.
the string "0", gets mapped to the first available ulong, 1. (The ulong 0 is reserved for missing key).
The string "1" gets mapped to the next available : 2.
On our KeyType system counting starts from 1. 0 is reserved.
In reply to: 272774353 [](ancestors = 272774353)
|
||
Console.WriteLine($" Timeframe TimeframeVector Category CategoryVector"); | ||
foreach (var featureRow in features) | ||
Console.WriteLine($"{featureRow.Timeframe}\t\t\t{string.Join(',', featureRow.TimeframeVector)}\t\t\t{featureRow.Category}\t\t{string.Join(',', featureRow.CategoryVector)}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
($" [](start = 33, length = 3)
For this long string, consider doing the old-style "{0} {1}", thingOne, thingTwo
etc.
As it is, it's hard to read. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a few small comments.
|
||
// Constructs the ML.net pipeline | ||
var pipeline = mlContext.Transforms.Conversion.MapKeyToVector( | ||
new[]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new[]{ [](start = 16, length = 6)
nit: maybe keep this ("new[]{") on the previous line, or indent the next two. #Resolved
/// Marks member as <see cref="KeyDataViewType"/> and specifies <see cref="KeyDataViewType"/> cardinality. | ||
/// Marks member as <see cref="KeyDataViewType"/>. The <paramref name="count"/> should be set to | ||
/// one more than the maximum value for the keys (to account for missing values). | ||
/// If the values are outside of the specified cardinality they will be mapped to the missing value representation: 0. | ||
/// </summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the old description was correct, especially when the underlying values are not numbers. When that's the case, I think it might make more sense to think about cardinality instead of maximum value of keys. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what you have written could be added as a note for the special case of integer values?
In reply to: 273266746 [](ancestors = 273266746)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is internal, it does have one yet. In reply to: 481010536 [](ancestors = 481010536) Refers to: src/Microsoft.ML.Transforms/ConversionsCatalog.cs:23 in 9db4dfe. [](commit_id = 9db4dfe, deletion_comment = False) |
* adding samples for KeyToVector and KeyToBinaryVector * typo * Adding pointers to the KeyType. * PR review comments
Towards #1209
This adds the last batch of samples for the Conversions catalog.
The samples are KeyToVector and KeyToBinaryVector