-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add a sample to SelectColumns #2380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2380 +/- ##
==========================================
+ Coverage 71.24% 71.25% +<.01%
==========================================
Files 783 783
Lines 140733 140781 +48
Branches 16086 16088 +2
==========================================
+ Hits 100266 100312 +46
- Misses 36014 36015 +1
- Partials 4453 4454 +1
|
/// <remarks> | ||
/// <format type="text/markdown"> | ||
/// <see cref="SelectColumns"/> operates on the schema of an input IDataView, | ||
/// either dropping unselected columns from the schema or keeping them but hiding them from the user. Keeping columns hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping them but hiding them from the user [](start = 65, length = 43)
What do you mean by keeping them, but hiding them from the user? This meaning might get a bit confusing with KeepingColumns hidden.
/// <example> | ||
/// <format type="text/markdown"> | ||
/// <] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concat [](start = 26, length = 6)
SelectColumns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to your changes, but I think this should be SelectColumn (without the s) to keep inline with the Estimator name ColumnSelectingEstimator.
In reply to: 253242036 [](ancestors = 253242036)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the API is for a list of columns, so the wording matches the use here. I do think the gerund usage in the Estimators is a bit confusing, though!
|
||
/// <summary> | ||
/// ColumnSelectingEstimator is used to select a list of columns that user wants to drop from a given input. | ||
/// ColumnSelectingEstimator is used to select a list of columns that user wants to keep from a given input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep [](start = 92, length = 4)
shouldn't this be keep or drop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just keep. SelectColumns only specifies those columns to keep.
var rowEnumerable = mlContext.CreateEnumerable<SampleInfertDataTransformed>(transformedData, reuseRowObject: false); | ||
|
||
// And finally, we can write out the rows of the dataset, looking at the columns of interest. | ||
Console.WriteLine($"Label and Educations columns obtained post-transformation."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Label [](start = 32, length = 5)
Should Label be replaced with Age since we are keeping Age and Education?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d629a05
to
5c20ebb
Compare
// 34.0 1.0 0-5yrs 2.0 4.0 2.0 4.0 ... | ||
// 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ... | ||
|
||
// Select a subset of columns to keep, but don't drop the others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but don't drop the others. [](start = 51, length = 26)
What do you mean by don't drop the others? It seems that you have dropped the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye. Leftovers from a different sample flow.
|
||
// Now we can transform the data and look at the output to confirm the behavior of CopyColumns. | ||
// Don't forget that this operation doesn't actually evaluate data until we read the data below. | ||
var transformedData = pipeline.Fit(trainData).Transform(trainData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a few words on why that is the case:
Transformations are lazy in ML.NET
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Don't forget that this operation doesn't actually evaluate data until we read the data below. | ||
var transformedData = pipeline.Fit(trainData).Transform(trainData); | ||
|
||
// Print the number of columns schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columns schema [](start = 35, length = 14)
Print the number of columns in the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a small PR to address documentation and samples.
Fixes #2370