rogancarr
Contributor

@rogancarr rogancarr commented Feb 1, 2019

This is a small PR to address documentation and samples.

  • Adds a sample for SelectColumns.
  • Fixes a bug in a sample link for ConcatTransform.

Fixes #2370

@codecov

codecov bot commented Feb 2, 2019

Codecov Report

Merging #2380 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2380      +/-   ##
==========================================
+ Coverage   71.24%   71.25%   +<.01%     
==========================================
  Files         783      783              
  Lines      140733   140781      +48     
  Branches    16086    16088       +2     
==========================================
+ Hits       100266   100312      +46     
- Misses      36014    36015       +1     
- Partials     4453     4454       +1

Flag          Coverage Δ
#Debug        71.25% <ø> (ø) ⬆️
#production   67.6% <ø> (-0.01%) ⬇️
#test         85.3% <ø> (+0.02%) ⬆️

/// <remarks>
/// <format type="text/markdown">
/// <see cref="SelectColumns"/> operates on the schema of an input IDataView,
/// either dropping unselected columns from the schema or keeping them but hiding them from the user. Keeping columns hidden
Member

"keeping them but hiding them from the user"

What do you mean by "keeping them but hiding them from the user"? The meaning might get confused with "Keeping columns hidden".

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[Concat](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/SelectColumns.cs)]
Member

"Concat"

SelectColumns

Member

Not related to your changes, but I think this should be SelectColumn (without the s) to keep in line with the Estimator name ColumnSelectingEstimator.


In reply to: 253242036

Contributor Author

@rogancarr rogancarr Feb 2, 2019

Well, the API is for a list of columns, so the wording matches the use here. I do think the gerund usage in the Estimators is a bit confusing, though!


/// <summary>
/// ColumnSelectingEstimator is used to select a list of columns that user wants to drop from a given input.
/// ColumnSelectingEstimator is used to select a list of columns that user wants to keep from a given input.
Member

"keep"

Shouldn't this be "keep or drop"?

Contributor Author

Just keep. SelectColumns only specifies those columns to keep.

var rowEnumerable = mlContext.CreateEnumerable<SampleInfertDataTransformed>(transformedData, reuseRowObject: false);

// And finally, we can write out the rows of the dataset, looking at the columns of interest.
Console.WriteLine($"Label and Educations columns obtained post-transformation.");
Member

"Label"

Should Label be replaced with Age since we are keeping Age and Education?

Member

@singlis singlis left a comment

:shipit:

@rogancarr rogancarr force-pushed the 2370_selectcolumns_samples branch from d629a05 to 5c20ebb Compare February 2, 2019 06:10
// 34.0 1.0 0-5yrs 2.0 4.0 2.0 4.0 ...
// 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ...

// Select a subset of columns to keep, but don't drop the others.
Contributor

"but don't drop the others."

What do you mean by "don't drop the others"? It seems that you have dropped the others.

Contributor Author

Good eye. Leftovers from a different sample flow.


// Now we can transform the data and look at the output to confirm the behavior of CopyColumns.
// Don't forget that this operation doesn't actually evaluate data until we read the data below.
var transformedData = pipeline.Fit(trainData).Transform(trainData);
Contributor

Maybe add a few words on why that is the case: transformations are lazy in ML.NET.

Contributor Author

Done.

// Don't forget that this operation doesn't actually evaluate data until we read the data below.
var transformedData = pipeline.Fit(trainData).Transform(trainData);

// Print the number of columns schema
Contributor

"columns schema"

Print the number of columns in the schema.

Contributor Author

Done.

Contributor

@artidoro artidoro left a comment

:shipit:

@rogancarr rogancarr merged commit e0e36af into dotnet:master Feb 4, 2019
@rogancarr rogancarr deleted the 2370_selectcolumns_samples branch February 4, 2019 19:47
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022