Documentation samples for binary classifiers (Static API) #1456

Merged

@bojanmisic (Contributor) commented Oct 30, 2018

Addresses #1257.

Adding documentation samples for binary classifiers when using Static API:

  1. SDCA
  2. FastTree
  3. LightGBM
  4. AveragedPerceptron

A couple of points I'd like to mention:

  1. The examples use the adult.train dataset for both training and testing (90/10 split), since adult.test is not properly formatted.
  2. Because FastTree, for example, can be used for both regression and classification, I renamed the existing examples from FastTree.cs to FastTreeRegression.cs and updated the docs so that they point to the correct files.
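
For readers unfamiliar with the 90/10 split mentioned in point 1, here is a minimal sketch of the idea in plain C#. It is not taken from this PR and does not use any ML.NET API: `TrainTestSplit` is a hypothetical helper, the seed and the stand-in rows are assumptions, and the PR itself does not say how the split is performed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Hypothetical helper (not part of ML.NET): shuffles the rows with a fixed
    // seed and splits them into a training part and a held-out test part.
    public static (List<T> Train, List<T> Test) TrainTestSplit<T>(
        IEnumerable<T> rows, double testFraction = 0.1, int seed = 0)
    {
        if (testFraction < 0.0 || testFraction > 1.0)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var shuffled = rows.ToList();
        var rng = new Random(seed);

        // Fisher-Yates shuffle so the split does not depend on file order.
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        // The first testCount shuffled rows become the test set, the rest train.
        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        var test = shuffled.Take(testCount).ToList();
        var train = shuffled.Skip(testCount).ToList();
        return (train, test);
    }

    public static void Main()
    {
        // Stand-in rows; in the samples these would be the rows of adult.train.
        var rows = Enumerable.Range(0, 1000).Select(i => $"row-{i}");
        var (train, test) = TrainTestSplit(rows, testFraction: 0.1, seed: 0);
        Console.WriteLine($"train: {train.Count}, test: {test.Count}");
    }
}
```

Fixing the seed keeps the split reproducible between runs, which matters for documentation samples whose printed metrics should stay stable.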

/// <summary>
/// Downloads the adult train dataset from the ML.NET repo
/// </summary>
public static string DownloadAdultTrainDataset()
=> Download("https://github.com/raw/dotnet/machinelearning/master/test/data/adult.train", "adult.train");

@sfilipi (Member) commented Nov 1, 2018

/// <summary>
/// Downloads the adult test dataset from the ML.NET repo
/// </summary>
public static string DownloadAdultTestDataset()
=> Download("https://github.com/raw/dotnet/machinelearning/master/test/data/adult.test", "adult.test");

@sfilipi (Member) commented Nov 1, 2018

// It takes care of logging, exception tracking and as a source of randomness
// Using random seed and automatic level of concurrency
using (var environment = new ConsoleEnvironment(seed: 0, conc: 0))
{

Member

This is not necessary; the MLContext below is sufficient.

row.Ethnicity.OneHotEncoding(),
row.Sex.OneHotEncoding(),
row.HoursPerWeek,
row.NativeCountry.OneHotEncoding().SelectFeaturesBasedOnCount(count: 10)),

Member

.SelectFeaturesBasedOnCount(count: 10))

@shmoradims to check whether this is necessary.

@sfilipi (Member) commented Nov 1, 2018

Thanks for the contribution, @bojanmisic. This looks great, a few small comments.

If you sync to latest, you'll notice that we are now doing one sample per file, to avoid those pesky line changes in the codebase :)

@bojanmisic bojanmisic force-pushed the docs_for_binary_classifiers_static branch 2 times, most recently from 5360763 to 7bb2625 Compare November 6, 2018 14:07
@bojanmisic bojanmisic force-pushed the docs_for_binary_classifiers_static branch from 4a85dff to 2ecbcf1 Compare November 6, 2018 14:43

// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT.
// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference
// line by line.

@sharwell (Member) commented Nov 6, 2018

❔ Does it not have the ability to import a named region? #Resolved

Member

I am actually going to remove the copyright and change the references to import the whole file.

@bojanmisic bojanmisic changed the title [WIP] Documentation samples for binary classifiers (Static API) Documentation samples for binary classifiers (Static API) Nov 6, 2018
@bojanmisic (Contributor, Author) commented

@sfilipi Thank you for the review, I have updated the branch per your suggestions. Also updated the PR description to reflect the things I have changed.

Removed WIP.

@sfilipi (Member) left a comment

🕐

@sfilipi (Member) left a comment

:shipit:

@Zruty0 (Contributor) left a comment

:shipit:

@sfilipi sfilipi merged commit 509ac6b into dotnet:master Nov 6, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
4 participants