-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Documentation samples for binary classifiers (Static API) #1456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation samples for binary classifiers (Static API) #1456
Conversation
/// Downloads the adult train dataset from the ML.NET repo | ||
/// </summary> | ||
public static string DownloadAdultTrainDataset() | ||
=> Download("https://github.com/raw/dotnet/machinelearning/master/test/data/adult.train", "adult.train"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"https://github.com/raw/dotnet/machinelearning/master/test/data/adult.trai [](start = 24, length = 85)
"https://github.com/raw/dotnet/machinelearning/master/test/data/adult.trai [](start = 24, length = 85)
It needs to include the hash, otherwise it might change overtime.
/// Downloads the adult test dataset from the ML.NET repo | ||
/// </summary> | ||
public static string DownloadAdultTestDataset() | ||
=> Download("https://github.com/raw/dotnet/machinelearning/master/test/data/adult.test", "adult.test"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/raw/dotnet/machinelearning/master/test/data/adult.test" [](start = 25, length = 85)
Similar here:
https://github.com/dotnet/machinelearning/blob/20c36ca769d6b1f4fc816de4ae5f46266e8edd9b/test/data/adult.train
#Resolved
// It takes care of logging, exception tracking and as a source of randomness | ||
// Using random seed and automatic level of concurrency | ||
using (var environment = new ConsoleEnvironment(seed: 0, conc: 0)) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not necessary, the MLContext below is sufficient.
row.Ethnicity.OneHotEncoding(), | ||
row.Sex.OneHotEncoding(), | ||
row.HoursPerWeek, | ||
row.NativeCountry.OneHotEncoding().SelectFeaturesBasedOnCount(count: 10)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.SelectFeaturesBasedOnCount(count: 10)) [](start = 66, length = 39)
@shmoradims to check whether this is necessary.
Thanks for the contribution, @bojanmisic. This looks great, a few small comments. If you sync to latest, you'll notice that we are now doing one sample per file, to avoid those pesky line changes in the codebase :) |
5360763
to
7bb2625
Compare
4a85dff
to
2ecbcf1
Compare
|
||
// NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT. | ||
// If you change the existing content, check that the files referencing it in the XML documentation are still correct, as they reference | ||
// line by line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ Does it not have the ability to import a named region? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually going to remove the copyright and change the references to import the whole file.
In reply to: 231153004 [](ancestors = 231153004)
@sfilipi Thank you for the review, I have updated the branch per your suggestions. Also updated the PR description to reflect the things I have changed. Removed WIP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addresses #1257.
Adding documentation samples for binary classifiers when using
Static API
:SDCA
FastTree
LightGBM
AveragedPerceptron
A couple of points I'd like to mention:
adult.train
dataset both for training and testing (90/10 split) in the examples (sinceadult.test
is not properly formatted).FastTree
in both Regression and Classification contexts, I had to rename already added examples fromFastTree.cs
toFastTreeRegression.cs
and made sure the change is reflected in the docs so the examples point to the correct files.