Find a new regression dataset #938
Comments
The LIBSVM datasets are also commonly used in research.
We have the following datasets that can be used for regression:
The following can be reformulated for use as regression prediction tasks:
Rogan seems to have answered this question.
The work item is to replace the synthetic datasets with ones more representative of user datasets. Rogan has pointed out great ones we can use as replacements in our tests.
@justinormont The ones that Rogan pointed out are real datasets; the breast-cancer dataset is from 1992.
Some regression tests rely on a machine-generated regression dataset (Gaussian noise on top of a linear function of a vector input). The file was introduced by #937.
We should replace this dataset with a real one. Justin (@justinormont) suggested finding something on data.gov, for example a dataset for predicting SF employee pay: https://catalog.data.gov/dataset/employee-compensation-53987
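For reference, the kind of synthetic dataset this issue wants to move away from (Gaussian noise on top of a linear function of a vector input) can be sketched roughly as below. This is only an illustration, not the generator used in #937: the function name, the uniform feature range, and all parameters are assumptions made for the example.

```python
import random


def make_linear_regression_rows(n_rows, weights, bias=0.0, noise_std=1.0, seed=0):
    """Return (features, label) rows with label = w . x + bias + Gaussian noise.

    Hypothetical helper for illustration; not the project's actual generator.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce identical rows
    rows = []
    for _ in range(n_rows):
        # Assumption: each feature is drawn uniformly from [-1, 1].
        x = [rng.uniform(-1.0, 1.0) for _ in weights]
        # Linear function of the input vector.
        signal = sum(w * xi for w, xi in zip(weights, x)) + bias
        # Additive zero-mean Gaussian noise on top of the linear signal.
        label = signal + rng.gauss(0.0, noise_std)
        rows.append((x, label))
    return rows


if __name__ == "__main__":
    for features, label in make_linear_regression_rows(5, [2.0, -1.0, 0.5], noise_std=0.1):
        print(features, label)
```

Data like this is trivially fit by any linear learner, which is part of why a real dataset would be more representative of what users bring.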