Add API for shuffling to MLContext #342

Closed
BeanHsiang opened this issue Jun 10, 2018 · 5 comments
Labels: API (Issues pertaining the friendly API), need info (This issue needs more info before triage)

Comments

@BeanHsiang

The class "TrainTestSplit" currently supports only "Fraction" in TrainTestSplit.Input; "Shuffle" is another important attribute.
Could ML.NET support "Shuffle & Split" in the next version?

@shauheen added the question label Jun 11, 2018
@glebuk
Contributor

glebuk commented Jun 19, 2018

ApproximateBootstrapSampler should also shuffle. There is also a ShuffleTransform that can be added before the split; it will become available once we change the API in #371. Alternatively, you can shuffle the data externally before feeding it to ML.NET.
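
For the external route, here is a minimal sketch of "shuffle, then split by fraction" done outside ML.NET. It is plain Python, not ML.NET code: the function names `shuffle_and_split` and `shuffle_file` and their parameters are hypothetical, and it assumes a line-oriented text file (optionally with a header row) small enough to fit in memory.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_and_split(
    rows: Sequence[str], test_fraction: float, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of `rows`, then return (train, test) split by fraction."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be in [0, 1]")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the order reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def shuffle_file(
    src_path: str,
    train_path: str,
    test_path: str,
    test_fraction: float,
    seed: int = 0,
    has_header: bool = True,
) -> None:
    """Write shuffled train/test files, repeating the header row in both."""
    with open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    header = lines[:1] if has_header else []
    body = lines[1:] if has_header else lines
    train, test = shuffle_and_split(body, test_fraction, seed)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as out:
            out.write("\n".join(header + part) + "\n")
```

The resulting files can then be loaded as usual, with the fraction-based split no longer depending on the original row order. For data that does not fit in memory this approach would need to be replaced by a streaming or on-disk shuffle.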

@Ivanidzo4ka
Contributor

Ivanidzo4ka commented Oct 18, 2018

DRI RESPONSE:
@Zruty0 do we plan to add a Shuffle function to the MLContext object?
It is also probably worth adding a sample of shuffling data to the cookbook.

@Ivanidzo4ka added the API label Oct 18, 2018
@Zruty0
Contributor

Zruty0 commented Oct 18, 2018

We do want to have shuffling available via MLContext, but not as an estimator/transformer. It'll be something like IDataView Shuffle(IDataView input, ....)

@shauheen added the need info label Dec 6, 2018
@sfilipi removed the need info label Dec 7, 2018
@Zruty0 removed the question label Dec 18, 2018
@Zruty0 changed the title from "Could TrainTestSplit support Shuffle?" to "Add API for shuffling to MLContext" Dec 18, 2018
@shauheen added the need info label Jan 28, 2019
@shauheen
Contributor

@Ivanidzo4ka can you come up with a design?

@rogancarr self-assigned this Feb 6, 2019
@rogancarr
Contributor

I actually am working on this as part of the DataOperations catalog and just found this issue. I'll push a PR shortly.

@ghost locked as resolved and limited conversation to collaborators Mar 30, 2022

7 participants