Skip to content

ReadFromTextLoader(..., supportsSparse, ...) defaults to true? #2576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stephentoub opened this issue Feb 15, 2019 · 2 comments · Fixed by #2630
Closed

ReadFromTextLoader(..., supportsSparse, ...) defaults to true? #2576

stephentoub opened this issue Feb 15, 2019 · 2 comments · Fixed by #2630
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@stephentoub
Copy link
Member

ReadFromTextFile has a supportsSparse = true argument. It seems like this should actually default to false? Otherwise it tries to interpret all of the input as if it may contain sparse data and fails in the parsing on every line, unless there actually is sparse data.

@eerhardt
Copy link
Member

@TomFinley @Ivanidzo4ka @yaeldekel - do you know why the current default value for this is true? It seems like a very specific format, that the majority of data sets will not conform to.

@TomFinley
Copy link
Contributor

TomFinley commented Feb 15, 2019

I agree with this. The same might be said of "quoting" which fouls people up more often than it helps them. Not only based on your observation from this issue, but if I think back on the years we've had this loader this has been a persistent and confusing problem.

The argument might be made that TextLoader and TextSaver arguments should be symmetrical, but as I argued in #2452 text saver by default writes out a settings pre-amble anyway which overrides defaults in the text loader, and there are other cases where we have decided that having these be disparate is appropriate. Currently this disparity is limited to header, but if we judge it appropriate there's no reason not to expand it, and I see that the case for sparsity and quoting is even stronger than the case for headers.

If someone knows enough to conform to our format for quoting and sparse vectors, they certainly know enough to set this option, as well as the quoting option.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Feb 15, 2019
@wschin wschin self-assigned this Feb 15, 2019
@shauheen shauheen added this to the 0219 milestone Feb 20, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants