
No support for precision reduction when reducing dataset size for pandas dataframe or series. #1278


Open
eddiebergman opened this issue Nov 3, 2021 · 0 comments
Labels
maintenance Internal maintenance

Comments

Contributor

eddiebergman commented Nov 3, 2021

We currently have two methods for dataset size reduction, precision and subsample, introduced more clearly in PR #1250. However, we have not implemented precision reduction for pandas dataframes, as this is a bit more involved: an ndarray has a single uniform dtype, while a dataframe has a dtype per column.
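To illustrate the per-column problem, here is a minimal sketch of what precision reduction on a dataframe could look like. The function name `reduce_float_precision` is hypothetical and not part of the existing code; it assumes we only want to downcast `float64` columns to `float32` and leave every other column untouched.

```python
import numpy as np
import pandas as pd


def reduce_float_precision(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: downcast float64 columns to float32.

    Non-float columns (ints, objects, categoricals, ...) are left as they are.
    Returns a new dataframe; the input is not modified.
    """
    # Each column has its own dtype, so select only the float64 ones.
    float64_cols = df.select_dtypes(include=["float64"]).columns
    # astype with a {column: dtype} mapping converts only the listed columns.
    return df.astype({col: np.float32 for col in float64_cols})


df = pd.DataFrame({"a": [1.0, 2.0], "b": [1, 2], "c": ["x", "y"]})
reduced = reduce_float_precision(df)
print(reduced.dtypes)
```

A real implementation would probably also need to decide what to do with other float widths and extension dtypes, which this sketch ignores.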

We also cannot use reduce_dataset_size_if_too_large with dataframes yet, as we have not implemented a method to calculate a dataframe's size, which we need in order to know how much to subsample.
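One possible way to get the size is pandas' own `DataFrame.memory_usage`. The sketch below is only a suggestion: `dataframe_nbytes` and `subsample_fraction` are hypothetical names, and the idea of deriving a row fraction from a byte limit is an assumption about how the result would be used, not how reduce_dataset_size_if_too_large currently works.

```python
import numpy as np
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame) -> int:
    """Hypothetical helper: total memory used by a dataframe, in bytes.

    deep=True also counts the contents of object columns (e.g. strings),
    which a shallow count would underestimate.
    """
    return int(df.memory_usage(index=True, deep=True).sum())


def subsample_fraction(df: pd.DataFrame, limit_bytes: int) -> float:
    """Hypothetical helper: fraction of rows to keep to fit under limit_bytes."""
    size = dataframe_nbytes(df)
    if size <= limit_bytes:
        return 1.0
    return limit_bytes / size


df = pd.DataFrame({"x": np.zeros(1000, dtype=np.float64)})
print(dataframe_nbytes(df), subsample_fraction(df, limit_bytes=4000))
```

The fraction assumes rows are roughly equal in size, which may not hold for object columns with very uneven contents.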

This shouldn't be too hard to implement but will require updating tests as well.

Edit:
Just adding an extra point: include a more nuanced size calculation for sparse matrices.
arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes
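As a rough sketch of the expression above, wrapped in a function (the name `sparse_nbytes` is hypothetical): it assumes a SciPy sparse matrix in CSR or CSC format, since those are the formats that expose `data`, `indices` and `indptr`. Other formats such as COO store their indices differently and would need converting first.

```python
import scipy.sparse as sp


def sparse_nbytes(arr) -> int:
    """Hypothetical helper: memory used by a CSR/CSC sparse matrix, in bytes.

    Counts the stored values plus the two index arrays, rather than the
    size the matrix would have if it were dense.
    """
    return arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes


# A 1000 x 1000 identity matrix stores only 1000 values.
m = sp.identity(1000, dtype="float64", format="csr")
dense_nbytes = m.shape[0] * m.shape[1] * m.dtype.itemsize
print(sparse_nbytes(m), dense_nbytes)
```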
