implement Frame #81

Open
gdementen opened this issue Feb 1, 2017 · 3 comments

Comments

@gdementen
Contributor

gdementen commented Feb 1, 2017

Implement an LFrame, which would store different fields/columns with (potentially) different types.

Each field from an LFrame should also be accessible via __getattr__.
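A minimal sketch of field access through __getattr__, assuming the fields are stored as a dict mapping field names to 1D numpy arrays. The LFrame class below is hypothetical and only illustrates the mechanism; it is not the larray API.

```python
import numpy as np


class LFrame:
    """Hypothetical minimal LFrame: field name -> 1D numpy array.

    All fields share the same length, but each field keeps its own dtype.
    """

    def __init__(self, **fields):
        lengths = {len(values) for values in fields.values()}
        if len(lengths) > 1:
            raise ValueError("all fields must have the same length")
        self._fields = {name: np.asarray(values) for name, values in fields.items()}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so methods and real
        # attributes take precedence over fields with the same name.
        # Reading self.__dict__ directly avoids infinite recursion if _fields
        # has not been set yet (e.g. during unpickling).
        fields = self.__dict__.get('_fields', {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Explicit access, e.g. for field names that clash with method names
        return self._fields[name]

    def __dir__(self):
        # Expose field names to tab completion
        return list(super().__dir__()) + list(self._fields)


frame = LFrame(age=[25, 40, 61], name=['Ann', 'Bob', 'Cid'])
ages = frame.age        # same object as frame['age']
names = frame['name']   # each field keeps its own dtype (int vs str here)
```

Because __getattr__ is only a fallback, a field named like an existing method (say "sum") would only be reachable through indexing, which is why the sketch keeps __getitem__ alongside attribute access.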

We might want to use a ColumnArray, like in LIAM2, instead of a numpy structured array as the storage backend, so that fields can be added efficiently.
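To illustrate why adding fields is cheaper with a column-based backend, here is a sketch contrasting the two layouts. ColumnStore is a hypothetical stand-in for a ColumnArray-like container (one independent 1D array per field), and add_field_structured is a hypothetical helper; neither is taken from LIAM2 or larray.

```python
import numpy as np


def add_field_structured(arr, name, values):
    """Add a field to a structured array: needs a new dtype and a full copy."""
    values = np.asarray(values)
    new_dtype = np.dtype(arr.dtype.descr + [(name, values.dtype.str)])
    out = np.empty(arr.shape, dtype=new_dtype)
    for field in arr.dtype.names:   # every existing field is copied
        out[field] = arr[field]
    out[name] = values
    return out


class ColumnStore:
    """Hypothetical ColumnArray-like container: one 1D array per field."""

    def __init__(self, **columns):
        self.columns = {name: np.asarray(col) for name, col in columns.items()}

    def add_field(self, name, values):
        # Only the new column is allocated; existing columns stay in place
        self.columns[name] = np.asarray(values)


ages = np.array([25, 40, 61])

# Structured array: one record per row, all fields interleaved in memory
records = np.zeros(3, dtype=[('age', 'i8'), ('income', 'f8')])
records['age'] = ages
with_weight = add_field_structured(records, 'weight', np.ones(3))

# Column store: adding a field leaves the existing data untouched
store = ColumnStore(age=ages, income=np.zeros(3))
store.add_field('weight', np.ones(3))
```

With the structured layout, each added field costs a copy of all existing fields (proportional to rows times fields), while the column layout only pays for the new column.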

See the https://github.com/liam2/larray/tree/structured_array branch.

@gdementen
Contributor Author

There are several big issues to resolve:

  • Q: What should read_csv et al. return: LArray or LFrame?
    A: I think they should return an LFrame by default. API-wise, I think it would work, but it would be a performance hit relative to what we have now, because at best the LFrame would need to be converted to an LArray at the first sign of an aggregate.
  • Q: Do we implement aggregates over the column/fields axis, and if so, how?
    A: The easiest would be to raise an exception in that case, but then read_csv could not return an LFrame by default. One option would be to transparently convert to an LArray when computing aggregates. This screams pandas.NDFrame all over again, but I think it is sane.
  • Q: If we store the data as a ColumnArray, we will probably need to re-implement many operations. Is there a way to avoid this?
    A: I do not think we can avoid this, but we shouldn't need to re-implement user-facing methods; rather, we should make ColumnArray implement the subset of the numpy API that we use internally. This is significant work, but it should not be too hard.
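The "transparently convert when aggregating" option from the second answer could look roughly like the sketch below. It assumes fields are stored as a dict of 1D arrays; the class, the axis names 'rows' and 'fields', and the to_array method are all hypothetical and chosen for illustration only.

```python
import numpy as np


class LFrame:
    """Hypothetical LFrame sketch: rows x fields, fields stored as 1D arrays."""

    def __init__(self, **fields):
        self._fields = {name: np.asarray(v) for name, v in fields.items()}

    def to_array(self):
        # Homogeneous 2D array (rows x fields); numpy picks a common dtype,
        # e.g. int + float -> float64
        return np.column_stack(list(self._fields.values()))

    def sum(self, axis):
        if axis == 'rows':
            # Aggregate each field independently; per-field dtypes are kept
            return {name: col.sum() for name, col in self._fields.items()}
        if axis == 'fields':
            # Aggregating across fields needs a common dtype,
            # so convert transparently to a homogeneous array first
            return self.to_array().sum(axis=1)
        raise ValueError(f"unknown axis: {axis!r}")


frame = LFrame(males=np.array([10, 20]), females=np.array([1.5, 2.5]))
per_row = frame.sum('fields')    # converted to float64, then summed per row
per_field = frame.sum('rows')    # one total per field, no conversion
```

The conversion cost is only paid when an aggregate actually crosses the fields axis, which is the trade-off discussed above; aggregates along the rows axis can stay per-field.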

We should see what xarray does here.

@alixdamman
Collaborator

  • Q: do we implement aggregates over the column/fields axis, if so how?
    A: the easiest would be to raise an Exception in that case, but then we cannot return an LFrame by default for read_csv.

@gdementen why?

@gdementen
Contributor Author

Because read_csv currently returns LArrays, on which aggregates over the column axis work. It would be a massive backward incompatibility/pain in the * for our users if all their data loaded from .csv files suddenly failed to aggregate over the column axis.

@gdementen gdementen removed this from the nice_to_have milestone Aug 1, 2019
@alixdamman alixdamman added this to the nice_to_have milestone Oct 10, 2019