Support 32-bit Utf8/Binary/List types #7422
Comments
I don't think they are that much more efficient: probably only the pointers and the length of the array are 32-bit instead of 64-bit, while the data storage part itself does not change. Rechunking to one chunk might not be possible with the 32-bit variants, but in combination with the streaming API this may be less of a problem.
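To make the point about offsets concrete, here is a minimal, standard-library-only sketch that mimics the layout of a variable-length string column: one offsets buffer plus one data buffer. The function name `encode_utf8_column` is hypothetical and this is not the Arrow or Polars implementation; it only assumes that the small variants use 32-bit offsets and the large variants use 64-bit offsets.

```python
import struct


def encode_utf8_column(values, offset_format):
    """Encode strings as (offsets buffer, data buffer).

    offset_format is a struct code: "i" for 32-bit offsets,
    "q" for 64-bit offsets. Hypothetical helper, for illustration only.
    """
    data = bytearray()
    offsets = [0]  # n values need n + 1 offsets
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))
    # "<" selects little-endian with standard sizes (4 or 8 bytes per offset).
    offsets_buf = struct.pack(f"<{len(offsets)}{offset_format}", *offsets)
    return offsets_buf, bytes(data)


values = ["polars", "arrow", "", "delta"]
small_offsets, small_data = encode_utf8_column(values, "i")
large_offsets, large_data = encode_utf8_column(values, "q")

# Only the offsets buffer changes size; the string bytes are identical.
print(len(small_offsets), len(large_offsets))
print(small_data == large_data)

# A signed 32-bit offset cannot address more than this many bytes of data,
# which is why merging everything into a single chunk can fail.
MAX_SMALL_DATA_BYTES = 2**31 - 1
```

The saving is four bytes per value in the offsets buffer, so it matters most for columns with many short strings.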
IMO the small versions are rather worthless for polars' goals. They keep offsets with We did support them in the past, but because of problems mentioned above we switched to the large variants.
All right, good to have some context with that decision! I ran into this because Delta does not support large types. I'll close this as "not planned" for now.
We can add an option to convert to small types when calling
That would be helpful - I was actually implementing some casting functionality in pyarrow, but I guess it's faster to do it as part of the original conversion in Rust. I also need to cast all unsigned types to signed types, and all Datetime types to use microseconds. But I can do that in Polars. Overall,
I guess this is related: #7431
Problem description
Polars Utf8/Binary/List datatypes are currently only available in their 64-bit variant. In Arrow these are known as large_string/large_binary/large_list.
Arrow also has a 32-bit version for these: string/binary/list_.
The 32-bit versions are probably sufficient for many use cases, and will be more efficient. Supporting these will also allow zero-copy conversions from these types into Polars.