I am trying to create an Athena table on top of existing Parquet files on S3. These files use the pyarrow `large_string` type for some columns, so when calling `wr.s3.read_parquet_metadata` I get `awswrangler.exceptions.UnsupportedType: Unsupported Pyarrow type: large_string`.

The `pyarrow2athena` function only checks for `string`, not `large_string`:

https://github.com/aws/aws-sdk-pandas/blob/6c0f65b6b63b223bec1059ecd037697b068f7e63/awswrangler/_data_types.py#L41C8-L41C8

Could the pyarrow `large_string` type be supported here? When I create a Glue table with `string` as the type for these existing Parquet files, Athena queries seem to work normally. I believe the string limit in Athena is 2 GB, so I'm not sure if that's the motivation for not supporting the type.
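In the meantime, one possible workaround (a minimal local sketch, assuming you control the writer; the table and column names below are made up for illustration) is to cast any `large_string` columns down to plain `string` with pyarrow before writing the Parquet files:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy table with a large_string column -- the arrow type that
# read_parquet_metadata rejects (column names are hypothetical).
table = pa.table({
    "id": pa.array([1, 2], type=pa.int64()),
    "body": pa.array(["foo", "bar"], type=pa.large_string()),
})

# Downcast every large_string field to string; Athena maps both
# to its STRING type, so nothing is lost on the Glue/Athena side.
target = pa.schema([
    pa.field(f.name, pa.string()) if pa.types.is_large_string(f.type) else f
    for f in table.schema
])
pq.write_table(table.cast(target), "example.parquet")
```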
Replies: 1 comment

Yes, Hive's string limit is 2 GB. We should add `large_string` to the data type mappings. Would you open an issue for this, please?
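For reference, a rough sketch of what that mapping change could look like (this is not the actual awswrangler patch, just an illustration of the shape of a `pyarrow2athena`-style function with `large_string` handled alongside `string`):

```python
import pyarrow as pa

# Illustrative only -- the real mapping lives in
# awswrangler/_data_types.py and covers many more types.
def pyarrow2athena_sketch(dtype: pa.DataType) -> str:
    if pa.types.is_string(dtype) or pa.types.is_large_string(dtype):
        return "string"  # Athena STRING covers both arrow variants
    if pa.types.is_int64(dtype):
        return "bigint"
    # ... remaining branches elided ...
    raise ValueError(f"Unsupported Pyarrow type: {dtype}")

assert pyarrow2athena_sketch(pa.large_string()) == "string"
```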