Describe the bug
I have a file generated with polars that polars, duckdb, and other tools read fine. arrow-rs fails with a "Failed to decode level data for struct array" error.
To Reproduce
Try to read the following file: https://storage.googleapis.com/ids-next/arrow-bug-dremel-encoding.parquet, e.g. via:
use std::fs::File;

use parquet::arrow::arrow_reader::{
    ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder,
};

pub(crate) fn bug() {
    // Assumes the file linked above was downloaded to the working directory;
    // `File::open` cannot open an https:// URL.
    let path = "arrow-bug-dremel-encoding.parquet";
    let file = File::open(path).unwrap();
    let metadata = ArrowReaderMetadata::load(&file, ArrowReaderOptions::default()).unwrap();
    let reader = ParquetRecordBatchReaderBuilder::new_with_metadata(file, metadata)
        .build()
        .unwrap();
    let mut count = 0;
    for batch in reader {
        // Unwrapping surfaces any error returned while decoding a batch.
        let batch = batch.unwrap();
        count += batch.num_rows();
    }
    println!("Read {} rows", count);
}
Expected behavior
The file should be read successfully, or arrow-rs should return a clearer error if the file contains something unsupported.
Additional context
I tried to come up with a unit test for struct_array.rs but ran out of time.
I tried this with a struct of only ints, but that didn't trigger the bug; thus, I assume the dict might be part of the cause.