
Column with List(Struct) causes "Failed to decode level data for struct array" error #8404

@valkum

Description

Describe the bug

I have a file generated with polars that polars, DuckDB, and other tools read without problems. arrow-rs fails to read it with a "Failed to decode level data for struct array" error.

To Reproduce

Download the following file and try to read it: https://storage.googleapis.com/ids-next/arrow-bug-dremel-encoding.parquet, e.g. with:

use std::fs::File;

use parquet::arrow::arrow_reader::{
    ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder,
};

pub(crate) fn bug() {
    // File::open only accepts local paths, so download the file from the URL above first.
    let src = "arrow-bug-dremel-encoding.parquet";
    let reader = File::open(src).unwrap();

    // Read the footer metadata, then build a record batch reader from it.
    let metadata = ArrowReaderMetadata::load(&reader, ArrowReaderOptions::default()).unwrap();
    let batch_reader = ParquetRecordBatchReaderBuilder::new_with_metadata(reader, metadata)
        .build()
        .unwrap();

    let mut count = 0;
    for batch in batch_reader {
        // The reported error is expected to surface while decoding the batches.
        let batch = batch.unwrap();
        count += batch.num_rows();
    }

    println!("Read {} rows", count);
}

Expected behavior

The file should be read successfully, or arrow-rs should return a more descriptive error if the file contains something unsupported.

Additional context

I tried to write a unit test for struct_array.rs but ran out of time.
I tried this with a struct containing only ints, but that didn't trigger the bug, so I suspect the dictionary may be part of the cause.
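For context on what "level data for struct array" refers to: Parquet stores a List(Struct) column as leaf columns carrying Dremel-style repetition and definition levels, and a reader has to derive each struct's validity from the leaf's definition levels. The following is a std-only sketch of that bookkeeping, assuming a hypothetical schema (an optional list of optional structs with a single optional Int32 field `a`, using the standard three-level LIST layout). `Row`, `levels`, and `struct_validity` are hypothetical names for illustration; this is not arrow-rs's implementation and it does not reproduce the bug.

```rust
// Assumed schema (hypothetical) and the definition level each node contributes:
//   optional group col (LIST) {      -- def 1 when the list is non-null
//     repeated group list {          -- def 2 when an element slot exists (rep level 1)
//       optional group element {     -- def 3 when the struct is non-null
//         optional int32 a;          -- def 4 when `a` is non-null
//       }
//     }
//   }

/// One row: nullable list of nullable structs, each with one nullable i32 field `a`.
type Row = Option<Vec<Option<Option<i32>>>>;

/// Computes (repetition levels, definition levels, non-null values) for leaf column `a`.
fn levels(rows: &[Row]) -> (Vec<u8>, Vec<u8>, Vec<i32>) {
    let (mut rep, mut def, mut vals) = (Vec::new(), Vec::new(), Vec::new());
    for row in rows {
        match row {
            // Null list: a single entry at def 0.
            None => {
                rep.push(0);
                def.push(0);
            }
            // Empty list: a single entry at def 1.
            Some(list) if list.is_empty() => {
                rep.push(0);
                def.push(1);
            }
            Some(list) => {
                for (i, elem) in list.iter().enumerate() {
                    // The first element starts a new row (rep 0); later ones repeat the list (rep 1).
                    rep.push(if i == 0 { 0 } else { 1 });
                    match elem {
                        None => def.push(2),       // null struct
                        Some(None) => def.push(3), // struct present, `a` null
                        Some(Some(v)) => {
                            def.push(4); // `a` present; only these entries carry a value
                            vals.push(*v);
                        }
                    }
                }
            }
        }
    }
    (rep, def, vals)
}

/// Struct validity, one entry per list element: entries with def < 2 (null or empty
/// list) have no struct slot; among the rest, the struct is valid when def >= 3.
fn struct_validity(def: &[u8]) -> Vec<bool> {
    def.iter().filter(|&&d| d >= 2).map(|&d| d >= 3).collect()
}

fn main() {
    let rows: Vec<Row> = vec![
        Some(vec![Some(Some(1)), None, Some(None)]),
        None,
        Some(vec![]),
        Some(vec![Some(Some(7))]),
    ];
    let (rep, def, vals) = levels(&rows);
    println!("rep = {:?}", rep); // [0, 1, 1, 0, 0, 0]
    println!("def = {:?}", def); // [4, 2, 3, 0, 1, 4]
    println!("values = {:?}", vals); // [1, 7]
    println!("struct validity = {:?}", struct_validity(&def)); // [true, false, true, true]
}
```

Note that the null list and the empty list each still emit one level entry but no struct slot, so the struct validity has fewer entries than the level arrays; a struct reader has to account for this when mapping levels to struct rows.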
