Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The MIRI tool helps crates ensure they have no undefined behavior https://github.com/rust-lang/miri
To improve the stability / security of the parquet crate, it would be cool to run MIRI against that too.
Describe the solution you'd like
Ideally, we would run same kind of MIRI checks on the parquet crate as we do now on arrow: https://github.com/apache/arrow-rs/blob/master/.github/workflows/miri.yaml#L60
This means a command something like
cargo +nightly miri test -p parquet
Sadly, when I run this locally the very first test fails with an alignment issue
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow-rs$ cargo +nightly miri test -p parquet
WARNING: Ignoring `RUSTC_WRAPPER` environment variable, Miri does not support wrapping.
Compiling arrow v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/arrow)
Compiling parquet v6.0.0-SNAPSHOT (/Users/alamb/Software/arrow-rs/parquet)
Finished test [unoptimized + debuginfo] target(s) in 4.53s
Running unittests (target/miri/x86_64-apple-darwin/debug/deps/parquet-7056b5943fd0017c)
running 456 tests
test arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels ... error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
--> parquet/src/util/bit_packing.rs:150:12
|
150 | *out = (*in_buf) % (1u32 << 2);
| ^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `util::bit_packing::unpack2_32` at parquet/src/util/bit_packing.rs:150:12
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:37:14
--> parquet/src/util/bit_packing.rs:37:14
|
37 | 2 => unpack2_32(in_ptr, out_ptr),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<i16>` at parquet/src/util/bit_util.rs:568:30
--> parquet/src/util/bit_util.rs:568:30
|
568 | in_ptr = unpack32(in_ptr, out_ptr, num_bits);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::rle::RleDecoder::get_batch::<i16>` at parquet/src/encodings/rle.rs:415:30
--> parquet/src/encodings/rle.rs:415:30
|
415 | num_values = bit_reader.get_batch::<T>(
| ______________________________^
416 | | &mut buffer[values_read..values_read + num_values],
417 | | self.bit_width as usize,
418 | | );
| |_________________^
note: inside `encodings::levels::LevelDecoder::get` at parquet/src/encodings/levels.rs:262:35
--> parquet/src/encodings/levels.rs:262:35
|
262 | let values_read = decoder.get_batch::<i16>(&mut buffer[0..len])?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_def_levels` at parquet/src/column/reader.rs:459:9
--> parquet/src/column/reader.rs:459:9
|
459 | level_decoder.get(buffer)
| ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `column::reader::ColumnReaderImpl::<data_type::ByteArrayType>::read_batch` at parquet/src/column/reader.rs:210:38
--> parquet/src/column/reader.rs:210:38
|
210 | num_def_levels = self.read_def_levels(
| ______________________________________^
211 | | &mut levels[levels_read..levels_read + iter_batch_size],
212 | | )?;
| |_____________________^
note: inside `<arrow::array_reader::ComplexObjectArrayReader<data_type::ByteArrayType, arrow::converter::ArrayRefConverter<std::vec::Vec<std::option::Option<data_type::ByteArray>>, arrow::array::GenericStringArray<i32>, arrow::converter::Utf8ArrayConverter>> as arrow::array_reader::ArrayReader>::next_batch` at parquet/src/arrow/array_reader.rs:490:17
--> parquet/src/arrow/array_reader.rs:490:17
|
490 | / self.column_reader.as_mut().unwrap().read_batch(
491 | | num_to_read,
492 | | cur_def_levels_buf,
493 | | cur_rep_levels_buf,
494 | | cur_data_buf,
495 | | )?;
| |_________________^
note: inside `arrow::array_reader::tests::test_complex_array_reader_def_and_rep_levels` at parquet/src/arrow/array_reader.rs:2229:21
--> parquet/src/arrow/array_reader.rs:2229:21
|
2229 | let array = array_reader.next_batch(values_per_page / 2).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/arrow/array_reader.rs:2154:5
--> parquet/src/arrow/array_reader.rs:2154:5
|
2153 | #[test]
| ------- in this procedural macro expansion
2154 | / fn test_complex_array_reader_def_and_rep_levels() {
2155 | | // Construct column schema
2156 | | let message_type = "
2157 | | message test_schema {
... |
2276 | | );
2277 | | }
| |_____^
= note: inside `<[closure@parquet/src/arrow/array_reader.rs:2154:5: 2277:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:578:5
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:569:30
= note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1572:9
= note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:347:9
= note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:601:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:493:39
= note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:531:13
= note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:28
= note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:306:17
= note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
= note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:123:15
= note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:142:5
= note: inside `main`
= note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
= note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
= note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
= note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
= note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
= note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
= note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
= note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
= note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:62:5
= note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
Describe alternatives you've considered
The parquet2 crate in https://github.com/jorgecarleitao/parquet2 from @jorgecarleitao apparently already has parquet running under MIR this already running, so depending on if that gets integrated into the main arrow-rs repo, that might change how we go about getting a clean run
Additional context
Thanks to help from @roee88 , @jhorstmann and others who I probably missed, we now have MIRI checks validated for the arrow crate on each PR: https://github.com/apache/arrow-rs/runs/3154551440
Which runs a command such as:
cargo miri test -p arrow -- --skip csv --skip ipc --skip json