HoKim98

@HoKim98 HoKim98 commented Sep 26, 2025

Which issue does this PR close?

Rationale for this change


This PR introduces a BoxedFuture type alias to replace futures::future::BoxFuture.
This helps when building for WebAssembly targets (e.g. wasm32-unknown-unknown, wasm32-wasip2).

The original code is derived from: https://github.com/apache/opendal/blob/37efe24235388788b892f46e4101d59ecd37918c/core/src/raw/futures_util.rs#L43-L58

I think it's not an ultimate solution, because the API rework is in progress.
But I think it's useful for now.

What changes are included in this PR?

  • Replace all uses of futures::future::BoxFuture in parquet with BoxedFuture, which is a thread-local (non-Send) future on WASM targets.
  • Replace all FutureExt::boxed calls with Box::pin, so that the Send bound is picked up automatically from the target type.
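
To illustrate the two changes above, here is a minimal sketch of a target-dependent alias written with the standard library only. It is not the code from this PR: the alias body, `read_len`, `NoopWake`, and `poll_once` are hypothetical names chosen for the example, and the real alias may be defined in terms of the futures crate instead.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// On native targets the boxed future must be Send.
#[cfg(not(target_arch = "wasm32"))]
pub type BoxedFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// On wasm32 the Send bound is dropped, so thread-local futures are accepted.
#[cfg(target_arch = "wasm32")]
pub type BoxedFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

// Hypothetical function: Box::pin coerces to whichever alias is active,
// so the same body compiles on both kinds of target.
pub fn read_len<'a>(data: &'a [u8]) -> BoxedFuture<'a, usize> {
    Box::pin(async move { data.len() })
}

// A waker that does nothing, used only to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Poll a boxed future once; returns Some(value) if it completed immediately.
pub fn poll_once<T>(mut fut: BoxedFuture<'_, T>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Because the return type names the alias, Box::pin needs no target-specific code at the call site; a `.boxed()` call would instead always require Send.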

Are these changes tested?

All of the build steps below pass on my local machine:

  • cargo build --package parquet --target wasm32-unknown-unknown -p parquet --features async
  • cargo build --package parquet --target wasm32-wasip1 -p parquet --features async
  • cargo build --package parquet --target wasm32-wasip2 -p parquet --features async

The tests do not pass because:

  • wasm32-unknown-unknown: getrandom is not supported
  • wasm32-wasip1, wasm32-wasip2: tokio/fs is not supported yet

Are there any user-facing changes?

NONE

@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 26, 2025
@HoKim98 HoKim98 force-pushed the feat/parquet/async-wasm branch 2 times, most recently from a05e85c to 04290ef Compare September 26, 2025 12:35
@HoKim98 HoKim98 force-pushed the feat/parquet/async-wasm branch from 04290ef to 39ded9b Compare September 26, 2025 12:37
@HoKim98 HoKim98 changed the title feat(parquet): add async support for wasm builds feat(parquet): add async support for wasm targets Sep 26, 2025
@alamb
Contributor

alamb commented Sep 26, 2025

I think it's not an ultimate solution, because the API rework is in progress.

Thank you for this PR @HoKim98

One thing I have been pushing on is a "SansIO" type of API for the parquet readers, described here:

The idea is that with those APIs, you could use your own IO routines (i.e. not the AsyncRead traits).

We recently released a metadata parsing version here:

This lets you decode ParquetMetaData with any IO (e.g. this example):

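// Import paths below are assumed from the parquet crate's module layout.
use bytes::Bytes;
use parquet::errors::ParquetError;
use parquet::file::metadata::{ParquetMetaData, ParquetMetaDataPushDecoder};
use parquet::DecodeResult;
use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};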
// This function decodes Parquet Metadata from anything that implements
// [`AsyncRead`] and [`AsyncSeek`] such as a tokio::fs::File
async fn decode_metadata(
  file_len: u64,
  mut async_source: impl AsyncRead + AsyncSeek + Unpin
) -> Result<ParquetMetaData, ParquetError> {
  // We need a ParquetMetaDataPushDecoder to decode the metadata.
  let mut decoder = ParquetMetaDataPushDecoder::try_new(file_len).unwrap();
  loop {
    match decoder.try_decode() {
       Ok(DecodeResult::Data(metadata)) => { return Ok(metadata); } // decode successful
       Ok(DecodeResult::NeedsData(ranges)) => {
          // The decoder needs more data
          //
          // In this example we use the AsyncRead and AsyncSeek traits to read the
          // required ranges from the async source.
          let mut data = Vec::with_capacity(ranges.len());
          for range in &ranges {
            let mut buffer = vec![0; (range.end - range.start) as usize];
            async_source.seek(std::io::SeekFrom::Start(range.start)).await?;
            async_source.read_exact(&mut buffer).await?;
            data.push(Bytes::from(buffer));
          }
          // Push the data into the decoder and try to decode again on the next iteration.
          decoder.push_ranges(ranges, data).unwrap();
       }
       Ok(DecodeResult::Finished) => { unreachable!("returned metadata in previous match arm") }
       Err(e) => return Err(e),
    }
  }
}

I also have code to do the same for the actual parquet decoder here:

If you are interested in that feature I will try and prioritize it more
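
The push-style loop in the example above can also be shown without any IO traits at all. The following is a toy sketch, not the parquet API: `ToyFooterDecoder`, `ToyResult`, and `decode_from_slice` are hypothetical names, and the "file format" (a little-endian u32 in the last 4 bytes) is invented for illustration. It assumes the input is at least 4 bytes long.

```rust
use std::ops::Range;

/// Outcome of one decode attempt (hypothetical; mirrors the shape used above).
#[derive(Debug, PartialEq)]
enum ToyResult {
    /// The decoder needs these byte ranges before it can make progress.
    NeedsData(Vec<Range<u64>>),
    /// Decoding finished and produced a value.
    Data(u32),
}

/// Toy decoder: the last 4 bytes of the "file" hold a little-endian u32.
struct ToyFooterDecoder {
    file_len: u64,
    footer: Option<[u8; 4]>,
}

impl ToyFooterDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, footer: None }
    }

    /// Performs no IO: either returns the value or says which bytes are missing.
    fn try_decode(&self) -> ToyResult {
        match self.footer {
            Some(bytes) => ToyResult::Data(u32::from_le_bytes(bytes)),
            None => ToyResult::NeedsData(vec![self.file_len - 4..self.file_len]),
        }
    }

    /// The caller hands over bytes it fetched by whatever means it likes.
    fn push_range(&mut self, range: Range<u64>, data: &[u8]) {
        if range == (self.file_len - 4..self.file_len) && data.len() == 4 {
            self.footer = Some([data[0], data[1], data[2], data[3]]);
        }
    }
}

/// Caller-owned "IO": here it is just slicing an in-memory buffer.
fn decode_from_slice(file: &[u8]) -> u32 {
    let mut decoder = ToyFooterDecoder::new(file.len() as u64);
    loop {
        match decoder.try_decode() {
            ToyResult::Data(value) => return value,
            ToyResult::NeedsData(ranges) => {
                for range in ranges {
                    let bytes = &file[range.start as usize..range.end as usize];
                    decoder.push_range(range, bytes);
                }
            }
        }
    }
}
```

Since the decoder never touches a reader, the same loop could be driven by a browser fetch, a WASI call, or any other source available on a WASM target.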

@HoKim98
Author

HoKim98 commented Sep 26, 2025

Hello @alamb, thank you for the great work!

One thing I have been pushing on is a "SansIO" type of API for the parquet readers, described here:

I'll try it and share some feedback about building on WASM targets.

Development

Successfully merging this pull request may close these issues.

AsyncFileWriter doesn't need to be Send
2 participants