Skip to content

test: allow external_access_plan run on windows #13531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 25, 2024

Conversation

zhuliquan
Copy link
Contributor

Which issue does this PR close?

Closes #13530.

Rationale for this change

I always receive this warning when running cargo clippy, and I would like to get rid of it.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 22, 2024
alamb
alamb previously approved these changes Nov 22, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it is not used when not all features are activated 🤔

@alamb
Copy link
Contributor

alamb commented Nov 22, 2024

Thanks @zhuliquan - it would also be really nice to figure out how you are running the code/tests to see this error. Maybe there is some better way we can come up with to fix it

@findepi
Copy link
Member

findepi commented Nov 22, 2024

we should really avoid #[allow(dead_code)] for stuff used only sometimes, because we won't know when this can be removed/

This is used in external_access_plan which is disabled on windows.
Maybe we should try enabling it?

@zhuliquan zhuliquan changed the title minor: allow dead code for Scenario::UTF8 test: allow external_access_plan run on windows Nov 23, 2024
@zhuliquan zhuliquan force-pushed the minor-utf8_dead_code branch from 84915b6 to 53b7305 Compare November 23, 2024 07:49
@zhuliquan zhuliquan force-pushed the minor-utf8_dead_code branch from 53b7305 to 32acd9d Compare November 23, 2024 07:52
@zhuliquan
Copy link
Contributor Author

zhuliquan commented Nov 23, 2024

we should really avoid #[allow(dead_code)] for stuff used only sometimes, because we won't know when this can be removed/

This is used in external_access_plan which is disabled on windows. Maybe we should try enabling it?

Yeah, I fix code and enable it

let new_file_name = if cfg!(target_os = "windows") {
    // Windows path separator is different from Unix
    file_name.replace("\\", "/")
} else {
    file_name.clone()
};

@zhuliquan zhuliquan closed this Nov 23, 2024
@alamb alamb reopened this Nov 23, 2024
@alamb
Copy link
Contributor

alamb commented Nov 23, 2024

🤔 the test now seems to be failing

I wonder if it has to do with how the temp files are created 🤔

@alamb alamb dismissed their stale review November 23, 2024 12:07

Test has been changed

@zhuliquan
Copy link
Contributor Author

zhuliquan commented Nov 23, 2024

🤔 the test now seems to be failing

I wonder if it has to do with how the temp files are created 🤔

I suspect that the specific char '%7E' (i.e. '~') appears in the directory, and the Windows system seems to be unable to handle this.

thread 'parquet::external_access_plan::two_selections' panicked at datafusion\core\tests\parquet\external_access_plan.rs:289:10:
called `Result::unwrap()` on an `Err` value: ParquetError(External(NotFound { path: "C:\\Users\\RUNNER%7E1\\AppData\\Local\\Temp\\user_access_planObCLIR.parquet", source: Os { code: 3, kind: NotFound, message: "The system cannot find the path specified." } }))

So I fix below code to avoid the specific char which begin with '%'.

  1. replacing '\' to '/' when target_os is windows.

    let new_file_name = if cfg!(target_os = "windows") {
    // Windows path separator is different from Unix
    file_name.replace("\\", "/")
    } else {
    file_name.clone()
    };

    Because, Path::new will replace '\' to '\%5C' (i.e. '') in PartitionedFile::new.
    let mut partitioned_file = PartitionedFile::new(new_file_name, *file_size);

    and
    pub fn new(path: impl Into<String>, size: u64) -> Self {
    Self {
    object_meta: ObjectMeta {
    location: Path::from(path.into()),

  2. using current dir as temp file dir (Because temp dir in CI is located at C:\\Users\\RUNNER%7E1 and other loading parquet cases which are run ok on windows also use data in ../../parquet-testing/data), and remove temp file at the end of the case run.


    and
    std::fs::remove_file(file_name).unwrap();

  3. For sake of removing temp file at the end of each case run, I remove singleton mode of TestData in get_test_data

    static _TEST_DATA: OnceLock<TestData> = OnceLock::new();
    /// Return a parquet file with 2 row groups each with 5 rows
    fn get_test_data() -> TestData {
    let scenario = Scenario::UTF8;
    let row_per_group = 5;
    let mut temp_file = tempfile::Builder::new()
    .prefix("user_access_plan")
    .suffix(".parquet")
    .tempfile_in(Path::new(""))
    .expect("tempfile creation");
    let props = WriterProperties::builder()
    .set_max_row_group_size(row_per_group)
    .build();
    let batches = create_data_batch(scenario);
    let schema = batches[0].schema();
    let mut writer =
    ArrowWriter::try_new(&mut temp_file, schema.clone(), Some(props)).unwrap();
    for batch in batches {
    writer.write(&batch).expect("writing batch");
    }
    writer.close().unwrap();
    let file_name = temp_file.path().to_string_lossy().to_string();
    let file_size = temp_file.path().metadata().unwrap().len();
    TestData {
    _temp_file: temp_file,
    schema,
    file_name,
    file_size,
    }
    }

@zhuliquan zhuliquan requested a review from alamb November 23, 2024 16:26
@zhuliquan zhuliquan force-pushed the minor-utf8_dead_code branch 2 times, most recently from b161625 to b65d2a3 Compare November 24, 2024 09:15
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zhuliquan -- i have one more question. I don't think it is necessary to fix but it would be nice to get a resolution


/// Return a parquet file with 2 row groups each with 5 rows
fn get_test_data() -> &'static TestData {
TEST_DATA.get_or_init(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to use the TEST_DATA thing was to avoid creating the same file over and over again.

Do we need to remove TEST_DATA?

It seems like the core difference is the use of tempfile_in rather than the change from static

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we aren't going to use TEST_DATA anymore perhaps we can remove it (rather than name it starting with _ to avoid a compiler error) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to use the TEST_DATA thing was to avoid creating the same file over and over again.

Do we need to remove TEST_DATA?

It seems like the core difference is the use of tempfile_in rather than the change from static

There is no chance for delete TEST_DATA. Because there is no main test function to lead minor test cases.

Copy link
Contributor Author

@zhuliquan zhuliquan Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we aren't going to use TEST_DATA anymore perhaps we can remove it (rather than name it starting with _ to avoid a compiler error) 🤔

@alamb Yeah, I share same point with you, I remove it now.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me -- thanks @zhuliquan

@alamb alamb merged commit 172d931 into apache:main Nov 25, 2024
25 checks passed
@zhuliquan zhuliquan deleted the minor-utf8_dead_code branch November 26, 2024 05:35
@alamb alamb mentioned this pull request Dec 3, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

warning: variant UTF8 is never constructed
3 participants