hahnjo (Member) commented Aug 11, 2025

Switch the existing code to use the RNTupleParallelWriter with one RNTupleFillContext per slot. For sequential snapshotting, this should be (almost) as efficient as the RNTupleWriter (one additional cloned RNTupleModel for the only fill context), but it saves quite a bit of code duplication and testing effort.
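
To illustrate the "one fill context per slot" pattern, here is a minimal sketch using only the C++ standard library. `ToyParallelWriter`, `ToyFillContext` and `SnapshotWithSlots` are hypothetical stand-ins invented for this example; they are not the ROOT classes and make no claim about how RNTupleParallelWriter or RNTupleFillContext are implemented. The only assumption carried over is the division of roles: each slot fills its own context without locking, and synchronization happens only when a context hands its data to the shared writer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins, NOT the ROOT classes: they only mirror the roles
// of a shared parallel writer and its per-slot fill contexts.
class ToyParallelWriter;

class ToyFillContext {
public:
   explicit ToyFillContext(ToyParallelWriter &writer) : fWriter(writer) {}
   ~ToyFillContext() { Flush(); }
   // Buffers locally; no lock is taken while filling.
   void Fill(int value) { fBuffer.push_back(value); }
   // Hands the buffered entries to the shared writer (defined below).
   void Flush();

private:
   ToyParallelWriter &fWriter;
   std::vector<int> fBuffer;
};

class ToyParallelWriter {
public:
   std::unique_ptr<ToyFillContext> CreateFillContext()
   {
      return std::make_unique<ToyFillContext>(*this);
   }
   // The only synchronized operation: append one context's buffer.
   void Commit(const std::vector<int> &entries)
   {
      std::lock_guard<std::mutex> guard(fMutex);
      fCommitted.insert(fCommitted.end(), entries.begin(), entries.end());
   }
   std::size_t GetNEntries()
   {
      std::lock_guard<std::mutex> guard(fMutex);
      return fCommitted.size();
   }

private:
   std::mutex fMutex;
   std::vector<int> fCommitted;
};

inline void ToyFillContext::Flush()
{
   fWriter.Commit(fBuffer);
   fBuffer.clear();
}

// Fills entriesPerSlot values from each of nSlots threads, one context per
// slot, and returns the number of entries that reached the shared writer.
std::size_t SnapshotWithSlots(unsigned nSlots, int entriesPerSlot)
{
   assert(nSlots > 0);
   ToyParallelWriter writer;
   std::vector<std::unique_ptr<ToyFillContext>> contexts;
   for (unsigned slot = 0; slot < nSlots; ++slot)
      contexts.push_back(writer.CreateFillContext());

   std::vector<std::thread> workers;
   for (unsigned slot = 0; slot < nSlots; ++slot) {
      workers.emplace_back([&contexts, slot, entriesPerSlot] {
         for (int i = 0; i < entriesPerSlot; ++i)
            contexts[slot]->Fill(i); // each thread touches only its own context
      });
   }
   for (auto &worker : workers)
      worker.join();

   contexts.clear(); // destroying the contexts flushes them into the writer
   return writer.GetNEntries();
}
```

In this toy, the sequential case is simply `SnapshotWithSlots(1, n)`: the same code path with a single context, which is the kind of reuse the description above is aiming for.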

github-actions bot commented Aug 11, 2025

Test Results

20 files, 20 suites, total duration 3d 19h 1m 23s
 3 645 tests:  3 642 passed,  0 skipped, 3 failed
71 224 runs:  71 211 passed, 10 skipped, 3 failed

For more details on these failures, see this check.

Results for commit 6fd522f.

♻️ This comment has been updated with latest results.

hahnjo force-pushed the df-ntuple-snapshot-mt branch from ced6d97 to 8a09b36 on August 11, 2025 13:50

pcanal (Member) commented Aug 11, 2025

> (one additional cloned RNTupleModel for the only fill context)

I am confused by the wording of this part of the commit log. "only" suggests there is a single fill context for the whole process, which is contradicted by "one fill context per slot". What is meant by the above sentence?

hahnjo (Member, Author) commented Aug 12, 2025

> > (one additional cloned RNTupleModel for the only fill context)
>
> I am confused by the wording of this part of the commit log. "only" suggests there is a single fill context for the whole process, which is contradicted by "one fill context per slot". What is meant by the above sentence?

Agreed, it's not well formulated. What I'm trying to say is that for sequential snapshotting (which was already supported before this PR), changing from the RNTupleWriter to the RNTupleParallelWriter has only a slight overhead. I will revise the commit message.

hahnjo marked this pull request as ready for review on August 12, 2025 06:43
hahnjo requested a review from jblomer on August 12, 2025 06:44

enirolf (Contributor) left a comment

Really really nice! I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

hahnjo force-pushed the df-ntuple-snapshot-mt branch from 8a09b36 to a0e8be9 on August 12, 2025 13:20

hahnjo (Member, Author) commented Aug 12, 2025

> I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

Yes, I wasn't sure either. We of course benefit massively from using the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling, so even the best test case will have to deal with non-determinism and may still not test the relevant thing because all events end up on a single thread...
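
One common way to cope with the non-determinism mentioned above is to assert only on properties that do not depend on the order in which threads wrote their entries. The sketch below is a hypothetical helper written for illustration; `SameEntriesIgnoringOrder` is not part of ROOT or of this PR's tests, and it assumes the column under test can be read back into a `std::vector<int>`. It does not address the second concern, that all events may end up on a single thread.

```cpp
#include <algorithm>
#include <cassert> // for the assert-based usage shown in the note below
#include <vector>

// Hypothetical helper, not part of ROOT or its test suite: returns true if
// both vectors hold the same values with the same multiplicities, in any order.
bool SameEntriesIgnoringOrder(std::vector<int> written, std::vector<int> expected)
{
   // Sorting copies makes the comparison independent of write order.
   std::sort(written.begin(), written.end());
   std::sort(expected.begin(), expected.end());
   return written == expected;
}
```

A test could then call `assert(SameEntriesIgnoringOrder(valuesReadBack, valuesFromSequentialRun));`, where both argument names are placeholders for whatever the test reads from the two outputs.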

enirolf (Contributor) commented Aug 12, 2025

> > I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔
>
> Yes, I wasn't sure either. We of course benefit massively from using the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling, so even the best test case will have to deal with non-determinism and may still not test the relevant thing because all events end up on a single thread...

That's fair, and let's not forget that the behavior of the parallel writer is already tested in isolation. I'm okay with leaving it as is then!

enirolf (Contributor) left a comment

LGTM! Would probably be good to get a second approval from someone else as well :)

jblomer (Contributor) left a comment

Nice! Looks good to me but I'll leave approval to RDF owners.

vepadulano (Member) left a comment

LGTM, thanks a lot! I have one question left, does not need to be addressed in this PR specifically.

hahnjo force-pushed the df-ntuple-snapshot-mt branch from a0e8be9 to bd8b5ae on September 1, 2025 08:26
Switch the existing code to use the RNTupleParallelWriter with one
RNTupleFillContext per slot. For sequential snapshotting, this
should be (almost) as efficient as the RNTupleWriter (one additional
cloned RNTupleModel for the only fill context), but save quite a bit
of code duplication and in testing effort.
Use the same conditions as TTree, looking at fOutputFile instead of
the data source.
hahnjo merged commit 9822a37 into root-project:master on Sep 1, 2025
38 of 46 checks passed
hahnjo deleted the df-ntuple-snapshot-mt branch on September 1, 2025 11:06