[df] Support MT snapshotting to RNTuple #19599

hahnjo · 2025-08-11T09:27:47Z

Switch the existing code to use the RNTupleParallelWriter with one RNTupleFillContext per slot. For sequential snapshotting, this should be (almost) as efficient as the RNTupleWriter (one additional cloned RNTupleModel for the only fill context), but it saves quite a bit of code duplication and testing effort.
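
A minimal sketch of the one-fill-context-per-slot pattern described above, in plain C++ without ROOT. `ToyParallelWriter`, `FillContext`, `SnapshotWithSlots`, and the flush threshold are hypothetical names invented for illustration; they are not the RNTupleParallelWriter API and say nothing about how it is implemented. The idea shown is only that each slot fills its own context without synchronization and takes the shared lock when it commits a batch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel writer (NOT the ROOT API).
class ToyParallelWriter {
   mutable std::mutex fMutex;   // protects fCommitted
   std::vector<int> fCommitted; // entries that reached the shared output

public:
   // One instance per slot: buffers locally, locks only when committing.
   class FillContext {
      static constexpr std::size_t kFlushThreshold = 64; // arbitrary batch size
      ToyParallelWriter &fWriter;
      std::vector<int> fBuffer;

   public:
      explicit FillContext(ToyParallelWriter &writer) : fWriter(writer) {}
      ~FillContext() { Flush(); }

      void Fill(int value)
      {
         fBuffer.push_back(value);
         if (fBuffer.size() >= kFlushThreshold)
            Flush();
      }

      void Flush()
      {
         if (fBuffer.empty())
            return;
         std::lock_guard<std::mutex> guard(fWriter.fMutex);
         fWriter.fCommitted.insert(fWriter.fCommitted.end(), fBuffer.begin(), fBuffer.end());
         fBuffer.clear();
      }
   };

   std::unique_ptr<FillContext> CreateFillContext() { return std::make_unique<FillContext>(*this); }

   std::size_t GetNEntries() const
   {
      std::lock_guard<std::mutex> guard(fMutex);
      return fCommitted.size();
   }
};

// Fill entriesPerSlot values from each of nSlots threads, one context per slot.
// With nSlots == 1 this is the sequential case: a single context, same code path.
std::size_t SnapshotWithSlots(unsigned nSlots, int entriesPerSlot)
{
   assert(nSlots > 0);
   ToyParallelWriter writer;
   std::vector<std::unique_ptr<ToyParallelWriter::FillContext>> contexts;
   for (unsigned slot = 0; slot < nSlots; ++slot)
      contexts.push_back(writer.CreateFillContext());

   std::vector<std::thread> workers;
   for (unsigned slot = 0; slot < nSlots; ++slot) {
      workers.emplace_back([&contexts, slot, entriesPerSlot] {
         for (int i = 0; i < entriesPerSlot; ++i)
            contexts[slot]->Fill(i); // no lock needed: the context is private to this slot
      });
   }
   for (auto &worker : workers)
      worker.join();

   contexts.clear(); // destructors flush whatever is still buffered
   return writer.GetNEntries();
}
```

Calling `SnapshotWithSlots(1, n)` exercises the same code as the multi-slot case, which is the code-sharing argument made in the description.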

github-actions · 2025-08-11T13:39:53Z

Test Results

20 files 20 suites 3d 19h 1m 23s ⏱️
3 645 tests 3 642 ✅ 0 💤 3 ❌
71 224 runs 71 211 ✅ 10 💤 3 ❌

For more details on these failures, see this check.

Results for commit 6fd522f.

♻️ This comment has been updated with latest results.

pcanal · 2025-08-11T18:54:21Z

> (one additional cloned RNTupleModel for the only fill context)

I am confused by the wording of this part of the commit log. "Only" suggests there is a single fill context for the whole process, which is contradicted by "one fill context per slot". What is meant by the above sentence?

tree/dataframe/src/RDFSnapshotHelpers.cxx

hahnjo · 2025-08-12T06:41:49Z

> > (one additional cloned RNTupleModel for the only fill context)
>
> I am confused by the wording of this part of the commit log. "Only" suggests there is a single fill context for the whole process, which is contradicted by "one fill context per slot". What is meant by the above sentence?

Agreed, it's not well formulated. What I'm trying to say is that for sequential snapshotting (which was already supported before this PR), changing from the RNTupleWriter to the RNTupleParallelWriter adds only a slight overhead. I will revise the commit message.

enirolf

Really really nice! I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

tree/dataframe/test/dataframe_snapshot_ntuple.cxx

hahnjo · 2025-08-12T13:30:39Z

> I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔

Yes, I wasn't sure either. We of course benefit massively from using the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling: even the best test case will have to deal with non-determinism and may still not test the relevant thing because all events end up on a single thread...

enirolf · 2025-08-12T14:18:44Z

> > I'm wondering if for completeness we should also add some of the existing tests as MT tests 🤔
>
> Yes, I wasn't sure either. We of course benefit massively from using the exact same model creation and filling code that you already wrote all the tests for. Additionally, we have the challenge of MT scheduling: even the best test case will have to deal with non-determinism and may still not test the relevant thing because all events end up on a single thread...

That's fair, and the behavior of the parallel writer is already tested in isolation. I'm okay with leaving it as is then!
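
For reference, a minimal sketch of the order-insensitive comparison that an MT snapshot test would need to cope with the non-determinism mentioned in this thread. `SameEntriesIgnoringOrder` is a hypothetical helper, not something from the ROOT test suite, and it assumes the column under test can be read back into a `std::vector<int>`.

```cpp
#include <algorithm>
#include <vector>

// Compare two sets of entries without relying on the order in which
// the worker threads happened to write them. Takes copies so that the
// caller's vectors are left untouched.
bool SameEntriesIgnoringOrder(std::vector<int> expected, std::vector<int> actual)
{
   std::sort(expected.begin(), expected.end());
   std::sort(actual.begin(), actual.end());
   return expected == actual;
}
```

This only handles the ordering problem. It does nothing about the second concern raised above, namely that a run may schedule all events on a single thread and so never exercise the multi-slot path.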

enirolf

LGTM! Would probably be good to get a second approval from someone else as well :)

jblomer

Nice! Looks good to me but I'll leave approval to RDF owners.

vepadulano

LGTM, thanks a lot! I have one question left; it does not need to be addressed in this PR specifically.

tree/dataframe/src/RDFSnapshotHelpers.cxx

hahnjo requested a review from enirolf August 11, 2025 09:27

hahnjo self-assigned this Aug 11, 2025

hahnjo added in:RDataFrame in:RNTuple labels Aug 11, 2025

hahnjo force-pushed the df-ntuple-snapshot-mt branch from ced6d97 to 8a09b36 Compare August 11, 2025 13:50

pcanal reviewed Aug 11, 2025

tree/dataframe/src/RDFSnapshotHelpers.cxx

hahnjo marked this pull request as ready for review August 12, 2025 06:43

hahnjo requested review from martamaja10 and vepadulano as code owners August 12, 2025 06:43

hahnjo requested a review from jblomer August 12, 2025 06:44

enirolf reviewed Aug 12, 2025

tree/dataframe/test/dataframe_snapshot_ntuple.cxx

hahnjo force-pushed the df-ntuple-snapshot-mt branch from 8a09b36 to a0e8be9 Compare August 12, 2025 13:20

enirolf approved these changes Aug 12, 2025

jblomer reviewed Aug 14, 2025

vepadulano approved these changes Aug 25, 2025

tree/dataframe/src/RDFSnapshotHelpers.cxx

hahnjo added 5 commits September 1, 2025 10:13

[df] Remove unused member in RNTuple snapshotting

50d10e4

[df] Simplify member init in RNTuple snapshotting

7ad5d75

[df] Reorder helper methods for RNTuple snapshotting

8aca0a6

[df] Use bare REntry for RNTuple snapshotting

924d5a2

... instead of the default entry. Then we also only need a bare model.

[df] Use field tokens for RNTuple snapshotting

30ce048

This is less expensive than string comparisons of field names during every call to Exec().
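
A minimal sketch of why tokens are cheaper than names in a hot loop, in plain C++ without ROOT. `ToyEntry`, `Token`, `SetByName`, `SetByToken`, and `ExecLoop` are hypothetical names invented for illustration and are not the REntry interface; the assumption is only that a token is a pre-resolved index, so the string comparisons happen once at setup rather than on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry with named fields (NOT the ROOT REntry API).
struct ToyEntry {
   struct Token {
      std::size_t fIndex; // position of the field, resolved once
   };

   std::vector<std::string> fNames;
   std::vector<double> fValues;

   explicit ToyEntry(std::vector<std::string> names) : fNames(std::move(names)), fValues(fNames.size(), 0.0) {}

   // Linear search with string comparisons: acceptable once, costly per event.
   Token GetToken(const std::string &name) const
   {
      for (std::size_t i = 0; i < fNames.size(); ++i) {
         if (fNames[i] == name)
            return Token{i};
      }
      throw std::invalid_argument("unknown field: " + name);
   }

   // Slow path: repeats the name lookup on every call.
   void SetByName(const std::string &name, double value) { fValues[GetToken(name).fIndex] = value; }

   // Fast path: direct index access, no string comparison.
   void SetByToken(Token token, double value)
   {
      assert(token.fIndex < fValues.size());
      fValues[token.fIndex] = value;
   }

   double Get(Token token) const { return fValues[token.fIndex]; }
};

// Stand-in for repeated per-event calls: tokens were resolved by the caller
// up front, so the loop body performs no string comparisons.
void ExecLoop(ToyEntry &entry, const std::vector<ToyEntry::Token> &tokens, int nCalls)
{
   for (int call = 0; call < nCalls; ++call) {
      for (const auto &token : tokens)
         entry.SetByToken(token, static_cast<double>(call));
   }
}
```

In this sketch the tokens would be obtained once after the entry is created and then reused for every call, which mirrors the trade-off the commit message describes.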

hahnjo force-pushed the df-ntuple-snapshot-mt branch from a0e8be9 to bd8b5ae Compare September 1, 2025 08:26

hahnjo added 2 commits September 1, 2025 10:27

[df] Fix warning about untriggered lazy RNTuple snapshot

6fd522f

Use the same conditions as TTree, looking at fOutputFile instead of the data source.

hahnjo force-pushed the df-ntuple-snapshot-mt branch from bd8b5ae to 6fd522f Compare September 1, 2025 08:27

hahnjo mentioned this pull request Sep 1, 2025

[df] RNTuple snapshot + TTree-specific options #19784

Open

hahnjo merged commit 9822a37 into root-project:master Sep 1, 2025
38 of 46 checks passed

hahnjo deleted the df-ntuple-snapshot-mt branch September 1, 2025 11:06
