Skip to content

Conversation

jackapet
Copy link

@jackapet jackapet commented Sep 25, 2025

This Pull request:

Add support for THnSparseD histograms in RDF

Changes or fixes:

  • Adding THnSparseDModel into HistoModels
  • Adding HistoNSparseD functions into RInterface
  • Remove deletion of copy constructor from THnSparse: THnSparse(const THnSparse&) = delete;

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

Fixes #19969

@ferdymercury

This comment was marked as outdated.

@jackapet jackapet changed the title Add support for THnSparseD histogram in RDataFrame Fixes #19969 Sep 25, 2025
@jackapet jackapet changed the title Fixes #19969 Fixes Add support for THnSparseD histogram in RDataFrame #19969 Sep 25, 2025
@jackapet jackapet changed the title Fixes Add support for THnSparseD histogram in RDataFrame #19969 Fixes #19969 Sep 25, 2025
@jackapet jackapet changed the title Fixes #19969 Add support for THnSparseD histogram in RDataFrame Sep 25, 2025
@jackapet
Copy link
Author

Thanks @ferdymercury ,

I will try to follow your instructions.

I noticed in my local tests that the THnSparseD and THnD too do not work with multithreading. Both seems to work fine with a single thread.

@ferdymercury
Copy link
Collaborator

THnSparseD and THnD too do not work with multithreading

See https://root-forum.cern.ch/t/filling-histograms-in-parallel/35460

Copy link
Collaborator

@ferdymercury ferdymercury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Feel free to click on the button "Ready to Review" so that core developers are notified.

@jackapet
Copy link
Author

THnSparseD and THnD too do not work with multithreading

See https://root-forum.cern.ch/t/filling-histograms-in-parallel/35460

I see, thank you for advice.

@jackapet
Copy link
Author

I implemented all required changes. I have few comments/questions:

  • I am not able to run tests locally because I do not have a necessary software installed. Is there some docker image which I can use for development/testing?
  • In documentation, I was not able to determine links to lines in the html file, e.g. [HistoNSparseD](https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#a5f3e2f0a3d1c8e4f0e2f3e7f0e8c6b7a). The link was generated by copilot an I do not know if it is right.

@jackapet jackapet marked this pull request as ready for review September 25, 2025 13:49
Copy link
Collaborator

@ferdymercury ferdymercury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning the link to the html file, leave it empty for the moment.
Thanks.

@ferdymercury
Copy link
Collaborator

One further comment: commit messages usually start with the involved component:

[math] fix sth
[RDF] improve Rdataframe...
[NFC] document better
(NFC is a signal for changes that are not relevant from the code point)

@jackapet
Copy link
Author

One further comment: commit messages usually start with the involved component:

[math] fix sth [RDF] improve Rdataframe... [NFC] document better (NFC is a signal for changes that are not relevant from the code point)

Ok, I added [RDF] or [NFC] to commits

@ferdymercury
Copy link
Collaborator

ferdymercury commented Sep 25, 2025

Thanks again!

In README/ReleaseNotes/v638/index.md
you could add:
- Add HistoNSparseD action that fills a sparse N-dimensional histogram.

  • You could add yourself as contributor on the list of that file.

nitpick wrt commit message: [hist] is usually small compared to [RDF] RDataFrame

@martamaja10
Copy link
Contributor

Hi @jackapet, thank you very much for this contribution! It looks good to me at the first glance, so I will let the CI run and once we see those results I will also give you a more detailed review. Thanks to @ferdymercury for giving the first reviews and hints as well!

@martamaja10 martamaja10 self-assigned this Sep 26, 2025
Copy link

github-actions bot commented Sep 26, 2025

Test Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 1e296c0.

♻️ This comment has been updated with latest results.

@martamaja10
Copy link
Contributor

martamaja10 commented Sep 26, 2025

Hi @jackapet, so unfortunately there is an issue with your changes at the ROOT building stage. Have you tried building ROOT locally with all these changes? If you go through the logs of the CI workers, for example, you can see the following:

2025-09-26T10:28:40.3536950Z ##[error]/Users/sftnight/ROOT-CI/src/tree/dataframe/test/dataframe_histomodels.cxx:363:31: error: no matching constructor for initialization of '::THnSparseD' (aka 'THnSparseT<TArrayD>')
2025-09-26T10:28:40.3541740Z   363 |    auto h1e = d.HistoNSparseD(::THnSparseD("h1e", "h1e", 4, nbinse, edges), {"x0", "x1", "x2", "x3"});
2025-09-26T10:28:40.3643390Z       |                               ^            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-26T10:28:40.3746460Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:72:4: note: candidate inherited constructor not viable: no known conversion from 'std::vector<std::vector<double>>' to 'const Double_t *' (aka 'const double *') for 5th argument
2025-09-26T10:28:40.3824260Z    72 |    THnSparse(const char* name, const char* title, Int_t dim,
2025-09-26T10:28:40.3855300Z       |    ^
2025-09-26T10:28:40.3878380Z    73 |              const Int_t* nbins, const Double_t* xmin = nullptr, const Double_t* xmax = nullptr,
2025-09-26T10:28:40.3879050Z       |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-26T10:28:40.3880090Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:213:21: note: constructor from base class 'THnSparse' inherited here
2025-09-26T10:28:40.3880990Z   213 |    using THnSparse::THnSparse;
2025-09-26T10:28:40.3881340Z       |                     ^
2025-09-26T10:28:40.3882470Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:75:4: note: candidate inherited constructor not viable: requires at most 4 arguments, but 5 were provided
2025-09-26T10:28:40.3883580Z    75 |    THnSparse(const char* name, const char* title,
2025-09-26T10:28:40.3884010Z       |    ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-26T10:28:40.3884430Z    76 |              const std::vector<TAxis>& axes,
2025-09-26T10:28:40.3884840Z       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-26T10:28:40.3885230Z    77 |              Int_t chunksize = 1024 * 16);
2025-09-26T10:28:40.3885610Z       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~
2025-09-26T10:28:40.3886560Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:213:21: note: constructor from base class 'THnSparse' inherited here
2025-09-26T10:28:40.3887390Z   213 |    using THnSparse::THnSparse;
2025-09-26T10:28:40.3887740Z       |                     ^
2025-09-26T10:28:40.3888920Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:37:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 5 were provided
2025-09-26T10:28:40.3890820Z    37 | class THnSparse: public THnBase {
2025-09-26T10:28:40.3891190Z       |       ^~~~~~~~~
2025-09-26T10:28:40.3892050Z /Users/sftnight/ROOT-CI/src/hist/hist/inc/THnSparse.h:213:21: note: constructor from base class 'THnSparse' inherited here
2025-09-26T10:28:40.3892890Z   213 |    using THnSparse::THnSparse;
2025-09-26T10:28:40.3893260Z       |                     ^
2025-09-26T10:28:40.3894670Z /Users/sftnight/ROOT-CI/src/tree/dataframe/inc/ROOT/RDF/HistoModels.hxx:24:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 5 were provided
2025-09-26T10:28:40.3895860Z    24 | class THnSparseT;
2025-09-26T10:28:40.3896150Z       |       ^~~~~~~~~~
2025-09-26T10:28:40.3897450Z /Users/sftnight/ROOT-CI/src/tree/dataframe/inc/ROOT/RDF/HistoModels.hxx:24:7: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 5 were provided
2025-09-26T10:28:40.3898520Z    24 | class THnSparseT;
2025-09-26T10:28:40.3898660Z       |       ^~~~~~~~~~
2025-09-26T10:28:40.3899270Z /Users/sftnight/ROOT-CI/src/tree/dataframe/inc/ROOT/RDF/HistoModels.hxx:24:7: note: candidate constructor (the implicit default constructor) not viable: requires 0 arguments, but 5 were provided
2025-09-26T10:28:40.3899780Z 1 error generated.
2025-09-26T10:28:40.3900280Z make[2]: *** [tree/dataframe/test/CMakeFiles/dataframe_histomodels.dir/dataframe_histomodels.cxx.o] Error 1
2025-09-26T10:28:40.3900690Z make[1]: *** [tree/dataframe/test/CMakeFiles/dataframe_histomodels.dir/all] Error 2

This will need to be fixed first.

@jackapet
Copy link
Author

jackapet commented Sep 26, 2025

Hello @martamaja10 ,

I am able to compile code without tests.

I was not able to run or compile tests because I am not able to install required system packages.

I am using windows with wsl2. ROOT compilation fails with RVec compilation:

/mnt/c/cern/root_dev/build/include/ROOT/RVec.hxx:516:49: error: use of undeclared identifier 'R__HARDWARE_INTERFERENCE_SIZE'
   static constexpr std::size_t cacheLineSize = R__HARDWARE_INTERFERENCE_SIZE;

I guess wsl2 does not have this variable defined. Is there some workaround for this?

When running on cluster, I am able to compile the ROOT but I can't install libraries. So, I was able to check rdataframe in my local script but not in required tests.

Could you provide instructions how to compile tests? Do you have a singularity/docker image for developers with all required libraries installed? I can access cvmfs if it helps. Otherwise, I will try to build my own image.

@ferdymercury
Copy link
Collaborator

when calling cmake, did you enable flag -Dtesting=ON -Droottest=ON ?

(Building is enough, no need to install)

@ferdymercury
Copy link
Collaborator

The failure is likely because there is something equivalent to

THn::THn(const char *name, const char *title, Int_t dim, const Int_t *nbins,

that needs to be implemented in THnSparse.h and .cxx

@jackapet
Copy link
Author

when calling cmake, did you enable flag -Dtesting=ON -Droottest=ON ?

(Building is enough, no need to install)

Ok, thank you. I am trying to recompile it. I made an apptainer image with settings below. I hope all libraries are now installed.

Bootstrap: docker
From: ubuntu:24.04

%post
    apt-get -y update
    apt-get -y install binutils cmake dpkg-dev g++ gcc libssl-dev git libx11-dev libxext-dev libxft-dev libxpm-dev python3 libtbb-dev libvdt-dev libgif-dev
    apt-get -y install gfortran libpcre3-dev libglu1-mesa-dev libglew-dev libftgl-dev libfftw3-dev libcfitsio-dev libgraphviz-dev libavahi-compat-libdnssd-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl-dev qtwebengine5-dev nlohmann-json3-dev libmysqlclient-dev libgl2ps-dev liblzma-dev libxxhash-dev liblz4-dev libzstd-dev
    apt-get -y install libgtest-dev libgmock-dev
    apt-get -y install python3-pytest

@ferdymercury
Copy link
Collaborator

Should I run some setup script for test? Should I execute them using root -x -q -b something.C? Or is there some executable or python script?

From the build directory, you can run

ctest -R root-io-filemerger

(specify the name or subset of test names you want to run e.g. root-io-*)

@jackapet
Copy link
Author

jackapet commented Sep 29, 2025

I run all tests beginning on ctest -R root-io-. Two tests were skipped. All other tests succeeded.

I also added one additional test into THn.cxx. This test checks all constructors. This test passes too.

About the failed test in the check_reducer_merger. This is related with multithreading. THnSparse cannot run in MT mode. The test fails with:

malloc_consolidate(): unaligned fastbin chunk detected
 *** Break *** abort
 Generating stack trace...
malloc_consolidate(): unaligned fastbin chunk detected
 *** Break *** abort

Should we remove this test?

@jackapet
Copy link
Author

@ferdymercury , @martamaja10 ,

I was checking THnSparse implementation. There is lot of manipulation with memory using new, malloc, memcpy, delete, and other operators. This is because the implementation is based on basic arrays like Int_t*.

I would suggest to rewrite it using std::unordered_map and std::vector classes. With this we can avoid any direct manipulation with memory. We can also implement it without using chunks, which are causing troubles, and leave optimization on std::unordered_map.

Error messages indicate that some internal object is deleted twice - in test dataframe cloning. It seems that some object was shared after cloning and two deletions happened when original and cloned objects were deleted.

TExMap fBinsContinued; ///<! Filled bins for non-unique hashes, containing pairs of (bin index 0, bin index 1)
THnSparseCompactBinCoord *fCompactCoord; ///<! Compact coordinate

THnSparse(const THnSparse&) = delete;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, what is the reason for undeleting this constructor?
Could it be that this leads to the failures seen in the CI ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor is undeleted because it is required by RResultPtr. In other words, object without this constructor cannot be used by RDataFrame.

Perhaps, this constructor require explicit implementation.

@ferdymercury
Copy link
Collaborator

I would suggest to rewrite it using std::unordered_map and std::vector classes. With this we can avoid any direct manipulation with memory. We can also implement it without using chunks, which are causing troubles, and leave optimization on std::unordered_map.

I guess that since it's old code, the trend is to not touch it too much and apply minimal fixes to solve encountered issues/bugs. (Unless there is a significant performance improvement by switching to unordered_map).

Question: does the crash happen also before the additions of this PR? So to say, if you run valgrind --leak-check=full --suppressions=$ROOTSYS/etc/valgrind-root.supp root.exe -l mytest.C+
In other words, is the "unaligned fastbin chunk detected" crash coming from before the changes in the PR? Meaning that it's just that the new test added more coverage?
If the issue comes from multithreading, then I guess the only solution is to add a R__LOCKGUARD like the one you see in TChain.cxx, or to remove the multi-threaded test.

@jackapet
Copy link
Author

jackapet commented Sep 30, 2025

I would suggest to rewrite it using std::unordered_map and std::vector classes. With this we can avoid any direct manipulation with memory. We can also implement it without using chunks, which are causing troubles, and leave optimization on std::unordered_map.

I guess that since it's old code, the trend is to not touch it too much and apply minimal fixes to solve encountered issues/bugs. (Unless there is a significant performance improvement by switching to unordered_map).

Question: does the crash happen also before the additions of this PR? So to say, if you run valgrind --leak-check=full --suppressions=$ROOTSYS/etc/valgrind-root.supp root.exe -l mytest.C+ In other words, is the "unaligned fastbin chunk detected" crash coming from before the changes in the PR? Meaning that it's just that the new test added more coverage? If the issue comes from multithreading, then I guess the only solution is to add a R__LOCKGUARD like the one you see in TChain.cxx, or to remove the multi-threaded test.

I would suggest to rewrite it using std::unordered_map and std::vector classes. With this we can avoid any direct manipulation with memory. We can also implement it without using chunks, which are causing troubles, and leave optimization on std::unordered_map.

I guess that since it's old code, the trend is to not touch it too much and apply minimal fixes to solve encountered issues/bugs. (Unless there is a significant performance improvement by switching to unordered_map).

Question: does the crash happen also before the additions of this PR? So to say, if you run valgrind --leak-check=full --suppressions=$ROOTSYS/etc/valgrind-root.supp root.exe -l mytest.C+ In other words, is the "unaligned fastbin chunk detected" crash coming from before the changes in the PR? Meaning that it's just that the new test added more coverage? If the issue comes from multithreading, then I guess the only solution is to add a R__LOCKGUARD like the one you see in TChain.cxx, or to remove the multi-threaded test.

The crash is in new tests added in this request. All failures are related with THnSparse usage in RDataFrame. There are two things which does not work: running with multithreading and lazy cloning via ROOT::Internal::RDF::CloneResultAndAction (leads to double deletion).

Multithreading leads to failure in gtest-tree-dataframe-dataframe-merge-results and check_reducer_merge.
Cloning leads to failures in gtest-tree-dataframe-dataframe-cloning and gtest-tree-dataframe-dataframe-vary.

I will try to run with debugger to identify where exactly the code crash. Perhaps it will give some hints how to fix it.

@jackapet
Copy link
Author

I think I understand the issue now.

THnSparse has a raw pointer as a member variable. When we copy or clone the object we just copy the pointer. The pointer then points to the same location in the memory. This leads to crash when deleting original and cloned object. Bin contents can be also wrong. Issues in multithreading can be also caused by that, maybe.

So, we need an explicit implementation of the copy constructor (The one which is undeleted).

@ferdymercury
Copy link
Collaborator

So, we need an explicit implementation of the copy constructor (The one which is undeleted).

Good catch!

@jackapet
Copy link
Author

Explicit implementation fixed a problem with cloning and varying THnSparse histograms.

The problem with multithreading is still there. The implementation is very thread unsafe. There is a dynamic allocation of the memory and two threads may attempt to allocate memory at the same time. I suggest to remove this merger test. After this we can check pipeline again.

@ferdymercury
Copy link
Collaborator

Thanks for the fix!
@martamaja10 could you restart the CI ?

@pcanal
Copy link
Member

pcanal commented Sep 30, 2025

here is a dynamic allocation of the memory and two threads may attempt to allocate memory at the same time

'That' in itself is not usually an issue (malloc/operator new are thread safe). But indeed, first we need to see it work (including when running with valgrind) in single thread mode and then we can investigate the harder cases.

@jackapet jackapet force-pushed the Add-THnSparseDModel branch from 72b8033 to c061179 Compare October 1, 2025 20:13
@jackapet
Copy link
Author

jackapet commented Oct 1, 2025

There was still an issue in the copy constructor. Error logs pointed me that crashes happen when TObjArray destructor is called. When TObjArray is copied it creates only shallow copy. In merger test we make a copy and try to merge these copies. When destructors are called both TObjArrays (containing chunks) try to delete their contents and that leads to crash because only shallow copies were made and same objects are deleted twice.

I altered the copy constructor to make a deep copy. It seems that all tests works now. These errors with multithreading were misleading. RDataFrame probably made a copy of objects in the MT mode and that could cause that behavior. Now, I am able to fill histograms in the MT mode without crash. However, I am not sure if bin contents are reliable.

@ferdymercury
Copy link
Collaborator

However, I am not sure if bin contents are reliable.

Maybe a test that compares equality between resulting hist with and without MT enabled would be advisable here.
Alternatively, running valgrind / helgrind to detect races.

Thanks again for the effort and the fixes.

@jackapet
Copy link
Author

jackapet commented Oct 2, 2025

I run with multithreading and valgrind and all seems to be fine. Is there an appropriate file where we can place this test? EXPECT_EQ can be used in place of couts.

#include "ROOT/RDataFrame.hxx"

void test_multithread() {

   const int ncores = 100;
   const int nbins_per_axis = 10;
   const int nevents = 1000000;

   ROOT::EnableImplicitMT(ncores);

   ROOT::RDataFrame df{nevents};

   auto col1 = df.Define("x0", [&nbins_per_axis](ULong64_t e) { return double(e % nbins_per_axis); }, {"rdfentry_"});
   auto col2 = col1.Define("x1", [&nbins_per_axis](ULong64_t e) { return double(e % nbins_per_axis); }, {"rdfentry_"});
   auto col3 = col2.Define("x2", [&nbins_per_axis](ULong64_t e) { return double(e % nbins_per_axis); }, {"rdfentry_"});
   auto col4 = col3.Define("x3", [&nbins_per_axis](ULong64_t e) { return double(e % nbins_per_axis); }, {"rdfentry_"});

   int nbins[4] = {nbins_per_axis, nbins_per_axis, nbins_per_axis, nbins_per_axis};
   double xmin[4] = {0., 0., 0., 0.};
   double xmax[4] = {nbins_per_axis, nbins_per_axis, nbins_per_axis, nbins_per_axis};
   auto hist =
      col4.HistoNSparseD<double, double, double, double>({"name", "title", 4, nbins, xmin, xmax}, {"x0", "x1", "x2", "x3"});
     
   std::cout << hist->GetEntries() << std::endl;

   for (int i = 1; i <= nbins_per_axis; ++i)
   {
      std::vector<int> idx = {i, i, i, i};
      std::cout << "bin " << i << " = " << hist->GetBinContent(idx.data()) << " expected: " << nevents / nbins_per_axis << std::endl;
   }

}

Output looks fine:

Apptainer> valgrind --leak-check=full --suppressions=$ROOTSYS/etc/valgrind-root.supp root -q -b -l test_multithread.C+
==2728214== Memcheck, a memory error detector
==2728214== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==2728214== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2728214== Command: root -q -b -l test_multithread.C+
==2728214==

Processing test_multithread.C+...
Info in <TUnixSystem::ACLiC>: creating shared library /mnt/nfs17/jacka/root_dev/./test_multithread_C.so
1e+06
bin 1 = 100000 expected: 100000
bin 2 = 100000 expected: 100000
bin 3 = 100000 expected: 100000
bin 4 = 100000 expected: 100000
bin 5 = 100000 expected: 100000
bin 6 = 100000 expected: 100000
bin 7 = 100000 expected: 100000
bin 8 = 100000 expected: 100000
bin 9 = 100000 expected: 100000
bin 10 = 100000 expected: 100000
==2728214==
==2728214== HEAP SUMMARY:
==2728214==     in use at exit: 43 bytes in 1 blocks
==2728214==   total heap usage: 4 allocs, 3 frees, 73,857 bytes allocated
==2728214==
==2728214== LEAK SUMMARY:
==2728214==    definitely lost: 0 bytes in 0 blocks
==2728214==    indirectly lost: 0 bytes in 0 blocks
==2728214==      possibly lost: 0 bytes in 0 blocks
==2728214==    still reachable: 43 bytes in 1 blocks
==2728214==         suppressed: 0 bytes in 0 blocks
==2728214== Reachable blocks (those to which a pointer was found) are not shown.
==2728214== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2728214==
==2728214== For lists of detected and suppressed errors, rerun with: -s
==2728214== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@ferdymercury
Copy link
Collaborator

ferdymercury commented Oct 2, 2025

Thanks for the nice test!
Maybe it's worth adding it to tree/dataframe/test/dataframe_concurrency.cxx

valgrind --leak-check=full --suppressions=$ROOTSYS/etc/valgrind-root.supp root

When running with valgrind, you need to run it with "root.exe" instead of "root" (yep, even on linux).

Please run also a second (local) check using:
valgrind --tool=helgrind --suppressions=$ROOTSYS/etc/helgrind-root.supp

@jackapet
Copy link
Author

jackapet commented Oct 2, 2025

Here is a log from root.exe. It shows possibly lost blocks but not definitely lost blocks. Are such things expected?

out.txt

@ferdymercury
Copy link
Collaborator

Here is a log from root.exe. It shows possibly lost blocks but not definitely lost blocks. Are such things expected?

out.txt

Thanks. Yes, I think those issues are because the suppression file is outdated, and are not related to the changes in your PR, so you can ignore those possibly-lost things.

Please now run with helgrind and if you see no issues, then I think it's time to restart the CI.

@jackapet
Copy link
Author

jackapet commented Oct 2, 2025

I added the test into the dataframe_concurency.cxx.

The helgrind gives no error:

Apptainer> valgrind --tool=helgrind --suppressions=$ROOTSYS/etc/helgrind-root.supp root -x -q -l test_multithread.C+
==2912736== Helgrind, a thread error detector
==2912736== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2912736== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2912736== Command: root -x -q -l test_multithread.C+
==2912736==

Processing test_multithread.C+...
1e+06
bin 1 = 100000 expected: 100000
bin 2 = 100000 expected: 100000
bin 3 = 100000 expected: 100000
bin 4 = 100000 expected: 100000
bin 5 = 100000 expected: 100000
bin 6 = 100000 expected: 100000
bin 7 = 100000 expected: 100000
bin 8 = 100000 expected: 100000
bin 9 = 100000 expected: 100000
bin 10 = 100000 expected: 100000
==2912736==
==2912736== Use --history-level=approx or =none to gain increased speed, at
==2912736== the cost of reduced accuracy of conflicting-access information
==2912736== For lists of detected and suppressed errors, rerun with: -s
==2912736== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@ferdymercury
Copy link
Collaborator

I guess helgrind with root.exe looks fine too?

@martamaja10 could you restart the CI ?

Thanks

@jackapet
Copy link
Author

jackapet commented Oct 2, 2025

Ah, sorry, I forgot on .exe. It does not look well with root.exe. It crashes. However, it does not seem to be related with THnSparseD.

Apptainer> valgrind --tool=helgrind --suppressions=$ROOTSYS/etc/helgrind-root.supp root.exe -x -q -l test_multithread.C+
==2926619== Helgrind, a thread error detector
==2926619== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==2926619== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==2926619== Command: root.exe -x -q -l test_multithread.C+
==2926619==

Processing test_multithread.C+...

valgrind: m_debuginfo/debuginfo.c:950 (truncate_DebugInfoMapping_overlaps): Assertion '!overlap' failed.

host stacktrace:
==2926619==    at 0x58027A0A: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x58027B4F: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x58027CE5: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x5805A746: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x5808A913: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x58096D6B: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x58086BA1: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x580824B2: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x580845C8: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)
==2926619==    by 0x580CFB07: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable syscall 9 (lwpid 2926619)
==2926619==    at 0x4025D2C: __mmap64 (mmap64.c:58)
==2926619==    by 0x4025D2C: mmap (mmap64.c:46)
==2926619==    by 0x4007F78: _dl_map_segments (dl-map-segments.h:139)
==2926619==    by 0x4007F78: _dl_map_object_from_fd (dl-load.c:1258)
==2926619==    by 0x4009528: _dl_map_object (dl-load.c:2268)
==2926619==    by 0x400D8DB: dl_open_worker_begin (dl-open.c:578)
==2926619==    by 0x400151B: _dl_catch_exception (dl-catch.c:237)
==2926619==    by 0x400CD1F: dl_open_worker (dl-open.c:803)
==2926619==    by 0x400151B: _dl_catch_exception (dl-catch.c:237)
==2926619==    by 0x400D163: _dl_open (dl-open.c:905)
==2926619==    by 0x50AF1A3: dlopen_doit (dlopen.c:56)
==2926619==    by 0x400151B: _dl_catch_exception (dl-catch.c:237)
==2926619==    by 0x4001668: _dl_catch_error (dl-catch.c:256)
==2926619==    by 0x50AEC82: _dlerror_run (dlerror.c:138)
==2926619==    by 0x50AF25E: dlopen_implementation (dlopen.c:71)
==2926619==    by 0x50AF25E: dlopen@@GLIBC_2.34 (dlopen.c:81)
==2926619==    by 0x696AD57: cling::utils::platform::DLOpen(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /mnt/nfs17/jacka/root_dev/install/lib/libCling.so)
==2926619==    by 0x683FC44: cling::DynamicLibraryManager::loadLibrary(llvm::StringRef, bool, bool) (in /mnt/nfs17/jacka/root_dev/install/lib/libCling.so)
==2926619==    by 0x6744D1F: TCling::Load(char const*, bool) (in /mnt/nfs17/jacka/root_dev/install/lib/libCling.so)
==2926619==    by 0x4AB6C00: std::_Function_handler<bool (char const*), TSystem::CompileMacro(char const*, char const*, char const*, char const*, unsigned int)::{lambda(TString const&)#1}::operator()(TString const&) const::{lambda(char const*)#1}>::_M_invoke(std::_Any_data const&, char const*&&) (in /mnt/nfs17/jacka/root_dev/install/lib/libCore.so)
==2926619==    by 0x4ABF30C: TSystem::CompileMacro(char const*, char const*, char const*, char const*, unsigned int)::{lambda(char const*, std::function<bool (char const*)>)#1}::operator()(char const*, std::function<bool (char const*)>) const [clone .isra.0] (in /mnt/nfs17/jacka/root_dev/install/lib/libCore.so)
==2926619==    by 0x4AC3CA9: TSystem::CompileMacro(char const*, char const*, char const*, char const*, unsigned int) (in /mnt/nfs17/jacka/root_dev/install/lib/libCore.so)
==2926619==    by 0x674C3C9: TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) (in /mnt/nfs17/jacka/root_dev/install/lib/libCling.so)
==2926619==    by 0x674CD73: TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) (in /mnt/nfs17/jacka/root_dev/install/lib/libCling.so)
==2926619==    by 0x4A3F056: TApplication::ExecuteFile(char const*, int*, bool) (in /mnt/nfs17/jacka/root_dev/install/lib/libCore.so)
==2926619==    by 0x487688F: TRint::ProcessLineNr(char const*, char const*, int*) (in /mnt/nfs17/jacka/root_dev/install/lib/libRint.so)
==2926619==    by 0x48784FC: TRint::Run(bool) (in /mnt/nfs17/jacka/root_dev/install/lib/libRint.so)
==2926619==    by 0x109302: main (in /mnt/nfs17/jacka/root_dev/install/bin/root.exe)
client stack range: [0x1FFEFF2000 0x1FFF000FFF] client SP: 0x1FFEFFA470
valgrind stack range: [0x1002E8E000 0x1002F8DFFF] top usage: 18232 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

@ferdymercury
Copy link
Collaborator

Ah, sorry, I forgot on .exe. It does not look well with root.exe. It crashes. However, it does not seem to be related with THnSparseD.

Thanks. Let's the wait for a developer to start the CI.

Copy link
Collaborator

@ferdymercury ferdymercury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

congrats @jackapet the CI is green

@jackapet
Copy link
Author

jackapet commented Oct 3, 2025

Hello, that's great that tests works now.

I see that code formatting checks fails. Is there a way how I can fix it?

@ferdymercury
Copy link
Collaborator

Is there a way how I can fix it?

If you click on the clang-format and go to the very bottom, they explain you can do this on the already checked out branch:

git rebase -i -x "git-clang-format master && git commit -a --allow-empty --fixup=HEAD" --strategy-option=theirs origin/master

However, sometimes clang-format does too aggressive changes and it introduces too much noise in the PR, so I'd say let's wait to see if a developer requests this formatting change or not.

@martamaja10
Copy link
Contributor

Hi @jackapet. Thank you for your changes and congrats on having the green CI. (P.S. sorry for my slow replies, I was sick last week).

Regarding clang-format - it will be needed.

My suggestion is to first clean up the commits a bit using interactive rebase - for example, squash the commits of RDF tests into one commit and improve the commit from "Adding tests" to "Adding tests for THnSparse action in RDataFrame". Similarly for the documentation. For the THnSparse implementation in RDataFrame you could just have one commit (so for example: Adding THnSparseDModel, Adding chunk size and Adding interfaces to HistoNSparseD could become a single commit). In the commit message you can add a small description of what exactly you did. For now, leave the [hist] commits as they are.

Once you have this sorted, you will have only a few commits. Then you can use command:
git clang-format HEAD~1 etc to only format what you changed in the given commit, which should be clean. If clang finds something to format, you'll need to add those files and then do git commit --amend.

Petr Jacka added 2 commits October 7, 2025 09:35
THnSparseD constructors consistent

Copy constructor is inmplemented for THnSparseD
Making THn and THnSparseD constructors consistent
Adding test of THn and THnSparseD constructors
THnSparseD model is added; Interfaces for HistoNSparseD are added
Adding tests for HistoNSparseD including concurrency
@jackapet jackapet force-pushed the Add-THnSparseDModel branch from aa6b81b to 1e296c0 Compare October 7, 2025 07:42
@martamaja10
Copy link
Contributor

Hi @jackapet. Thanks for squashing the commits!

There are some issues with the CI - not because of your PR but we have just changed to LLVM 20 as of this morning. Could you rebase your branch to the current master and push your commits again?

Your local build after rebasing the master will also fail. What you can then do inside your build directory is:

 $ rm lib/*.pcm
 $ rm -rf interpreter/cling/tools/plugins/clad/

and then build again. If this doesn't work you might need to do a clean build from scratch.

Btw have you managed to do git clang-format?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for THnSparseD histogram in RDataFrame
4 participants