Skip to content

Conversation

@gpetruc
Copy link
Contributor

@gpetruc gpetruc commented Dec 16, 2024

This PR

This is some code, that could also be improved further, to reduce the per-sample JIT cost in RDataFrame, and potentially a way to avoid the "controlled memory leaks" #15520 (though this code currently does not yet address them in full).

I don't propose to merge this code as is, but I'm opening the PR as a reference for future discussion

Before this PR

Currently JITed nodes of the computation graph, RDF JITs the function code just once, but JITs code lines to create the computational graph once per graph, with code like

ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(
        R_rdf::func45, 
        new const char*[3]{"Jet_isSelected", "Jet_area", "Jet_pt"}, 3, 
        "SelJet_area",
        reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x562b52fe7cb0),
        reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x562b5530d120),
        reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x562b552e1900),
        reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x562b552e0d20));

This is done by methods like BookDefineJit that craft the code as a string and pass it to RLoopManager::toJitExec. Calls are accumulated, and executed in one go by RLoopManager::Jit()

With this PR

Methods like BookDefineJit now call a new function RLoopManager::RegisterJitHelperCall and the whole operation is refactored in three steps

  1. a definition via JIT of a registrator function, done once per function, kind of action (e.g. Define) and column name
namespace R_rdf {
  void jitNodeRegistrator_70(
        ROOT::Detail::RDF::RLoopManager *lm, 
        std::shared_ptr<ROOT::Detail::RDF::RNodeBase> *prevNodeOnHeap,
        ROOT::Internal::RDF::RColumnRegister* colRegister, 
        const std::vector<std::string> & colNames, 
        void *wkJittedDefine, void *) {
                 ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func45,
                 colNames, 
                 "SelJet_area", 
                 lm, reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(wkJittedDefine),
                 colRegister, 
                 prevNodeOnHeap);
        }
}
  1. lookup of the function address in memory, via calls to gInterpreter
  2. from C++ call the registrator function once per computation graph

Similarly to the current setup, the declarations from point (1) are accumulated in a single string and passed to the interpreter in one go by RLoopManager::Jit(), and the address lookup (2) is also done inside RLoopManager::Jit()

The deferred function calls (3) are done both in RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded)

What next?

  • I tested the code on several RDF analyses, but it could be tested more extensively, and also not just on x86_64.
  • The column name could also be gathered in the registration, to reduce the number of registrator functions to JIT
  • The objects created on the heap by BookDefineJit and deleted in the JITed calls like JitDefineHelper could be instead owned by the deferred function call object via RAII
  • Code could be cleaned further, e.g. the registration key could become an integer and the map a concurrent container to avoid the read lock, a c++ std::variant could also be used instead of a void * for the weak pointer, the extra column names for variations could be in a smart pointer, ...

@vepadulano
Copy link
Member

Thank you @gpetruc for kickstarting this absolutely non-trivial change to a core part of RDataFrame! It is definitely worth discussing, I will take a look at this.

@github-actions
Copy link

github-actions bot commented Dec 20, 2024

Test Results

    22 files      22 suites   3d 12h 35m 30s ⏱️
 3 810 tests  3 810 ✅ 0 💤 0 ❌
76 712 runs  76 712 ✅ 0 💤 0 ❌

Results for commit c91b372.

♻️ This comment has been updated with latest results.

@vepadulano
Copy link
Member

I've taken the liberty of rebasing this PR on master. Locally, I have tried to exploit these changes as a baseline to remove the controlled leaks of #15520 , but I've hit a blocker because of CallBuildAction where not only the jitted action pointer, the column register pointer and the previous node pointer are deleted, but also the pointer to the HelperArgType instance which does not fit in the workflow of this PR (it would require some sort of template DeferredJitCall or another way to make it agnostic. Before proceeding, I'd like to see what's missing for this PR to be merged, then possibly tackle the leaks in another PR

@vepadulano vepadulano force-pushed the rdf_jit_recall_forpr branch 3 times, most recently from 70d919c to e5f2ff9 Compare December 3, 2025 10:23
@vepadulano vepadulano force-pushed the rdf_jit_recall_forpr branch from c677e78 to 3c3b1ee Compare December 4, 2025 12:50
@vepadulano
Copy link
Member

@gpetruc I have squashed your proposed changes in one commit, the second commit introduces improvements that also lead to fixing #15520. Our CI is running well with the changes, let me know if you have time to check this PR again with your analysis, I'll be running some tests with valgrind locally as well.

@gpetruc
Copy link
Contributor Author

gpetruc commented Dec 8, 2025

Hi @vepadulano,

Thanks for reviving and updating this PR, I've run a quick check with one of my analysis benchmarks and it looks ok.

I'm failing to do more tests because for reasons I don't yet understand the el9 docker containers I'm building with this root branch or the master one fail to access eos ( Plugin version SecClnt v5.8.0 is incompatible with seckrb5 v5.9.1 (must be <= 5.8.x) in sec.protocol libXrdSeckrb5-5.so )

@gpetruc
Copy link
Contributor Author

gpetruc commented Dec 10, 2025

Okay, after fixing my docker builds now I managed to run also more tests with this PR, and confirm that results look ok.

@vepadulano vepadulano added the clean build Ask CI to do non-incremental build on PR label Jan 7, 2026
@vepadulano vepadulano closed this Jan 7, 2026
@vepadulano vepadulano reopened this Jan 7, 2026
These changes introduce a refactoring of the way JIT calls for RDataFrame nodes are handled. Previously, every time a certain node creation corresponded to a JIT call, RDataFrame would declare a line of code to JIT that node. This would possibly lead to calling the same JIT code multiple times, if the same function was to be JITted by many computation graphs. This commit introduces the concept of a "deferred JIT call", which refactors the approach in three steps:
1. Every time a new node needs to be JITted, RDataFrame will declare a registrator function, in the body of which the true function to JIT will be called. The registrator is identified by a sequentially increasing number, together with its whole function body.
2. In RLoopManager::Jit, a cache is accessed to retrieve the function pointers of the registrator functions.
3. Run the registrators, once per computation graph.

This practically achieves a JIT reduction by disambiguating the same registrator function that can be reused by multiple computation graphs.

The deferred function calls (3) are done both in  RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded).
@vepadulano vepadulano force-pushed the rdf_jit_recall_forpr branch from 3c3b1ee to 1a82815 Compare January 8, 2026 16:37
@vepadulano
Copy link
Member

Tested the changes of this PR with our rootbench RDataFrame benchmarks at https://github.com/root-project/rootbench/blob/master/root/tree/dataframe/RDataFrameBenchmarks.cxx . On my machine, the overall execution time for all the benchmarks lowers:

master:

 1/21 Test  #1: rootbench-RDFBenchmarks .............................................   Passed   57.18 sec

patch:

 1/21 Test  #1: rootbench-RDFBenchmarks .............................................   Passed   41.73 sec

In particular, certain cases see a very large improvement. This example with JITting of a large graph (O(10K) JIT nodes with reusage of the same 2 functions) sees a factor 2 reduction, going from 14 seconds to 7 seconds.

master

1: BM_RDataFrame_JitLargeGraphs/10000                              14413906 us     14411597 us            1

patch:

1: BM_RDataFrame_JitLargeGraphs/10000                                723228 us       723064 us            1

I tried offline and the difference would be even more dramatic if the computation graph functions would be reused not only within the same computation graph, but also across computation graphs, which is the main purpose of this PR.

I attach the CSV outputs from rootbench in case anyone is interested

patch.csv
master.csv

I believe this PR is good for a final review, adding the other RDF devs @martamaja10 @hageboeck

@vepadulano vepadulano requested a review from hageboeck January 9, 2026 08:19
@vepadulano vepadulano changed the title [df] [for discussion] JIT graph creation functions only once Reduce amount of code JITted by RDataFrame Jan 9, 2026
@vepadulano
Copy link
Member

@hageboeck I have addressed your comments in a separate commit, will squash later on 👍

@vepadulano
Copy link
Member

vepadulano commented Jan 15, 2026

Following offline discussion, here is an example showing the difference in terms of amount of code seen by the interpreter before/after this PR.

Take the following code

cat jit_large_graphs.cpp
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RLogger.hxx>
#include <iostream>
#include <vector>
#include <string>

auto verbosity = ROOT::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::ELogLevel::kDebug + 10);

void jit_large_graph()
{
    ROOT::RDataFrame root(1);
    std::vector<ROOT::RDF::RResultPtr<double>> dfs;
    const auto n = 2;
    for (auto i = 0u; i < n; i++) {
        const auto column = "x" + std::to_string(i);
        auto define = root.Define(column, "42.f");
        auto filter = define.Filter(column + " > 0.f");
        auto sum = filter.Sum(column);
        dfs.emplace_back(sum);
    }
    // Trigger event loop
    dfs[0].GetValue();
}

int main()
{
  // Run twice to see the benefit of reusing the same JITted node registrators
  jit_large_graph();
  jit_large_graph();
}

Executing the code will print, among other things, the code that is JITted.

When running this program with ROOT master, we can see (just extracting interesting parts here)

// These lines refer to the actual function bodies that are JITted
namespace R_rdf {
auto func0(){return 42.f
;}
using func0_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func0)>::ret_type;
}
namespace R_rdf {
auto func1(const float var0){return var0 > 0.f
;}
using func1_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func1)>::ret_type;
}

These come from the strings passed to the Define and Filter in the application. Then, there are the lines used to actually instantiate the JITted nodes:

ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, new const char*[0]{}, 0, "x0", reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x33547580), reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x346a0070), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x345225e0), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x33547ce0));
ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, new const char*[1]{"x0"}, 1, "", reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedFilter>*>(0x3469a160), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a03b0),reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346ab120));
ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x345225c0), new const char*[1]{"x0"}, 1, 1, reinterpret_cast<shared_ptr<double>*>(0x3469a2a0), reinterpret_cast<std::weak_ptr<ROOT::Internal::RDF::RJittedAction>*>(0x3469cbf0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x3469af60));ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, new const char*[0]{}, 0, "x1", reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x33547580), reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x3469a4e0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346ab720), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x3469a980));
ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, new const char*[1]{"x1"}, 1, "", reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedFilter>*>(0x3469a140), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a57b0),reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346b47b0));
ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a54a0), new const char*[1]{"x1"}, 1, 1, reinterpret_cast<shared_ptr<double>*>(0x346a56d0), reinterpret_cast<std::weak_ptr<ROOT::Internal::RDF::RJittedAction>*>(0x346a53f0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346b4aa0));

These lines are repeated for every new RDataFrame.

Instead, when running the same program after this PR, we have

namespace R_rdf {
auto func0(){return 42.f
;}
using func0_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func0)>::ret_type;
}
namespace R_rdf {
auto func1(const float var0){return var0 > 0.f
;}
using func1_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func1)>::ret_type;
}

These are the same as before. Then we have

namespace R_rdf {
  void jitNodeRegistrator_0(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedDefine, std::shared_ptr<void> *) {
   ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, colNames, colRegister, lm, reinterpret_cast<ROOT::Detail::RDF::RJittedDefine *>(jittedDefine));
}
namespace R_rdf {
  void jitNodeRegistrator_1(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedFilter, std::shared_ptr<void> *) {
   ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, colNames, colRegister, lm, reinterpret_cast<ROOT::Detail::RDF::RJittedFilter*>(jittedFilter));
}
namespace R_rdf {
  void jitNodeRegistrator_2(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedAction, std::shared_ptr<void> *helperArg) {
   ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(colNames, colRegister, lm, reinterpret_cast<ROOT::Internal::RDF::RJittedAction *>(jittedAction), 1, reinterpret_cast<std::shared_ptr<double> *>(helperArg));
}

That is, there is one JIT node registrator line per function passed by the user (or per type of action, e.g. in the case of Sum). Afterwards, there is no other call to the interpreter for JITting, not even in the next computation graph.

I have repeated the test with n=1000 instead, to show the difference more evidently. I attach the extracted lines of code from each case:

$: cat extracted_master.txt | wc -l
4020
$: cat extracted_patch.txt | wc -l
35

extracted_patch.txt
extracted_master.txt

And just for completeness, here is the full list of commands to get to those results

g++ $(root-config --cflags --libs) -g -o jit_large_graphs_patch.out jit_large_graphs.cpp
./jit_large_graphs_patch.out 2>&1 | tee jit_large_graphs_patch.txt
grep -v "^Info in <\[ROOT.RDF\] " jit_large_graphs_patch.txt > extracted_patch.txt
cat extracted_patch.txt | wc -l

This commit leverages the newly created DeferredJitCall infrastructure in RLoopManager to avoid the need for propagating heap-allocated objects through the JITted nodes flow, thus fixing root-project#15520.

* When creating a new JITted node, the (shared) ownership of the previous node is now passed to the JITted node itself, which will then release it to the concrete node after JITting. This in particular happens in JitFilterHelper, CallBuildAction. The DeferredJitCall objects have no knowledge about the previous node anymore.
* The column register is owned by DeferredJitCall via std::unique_ptr.
* DeferredJitCall now has shared ownership of the JITted node itself and eventually extra arguments passed to the call.

Thanks to these changes, it is not necessary to call RLoopManager::Jit in the RDataFrame destructor anymore, thus that call is removed.

A few improvements were added to the new logic that runs JIT calls lazily, namely:
* The function id is now stored as a std::size_t value.
* The function body is hashed before being inserted in the map.
When accumulating code to JIT for Define, Filter and Vary nodes, the
current logic embeds the name of e.g. the new column to define as part
of the function body that will go into the accumulated code to JIT. This
in turn means that every time we call a Define, there will be a
corresponding code to JIT, as well as a corresponding function body that
will go into the newly created cache for the JIT node registrators.

This can be optimised by avoiding to store the names as part of the
function body, thus allowing a much more aggressive caching. In a simple
example such as:

```
ROOT::RDataFrame root(1);
std::vector<ROOT::RDF::RResultPtr<double>> dfs;
const auto n = 10000;
for (auto i = 0u; i < n; i++) {
    const auto column = "x" + std::to_string(i);
    auto define = root.Define(column, "42.f");
    auto filter = define.Filter(column + " > 0.f");
    auto sum = filter.Sum(column);
    dfs.emplace_back(sum);
}
dfs[n - 1].GetValue();
```

This commit changes the previous situation that would have had 10K JIT
function bodies, i.e. one per Define with a different column name, down
to one function body to JIT for the lambda returning 42.f.
@vepadulano vepadulano force-pushed the rdf_jit_recall_forpr branch from d9356ec to c91b372 Compare January 15, 2026 16:11
@vepadulano vepadulano merged commit 1e4fcc8 into root-project:master Jan 16, 2026
49 of 53 checks passed
@vepadulano vepadulano linked an issue Jan 16, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clean build Ask CI to do non-incremental build on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rewrite the RDataFrame JIT logic to avoid controlled leaks

4 participants