Reduce amount of code JITted by RDataFrame #17282

gpetruc · 2024-12-16T15:28:35Z

This PR

This is some code, that could also be improved further, to reduce the per-sample JIT cost in RDataFrame, and potentially a way to avoid the "controlled memory leaks" #15520 (though this code currently does not yet address them in full).

I don't propose to merge this code as is, but I'm opening the PR as a reference for future discussion

Before this PR

Currently JITed nodes of the computation graph, RDF JITs the function code just once, but JITs code lines to create the computational graph once per graph, with code like

ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(
        R_rdf::func45, 
        new const char*[3]{"Jet_isSelected", "Jet_area", "Jet_pt"}, 3, 
        "SelJet_area",
        reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x562b52fe7cb0),
        reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x562b5530d120),
        reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x562b552e1900),
        reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x562b552e0d20));

This is done by methods like BookDefineJit that craft the code as a string and pass it to RLoopManager::toJitExec. Calls are accumulated, and executed in one go by RLoopManager::Jit()

With this PR

Methods like BookDefineJit now call a new function RLoopManager::RegisterJitHelperCall and the whole operation is refactored in three steps

a definition via JIT of a registrator function, done once per function, kind of action (e.g. Define) and column name

namespace R_rdf {
  void jitNodeRegistrator_70(
        ROOT::Detail::RDF::RLoopManager *lm, 
        std::shared_ptr<ROOT::Detail::RDF::RNodeBase> *prevNodeOnHeap,
        ROOT::Internal::RDF::RColumnRegister* colRegister, 
        const std::vector<std::string> & colNames, 
        void *wkJittedDefine, void *) {
                 ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func45,
                 colNames, 
                 "SelJet_area", 
                 lm, reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(wkJittedDefine),
                 colRegister, 
                 prevNodeOnHeap);
        }
}

lookup of the function address in memory, via calls to gInterpreter
from C++ call the registrator function once per computation graph

Similarly to the current setup, the declarations from point (1) are accumulated in a single string and passed to the interpreter in one go by RLoopManager::Jit(), and the address lookup (2) is also done inside RLoopManager::Jit()

The deferred function calls (3) are done both in RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded)

What next?

I tested the code on several RDF analyses, but it could be tested more extensively, and also not just on x86_64.
The column name could also be gathered in the registration, to reduce the number of registrator functions to JIT
The objects created on the heap by BookDefineJit and deleted in the JITed calls like JitDefineHelper could be instead owned by the deferred function call object via RAII
Code could be cleaned further, e.g. the registration key could become an integer and the map a concurrent container to avoid the read lock, a c++ std::variant could also be used instead of a void * for the weak pointer, the extra column names for variations could be in a smart pointer, ...

vepadulano · 2024-12-18T09:00:46Z

Thank you @gpetruc for kickstarting this absolutely non-trivial change to a core part of RDataFrame! It is definitely worth discussing, I will take a look at this.

github-actions · 2024-12-20T03:26:01Z

Test Results

22 files 22 suites 3d 12h 35m 30s ⏱️
3 810 tests 3 810 ✅ 0 💤 0 ❌
76 712 runs 76 712 ✅ 0 💤 0 ❌

Results for commit c91b372.

♻️ This comment has been updated with latest results.

vepadulano · 2025-11-29T17:05:28Z

I've taken the liberty of rebasing this PR on master. Locally, I have tried to exploit these changes as a baseline to remove the controlled leaks of #15520 , but I've hit a blocker because of CallBuildAction where not only the jitted action pointer, the column register pointer and the previous node pointer are deleted, but also the pointer to the HelperArgType instance which does not fit in the workflow of this PR (it would require some sort of template DeferredJitCall or another way to make it agnostic. Before proceeding, I'd like to see what's missing for this PR to be merged, then possibly tackle the leaks in another PR

tree/dataframe/src/RLoopManager.cxx

vepadulano · 2025-12-04T18:45:00Z

@gpetruc I have squashed your proposed changes in one commit, the second commit introduces improvements that also lead to fixing #15520. Our CI is running well with the changes, let me know if you have time to check this PR again with your analysis, I'll be running some tests with valgrind locally as well.

gpetruc · 2025-12-08T16:17:13Z

Hi @vepadulano,

Thanks for reviving and updating this PR, I've run a quick check with one of my analysis benchmarks and it looks ok.

I'm failing to do more tests because for reasons I don't yet understand the el9 docker containers I'm building with this root branch or the master one fail to access eos ( Plugin version SecClnt v5.8.0 is incompatible with seckrb5 v5.9.1 (must be <= 5.8.x) in sec.protocol libXrdSeckrb5-5.so )

gpetruc · 2025-12-10T08:05:36Z

Okay, after fixing my docker builds now I managed to run also more tests with this PR, and confirm that results look ok.

These changes introduce a refactoring of the way JIT calls for RDataFrame nodes are handled. Previously, every time a certain node creation corresponded to a JIT call, RDataFrame would declare a line of code to JIT that node. This would possibly lead to calling the same JIT code multiple times, if the same function was to be JITted by many computation graphs. This commit introduces the concept of a "deferred JIT call", which refactors the approach in three steps: 1. Every time a new node needs to be JITted, RDataFrame will declare a registrator function, in the body of which the true function to JIT will be called. The registrator is identified by a sequentially increasing number, together with its whole function body. 2. In RLoopManager::Jit, a cache is accessed to retrieve the function pointers of the registrator functions. 3. Run the registrators, once per computation graph. This practically achieves a JIT reduction by disambiguating the same registrator function that can be reused by multiple computation graphs. The deferred function calls (3) are done both in RLoopManager::Jit() and also in RLoopManager::Run in order to also allow doing them in multiple threads when RunGraphs is used (RunGraphs calls Jit only on one of the loop managers, and then calls Run on all multithreaded).

vepadulano · 2026-01-09T08:19:29Z

Tested the changes of this PR with our rootbench RDataFrame benchmarks at https://github.com/root-project/rootbench/blob/master/root/tree/dataframe/RDataFrameBenchmarks.cxx . On my machine, the overall execution time for all the benchmarks lowers:

master:

 1/21 Test  #1: rootbench-RDFBenchmarks .............................................   Passed   57.18 sec

patch:

 1/21 Test  #1: rootbench-RDFBenchmarks .............................................   Passed   41.73 sec

In particular, certain cases see a very large improvement. This example with JITting of a large graph (O(10K) JIT nodes with reusage of the same 2 functions) sees a factor 2 reduction, going from 14 seconds to 7 seconds.

master

1: BM_RDataFrame_JitLargeGraphs/10000                              14413906 us     14411597 us            1

patch:

1: BM_RDataFrame_JitLargeGraphs/10000                                723228 us       723064 us            1

I tried offline and the difference would be even more dramatic if the computation graph functions would be reused not only within the same computation graph, but also across computation graphs, which is the main purpose of this PR.

I attach the CSV outputs from rootbench in case anyone is interested

patch.csv
master.csv

I believe this PR is good for a final review, adding the other RDF devs @martamaja10 @hageboeck

tree/dataframe/inc/ROOT/RDF/RInterfaceBase.hxx

tree/dataframe/inc/ROOT/RDF/RLoopManager.hxx

tree/dataframe/src/RDFInterfaceUtils.cxx

tree/dataframe/inc/ROOT/RDF/RLoopManager.hxx

tree/dataframe/src/RJittedAction.cxx

tree/dataframe/src/RLoopManager.cxx

vepadulano · 2026-01-15T14:05:14Z

@hageboeck I have addressed your comments in a separate commit, will squash later on 👍

vepadulano · 2026-01-15T16:07:28Z

Following offline discussion, here is an example showing the difference in terms of amount of code seen by the interpreter before/after this PR.

Take the following code

cat jit_large_graphs.cpp
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RLogger.hxx>
#include <iostream>
#include <vector>
#include <string>

auto verbosity = ROOT::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::ELogLevel::kDebug + 10);

void jit_large_graph()
{
    ROOT::RDataFrame root(1);
    std::vector<ROOT::RDF::RResultPtr<double>> dfs;
    const auto n = 2;
    for (auto i = 0u; i < n; i++) {
        const auto column = "x" + std::to_string(i);
        auto define = root.Define(column, "42.f");
        auto filter = define.Filter(column + " > 0.f");
        auto sum = filter.Sum(column);
        dfs.emplace_back(sum);
    }
    // Trigger event loop
    dfs[0].GetValue();
}

int main()
{
  // Run twice to see the benefit of reusing the same JITted node registrators
  jit_large_graph();
  jit_large_graph();
}

Executing the code will print, among other things, the code that is JITted.

When running this program with ROOT master, we can see (just extracting interesting parts here)

// These lines refer to the actual function bodies that are JITted
namespace R_rdf {
auto func0(){return 42.f
;}
using func0_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func0)>::ret_type;
}
namespace R_rdf {
auto func1(const float var0){return var0 > 0.f
;}
using func1_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func1)>::ret_type;
}

These come from the strings passed to the Define and Filter in the application. Then, there are the lines used to actually instantiate the JITted nodes:

ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, new const char*[0]{}, 0, "x0", reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x33547580), reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x346a0070), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x345225e0), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x33547ce0));
ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, new const char*[1]{"x0"}, 1, "", reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedFilter>*>(0x3469a160), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a03b0),reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346ab120));
ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x345225c0), new const char*[1]{"x0"}, 1, 1, reinterpret_cast<shared_ptr<double>*>(0x3469a2a0), reinterpret_cast<std::weak_ptr<ROOT::Internal::RDF::RJittedAction>*>(0x3469cbf0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x3469af60));ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, new const char*[0]{}, 0, "x1", reinterpret_cast<ROOT::Detail::RDF::RLoopManager*>(0x33547580), reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedDefine>*>(0x3469a4e0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346ab720), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x3469a980));
ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, new const char*[1]{"x1"}, 1, "", reinterpret_cast<std::weak_ptr<ROOT::Detail::RDF::RJittedFilter>*>(0x3469a140), reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a57b0),reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346b47b0));
ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(reinterpret_cast<std::shared_ptr<ROOT::Detail::RDF::RNodeBase>*>(0x346a54a0), new const char*[1]{"x1"}, 1, 1, reinterpret_cast<shared_ptr<double>*>(0x346a56d0), reinterpret_cast<std::weak_ptr<ROOT::Internal::RDF::RJittedAction>*>(0x346a53f0), reinterpret_cast<ROOT::Internal::RDF::RColumnRegister*>(0x346b4aa0));

These lines are repeated for every new RDataFrame.

Instead, when running the same program after this PR, we have

namespace R_rdf {
auto func0(){return 42.f
;}
using func0_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func0)>::ret_type;
}
namespace R_rdf {
auto func1(const float var0){return var0 > 0.f
;}
using func1_ret_t = typename ROOT::TypeTraits::CallableTraits<decltype(func1)>::ret_type;
}

These are the same as before. Then we have

namespace R_rdf {
  void jitNodeRegistrator_0(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedDefine, std::shared_ptr<void> *) {
   ROOT::Internal::RDF::JitDefineHelper<ROOT::Internal::RDF::DefineTypes::RDefineTag>(R_rdf::func0, colNames, colRegister, lm, reinterpret_cast<ROOT::Detail::RDF::RJittedDefine *>(jittedDefine));
}
namespace R_rdf {
  void jitNodeRegistrator_1(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedFilter, std::shared_ptr<void> *) {
   ROOT::Internal::RDF::JitFilterHelper(R_rdf::func1, colNames, colRegister, lm, reinterpret_cast<ROOT::Detail::RDF::RJittedFilter*>(jittedFilter));
}
namespace R_rdf {
  void jitNodeRegistrator_2(const std::vector<std::string> &colNames, ROOT::Internal::RDF::RColumnRegister &colRegister, ROOT::Detail::RDF::RLoopManager &lm, void *jittedAction, std::shared_ptr<void> *helperArg) {
   ROOT::Internal::RDF::CallBuildAction<ROOT::Internal::RDF::ActionTags::Sum, float>(colNames, colRegister, lm, reinterpret_cast<ROOT::Internal::RDF::RJittedAction *>(jittedAction), 1, reinterpret_cast<std::shared_ptr<double> *>(helperArg));
}

That is, there is one JIT node registrator line per function passed by the user (or per type of action, e.g. in the case of Sum). Afterwards, there is no other call to the interpreter for JITting, not even in the next computation graph.

I have repeated the test with n=1000 instead, to show the difference more evidently. I attach the extracted lines of code from each case:

$: cat extracted_master.txt | wc -l
4020
$: cat extracted_patch.txt | wc -l
35

extracted_patch.txt
extracted_master.txt

And just for completeness, here is the full list of commands to get to those results

g++ $(root-config --cflags --libs) -g -o jit_large_graphs_patch.out jit_large_graphs.cpp
./jit_large_graphs_patch.out 2>&1 | tee jit_large_graphs_patch.txt
grep -v "^Info in <\[ROOT.RDF\] " jit_large_graphs_patch.txt > extracted_patch.txt
cat extracted_patch.txt | wc -l

This commit leverages the newly created DeferredJitCall infrastructure in RLoopManager to avoid the need for propagating heap-allocated objects through the JITted nodes flow, thus fixing root-project#15520. * When creating a new JITted node, the (shared) ownership of the previous node is now passed to the JITted node itself, which will then release it to the concrete node after JITting. This in particular happens in JitFilterHelper, CallBuildAction. The DeferredJitCall objects have no knowledge about the previous node anymore. * The column register is owned by DeferredJitCall via std::unique_ptr. * DeferredJitCall now has shared ownership of the JITted node itself and eventually extra arguments passed to the call. Thanks to these changes, it is not necessary to call RLoopManager::Jit in the RDataFrame destructor anymore, thus that call is removed. A few improvements were added to the new logic that runs JIT calls lazily, namely: * The function id is now stored as a std::size_t value. * The function body is hashed before being inserted in the map.

When accumulating code to JIT for Define, Filter and Vary nodes, the current logic embeds the name of e.g. the new column to define as part of the function body that will go into the accumulated code to JIT. This in turn means that every time we call a Define, there will be a corresponding code to JIT, as well as a corresponding function body that will go into the newly created cache for the JIT node registrators. This can be optimised by avoiding to store the names as part of the function body, thus allowing a much more aggressive caching. In a simple example such as: ``` ROOT::RDataFrame root(1); std::vector<ROOT::RDF::RResultPtr<double>> dfs; const auto n = 10000; for (auto i = 0u; i < n; i++) { const auto column = "x" + std::to_string(i); auto define = root.Define(column, "42.f"); auto filter = define.Filter(column + " > 0.f"); auto sum = filter.Sum(column); dfs.emplace_back(sum); } dfs[n - 1].GetValue(); ``` This commit changes the previous situation that would have had 10K JIT function bodies, i.e. one per Define with a different column name, down to one function body to JIT for the lambda returning 42.f.

gpetruc requested review from martamaja10 and vepadulano as code owners December 16, 2024 15:28

dpiparo assigned vepadulano Dec 18, 2024

vepadulano force-pushed the rdf_jit_recall_forpr branch from 8379eb8 to 8136e27 Compare November 29, 2025 17:02

vepadulano force-pushed the rdf_jit_recall_forpr branch 3 times, most recently from 70d919c to e5f2ff9 Compare December 3, 2025 10:23

vepadulano reviewed Dec 3, 2025

View reviewed changes

tree/dataframe/src/RLoopManager.cxx Outdated Show resolved Hide resolved

vgvassilev reviewed Dec 3, 2025

View reviewed changes

tree/dataframe/src/RLoopManager.cxx Outdated Show resolved Hide resolved

tree/dataframe/src/RLoopManager.cxx Outdated Show resolved Hide resolved

vepadulano force-pushed the rdf_jit_recall_forpr branch from c677e78 to 3c3b1ee Compare December 4, 2025 12:50

vepadulano added the clean build Ask CI to do non-incremental build on PR label Jan 7, 2026

vepadulano closed this Jan 7, 2026

vepadulano reopened this Jan 7, 2026

vepadulano force-pushed the rdf_jit_recall_forpr branch from 3c3b1ee to 1a82815 Compare January 8, 2026 16:37

vepadulano requested a review from hageboeck January 9, 2026 08:19

vepadulano changed the title ~~[df] [for discussion] JIT graph creation functions only once~~ Reduce amount of code JITted by RDataFrame Jan 9, 2026

hageboeck reviewed Jan 13, 2026

View reviewed changes

hageboeck approved these changes Jan 15, 2026

View reviewed changes

vepadulano added 2 commits January 15, 2026 17:10

vepadulano force-pushed the rdf_jit_recall_forpr branch from d9356ec to c91b372 Compare January 15, 2026 16:11

vepadulano merged commit 1e4fcc8 into root-project:master Jan 16, 2026
49 of 53 checks passed

vepadulano linked an issue Jan 16, 2026 that may be closed by this pull request

Rewrite the RDataFrame JIT logic to avoid controlled leaks #15520

Closed

vepadulano mentioned this pull request Jan 16, 2026

Rewrite the RDataFrame JIT logic to avoid controlled leaks #15520

Closed

Reduce amount of code JITted by RDataFrame #17282

Reduce amount of code JITted by RDataFrame #17282

Uh oh!

Conversation

gpetruc commented Dec 16, 2024

This PR

Before this PR

With this PR

What next?

Uh oh!

vepadulano commented Dec 18, 2024

Uh oh!

github-actions bot commented Dec 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

vepadulano commented Nov 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vepadulano commented Dec 4, 2025

Uh oh!

gpetruc commented Dec 8, 2025

Uh oh!

gpetruc commented Dec 10, 2025

Uh oh!

vepadulano commented Jan 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vepadulano commented Jan 15, 2026

Uh oh!

vepadulano commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Dec 20, 2024 •

edited

Loading

vepadulano commented Jan 15, 2026 •

edited

Loading