
Conversation


@KohlerHECTOR commented Jun 26, 2025

First commit to add DPDT to the TabArena.

We committed the skeleton of a class.
The paper can be found here.
The original code for DPDT is here and passes the sklearn tests for a BaseEstimator.

  • Add configs
  • Do tests in tst
  • Add the Boosted version of DPDT
  • Implement the paper formula for memory consumption estimate

@KohlerHECTOR changed the title [WIP][New Model] Dynammic Programming Decision Trees to [WIP][New Model] Dynamic Programming Decision Trees Jun 26, 2025
@LennartPurucker self-assigned this Jun 26, 2025
@LennartPurucker
Collaborator

Great to see the start of the contribution here @KohlerHECTOR :)

Please ping me for any questions or once I should take a look at the code.
I am also happy to step in and do some of the implementation myself once you have a starting point.

One thought on the topic: can your DT do predict_proba in a meaningful way?
The default sklearn DT does not support this well, and we will be evaluating with respect to the prediction probabilities in classification, so that might be very influential.

Also, I am happy to run the HPO myself once the PR is completed. I have access to free compute for this purpose.
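
To illustrate the predict_proba concern above: a minimal, self-contained sketch (not DPDT-specific, just a plain sklearn example) showing that a fully grown single decision tree mostly emits hard 0/1 probabilities from its pure leaves, which is why log-loss-style evaluation penalizes it.

from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unrestricted depth: leaves are (almost) pure, so predict_proba is mostly 0/1.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
proba = tree.predict_proba(X_te)
print("fraction of hard 0/1 probabilities:", (proba.max(axis=1) == 1.0).mean())
print("log loss:", log_loss(y_te, proba))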

@KohlerHECTOR
Author

Hello, it does not support predict_proba, unfortunately.
Maybe I should just focus on the boosted version of DPDT that does.
For the HPO, I have some configs to launch it. I will commit them soon.

@LennartPurucker
Collaborator

Maybe I should just focus on the boosted version of DPDT that does.

Yea, that sounds like a good idea. Or an RF version of it 🤔

@KohlerHECTOR
Author

I'll do that then :) thanks for the help. It is a bit of a shame, though, as decision tree algorithms are known to perform well on tabular data :) it could be nice to make the benchmark compatible with them :).

@KohlerHECTOR
Author

Hello @LennartPurucker. I just finished implementing BoostedDPDT as an AG abstract model. I have also added a search space for BoostedDPDT hyperparameters. This should be compatible with predict_proba. What should I do next :)?

@LennartPurucker
Collaborator

Great, thank you!

Give me some time to review and test your code, and I will get back to you with questions if I have any.
Then I can run a first benchmark!

I just need a bit of time, as my week is very full.

def _fit(self, X: pd.DataFrame, y: pd.Series, num_cpus: int = 1, **kwargs):
model_cls = self.get_model_cls()
hyp = self._get_model_params()
if num_cpus < 1:
Collaborator

I think num_cpus would never be below 1; did you mean to use <=?

Author

Hello, I will remove it. It is just habit from the joblib library, in which one writes n_jobs = -1 to use all available CPUs.

Collaborator

Ah, yes. Here num_cpus might be the string "auto" in edge cases (not within TabArena benchmarks).
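
A minimal sketch of how the num_cpus argument in the _fit snippet above could be normalized before use, assuming the only special values are the string "auto" mentioned here and the joblib-style -1; the helper name is hypothetical, not part of the PR.

import os

def _resolve_num_cpus(num_cpus) -> int:
    # Hypothetical helper: map AutoGluon's num_cpus argument to a positive int.
    # Assumes the only non-integer value that can arrive here is "auto".
    if num_cpus is None or num_cpus == "auto":
        return os.cpu_count() or 1
    num_cpus = int(num_cpus)
    if num_cpus < 1:  # e.g. -1 in the joblib convention
        return os.cpu_count() or 1
    return num_cpus

# Usage inside _fit (sketch):
# n_jobs = _resolve_num_cpus(num_cpus)
# model = model_cls(n_jobs=n_jobs, **hyp)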

@LennartPurucker
Collaborator

Code looks very clean, nice :)

Two questions that might have a large impact on performance:

  1. Could your code support a time limit? E.g., could you add logic such that we pass a time limit in seconds and boosting stops once the time limit is reached?
  2. Do you somehow want to support using the validation data to early-stop the boosting? This is usually very powerful. Alternatively, we might decide to refit after cross-validation.

@KohlerHECTOR
Author

KohlerHECTOR commented Jun 30, 2025

Hello, thanks for the feedback. No hurry, let us take it slow.

I have already benchmarked Boosted-DPDT on clusters for medium and large datasets, and in my experience the key time/memory bottleneck is the training set size. What is the usual train/validation/test split like in TabArena?

I think Boosted-DPDT could support a time limit, but not at a very fine grain, i.e. the logic would be something like: if time < time_limit: add estimator to ensemble; else: stop. I would say that training a single BoostedDPDT model with 1000 individual DPDT estimators in the ensemble takes approximately 2 CPU-hours.

I have not optimized DPDT inference yet, especially not for boosting, so running inference on validation data for early stopping would not be a good idea at the moment.

@LennartPurucker
Collaborator

What is the usual train/validation/test split like in TabArena?

We have training data from 500 to 100k samples with up to 2000 features, which should mostly drive the training cost.

if time < time_limit: add estimator to ensemble; else: stop

That would be exactly what is needed. One could also add a check for whether there is enough time left to run another iteration.

I did not optimize DPDT inference yet

Note that a longer inference time will also make the model "training" slower, as we are cross-validating the models, which is factored into the training time.

Early stopping on the validation data is mostly important to obtain peak performance.
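
A minimal sketch of the coarse-grained time-limit logic discussed above, including the suggested check for whether another iteration would still fit into the remaining budget; the weak-learner factory and fitting loop here are hypothetical, not the actual DPDTreeEstimator code.

import time

def fit_boosting_with_budget(make_weak_learner, X, y, n_estimators=1000,
                             time_limit=None):
    # Sketch: stop adding estimators once the time budget is reached, or when
    # the last iteration's duration suggests the next one would exceed it.
    start = time.time()
    ensemble, last_iter_time = [], 0.0
    for _ in range(n_estimators):
        if time_limit is not None:
            elapsed = time.time() - start
            if elapsed >= time_limit or elapsed + last_iter_time > time_limit:
                break
        iter_start = time.time()
        # Real boosting would also reweight the samples here.
        ensemble.append(make_weak_learner().fit(X, y))
        last_iter_time = time.time() - iter_start
    return ensemble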

@KohlerHECTOR
Author

Hello, I have updated the source code of AdaBoostDPDT to take in a time_limit: https://github.com/KohlerHECTOR/DPDTreeEstimator/blob/d097719eb8d6299fc38ef901275cf8f3fdb0a598/dpdt/boosted_dpdt.py#L160C9-L166C26.
I have also reduced the inference time by a factor of roughly 2.

@LennartPurucker
Collaborator

Great! I think this should be all we need to run a first benchmark on TabArena-Lite.

I can likely start the runs early next week, as I am currently still running another model.
I will share the results here once I get them or if I encounter any issues while running the model.

@KohlerHECTOR
Author

Great! I think this should be all we need to run a first benchmark on TabArena-Lite.

I can likely start the runs early next week, as I am currently still running another model. I will share the results here once I get them or if I encounter any issues while running the model.

Looking forward to hearing back from you! It has been nice working on this issue collaboratively.

@KohlerHECTOR
Author

Hello @LennartPurucker. Did you manage to run the model? Is it running without errors? :) Thanks

@LennartPurucker
Collaborator

Not yet, sorry :(

Quite busy debugging and running other models that were still on my TODOs, and I am at a conference right now.
I might get it done on the weekend.

@KohlerHECTOR
Author

Not yet, sorry :(

Quite busy debugging and running other models that were still on my TODOs, and I am at a conference right now. I might get it done on the weekend.

No worries! Good luck, and thanks :)

@LennartPurucker
Collaborator

LennartPurucker commented Jul 19, 2025

Heyho, here are the results on TabArena, all classification tasks, only the default config:

[Results figure omitted; see the full leaderboard CSV below.]

I added some fixes to the code to make it run, but otherwise kept your code as is.
In general, the method was very quick. I can send you the full results artifacts if needed.

Here is the full LB .csv:

method,time_train_s,time_infer_s,time_train_s_per_1K,time_infer_s_per_1K,normalized-error,normalized-error-task,imputed,champ_delta,loss_rescaled,time_train_s_rescaled,time_infer_s_rescaled,rank,median_metric_error,median_time_train_s,median_time_infer_s,median_time_train_s_per_1K,median_time_infer_s_per_1K,median_normalized-error,median_normalized-error-task,median_imputed,median_champ_delta,median_loss_rescaled,median_time_train_s_rescaled,median_time_infer_s_rescaled,median_rank,rank=1_count,rank=2_count,rank=3_count,rank>3_count,elo,elo+,elo-,winrate,mrr
TABM_GPU (tuned + ensemble),34341.376551290836,9.60910518720136,4223.58680397795,3.4317121842487635,0.4860572295604615,0.5216110444529932,0.0,0.08239957044412496,0.042160505691113126,45864.929770826835,134.55326674312073,9.697368421052632,0.17047,8245.070318195554,2.6588550435172187,2466.2108648716858,1.5022465694891438,0.46406624968788623,0.5266138740891876,0.0,0.0374695276183053,0.018934386801786905,39661.48674821285,121.43882105497326,9.0,3,2,2,31,1569.4,26.1,36.7,0.8067251461988304,0.22347997681669599
AutoGluon 1.3 (4h),7918.9967654708535,21.9962719316371,2834.6060407147975,3.0873029091200266,0.41653155366951344,0.4223335783862739,0.0,0.06891082208685434,0.035238872091106904,31011.170971980166,263.893057970457,9.736842105263158,0.159155,7086.632828738954,3.4718479580349393,1322.7174946761893,2.35500290690449,0.333422352863713,0.4106381388922016,0.0,0.03226297629046343,0.01364222217043919,23372.022224152715,142.27612475951975,6.0,6,3,1,28,1568.8,39.1,27.3,0.8058479532163743,0.30088199590658893
REALMLP (tuned + ensemble),88520.97457650169,55.83305884453288,9129.519936677754,18.43403910125304,0.5211997145971525,0.5419324392599179,0.0,0.08216963958005759,0.039686695634354965,141789.21603781544,866.2933837817693,10.486842105263158,0.16568,30350.410282479395,23.161008212301468,6519.687377737515,10.838853332215592,0.4857224588358521,0.5416016621197219,0.0,0.046844798741680516,0.02028141560563884,114204.24024987879,769.6999972028425,9.25,0,1,1,36,1550.1,29.9,32.1,0.7891812865497077,0.13573574096399188
GBM (tuned + ensemble),2957.8209862183407,11.98416593290909,759.1477152793035,2.5417865978850473,0.5684318345932317,0.5892044107185356,0.0,0.09787395108553403,0.045015783593256066,8568.794143617388,184.93302208876983,11.157894736842104,0.16698000000000002,1552.8742877377404,3.4757837878333198,382.05361557599804,1.4876036641335277,0.6014217221200482,0.6267567763245895,0.0,0.050681941850522993,0.0219954442643497,7514.342598912193,103.35126359750211,10.0,1,1,2,34,1534.5,27.6,34.7,0.7742690058479532,0.1559607123284977
TABICL_GPU (default),107.96023476381747,19.280086713506464,9.625347263659664,2.230965763361112,0.47897637599149756,0.5379966712981932,0.05263157894736842,0.0794222885606876,0.035692508533049935,168.76491644336255,230.09998232516136,11.473684210526315,0.17207,25.732762111557854,3.449363695250617,8.684246340890724,1.7433667301105085,0.48735626621174133,0.5497333049508075,0.0,0.03858333399786745,0.016731980584033038,137.81489160855577,127.65482709424812,10.0,6,4,1,27,1523.8,29.7,28.5,0.7672514619883041,0.2845289883186061
TABM_GPU (tuned),34341.376551290836,1.0683778852050068,4223.58680397795,0.37888409513825755,0.5709783148149593,0.5765178581391853,0.0,0.0927595492056652,0.05299394269910363,45864.929770826835,13.890136033418576,13.276315789473685,0.172585,8245.070318195554,0.2723346948623657,2466.2108648716858,0.17557376557934842,0.5507957866218385,0.6288560507643206,0.0,0.04800849593484757,0.02607612333405842,39661.48674821285,10.546435553124088,14.0,1,3,2,32,1479.0,30.3,24.4,0.7271929824561404,0.1570697532042563
CAT (tuned + ensemble),16504.07579820414,2.581779239609925,3201.0265707590415,0.8915752484646539,0.5946662580096064,0.5968450611720829,0.0,0.09115569260254941,0.0477966370748438,27193.868813721343,44.08651163500922,13.578947368421053,0.16040500000000002,5046.900271190538,1.2630292971928916,1372.9411122807264,0.5562989902961428,0.5944863235145031,0.6352896281726772,0.0,0.05787378892595674,0.019704164873507762,20541.46486167592,37.76547447057217,13.0,0,1,3,34,1475.0,32.4,27.8,0.7204678362573099,0.11511468833422164
CAT (tuned),16504.07579820414,0.42638187025025576,3201.0265707590415,0.12024042770987867,0.6116282446594372,0.6091847626588384,0.0,0.09330164070819684,0.047439007457534044,27193.868813721343,6.229177098348047,13.907894736842104,0.161705,5046.900271190538,0.12877851062350804,1372.9411122807264,0.07385371803542701,0.6204742509154653,0.6556676704548872,0.0,0.05645008601114826,0.021526104955176803,20541.46486167592,4.9725379405984995,14.0,1,2,1,34,1469.2,28.2,37.9,0.7131578947368421,0.14293749785810195
GBM (tuned),2957.8209862183407,1.875632255676894,759.1477152793035,0.5049369881097493,0.6518852687023466,0.6461480382772686,0.0,0.10551185264608158,0.05128297986373955,8568.794143617388,32.37091249654915,14.355263157894736,0.16877999999999999,1552.8742877377404,0.5045170254177518,382.05361557599804,0.2538713244201605,0.6824044717257683,0.6669319095180881,0.0,0.05207128139882822,0.03005679714162385,7514.342598912193,15.175591971121808,12.5,0,0,0,38,1456.7,30.4,26.1,0.7032163742690059,0.08276267433630555
CAT (default),220.2627424415789,0.2710267032099049,109.9732905896213,0.13662992734425208,0.626144592774637,0.6427083507202662,0.0,0.10281875743582664,0.04893944480490472,404.9759898295662,6.476751807794606,14.710526315789474,0.16373,18.264583627382912,0.17750852637820774,5.723546572951673,0.0761539571798428,0.639939228552679,0.6499349464637711,0.0,0.05481736123368691,0.019786450586402833,103.69107151573829,5.120402547941957,16.0,1,3,1,33,1451.1,24.5,27.0,0.6953216374269006,0.14022371465401448
XGB (tuned + ensemble),5957.374200563333,6.456417350253166,1167.526219278486,2.7907218103162132,0.6588561561960194,0.6574281691112647,0.0,0.1067022658124541,0.05567301001815801,10832.944520166355,137.58836217438702,14.789473684210526,0.165745,1680.0658507664998,2.2809726662105985,685.86510540535,1.4547593315263065,0.7108228467509188,0.7171111574751186,0.0,0.0603611320440825,0.02692183365767313,8251.297786111467,74.47019952683146,13.5,0,1,1,36,1448.1,32.9,24.9,0.6935672514619883,0.10756209076732069
TABPFNV2_GPU (tuned + ensemble),10002.759252123178,101.4355180454533,2653.4255131981217,44.749282859570734,0.49532385491866454,0.5509387322701887,0.3157894736842105,0.09695040281680861,0.058728542549192436,49565.00539849654,3444.4876051657507,14.881578947368421,0.17261500000000002,2077.023049354553,12.760541562239329,3008.2157047151595,20.848616639963154,0.5072452050359268,0.5677747034508405,0.0,0.04061436251124223,0.031106953519191773,28624.579287895787,825.5109643363919,8.5,8,3,2,25,1445.0,30.6,29.9,0.6915204678362573,0.32181779615879497
MNCA_GPU (tuned),57356.284085895306,20.170870465214488,5990.817791505437,2.0485901061394993,0.7025823626148527,0.6507675557385617,0.0,0.1050159299928894,0.05874787415153362,80456.91211763977,129.4389431731015,16.57894736842105,0.17054999999999998,14186.536935488384,0.6020842525694106,4879.890404506269,0.5247194359730172,0.7690491729735598,0.6702836878360258,0.0,0.06755655006003625,0.027105979169962026,66956.02864547497,27.735396922747526,17.0,1,0,0,37,1409.3,28.1,23.5,0.6538011695906433,0.10281514647424263
XGB (tuned),5957.374200563333,1.301870340352867,1167.526219278486,0.6678561539347552,0.7105198511088757,0.6960442598898836,0.0,0.11080653156459744,0.05884359649073092,10832.944520166355,28.47499330573137,16.657894736842106,0.16848000000000002,1680.0658507664998,0.3827125522825453,685.86510540535,0.2050912539994952,0.7463879413831154,0.7462851118244674,0.0,0.0694652487234973,0.03410738014755424,8251.297786111467,11.461364611937853,15.5,0,0,0,38,1410.2,25.6,26.5,0.652046783625731,0.07507384066676424
MNCA_GPU (tuned + ensemble),57356.284085895306,531.7593351699455,5990.817791505437,50.07821547982156,0.6099285077752006,0.6024805894619766,0.0,0.10389701155615068,0.06960039788919663,80456.91211763977,3301.6213661597712,16.86842105263158,0.183035,14186.536935488384,14.17156207561493,4879.890404506269,8.743516387788919,0.5698975230648103,0.5718914178867698,0.0,0.06497462948315735,0.02784039455201512,66956.02864547497,548.1975258046007,12.5,0,2,4,32,1405.4,31.3,27.5,0.6473684210526316,0.1328589976048252
TABPFNV2_GPU (tuned),10002.759252123178,3.410806728176206,2653.4255131981217,1.555769686886014,0.6097531463147869,0.6324463802237591,0.3157894736842105,0.12130685228554555,0.07210310414607753,49565.00539849654,114.00749994252256,18.026315789473685,0.1868,2077.023049354553,0.5060818235079447,3008.2157047151595,0.5144277113236544,0.6490093634271488,0.644890666089377,0.0,0.08561151707807774,0.03658630335592475,28624.579287895787,25.570859321617846,13.0,1,8,1,28,1380.6,30.3,28.3,0.6216374269005848,0.19371070574686028
TABM_GPU (default),150.0762373017289,1.2507152029645372,19.886819500646006,0.46487430096802723,0.7127219621485775,0.72313189083255,0.0,0.1257035538283091,0.06837941309074774,189.61145131944258,14.042337078320946,18.38157894736842,0.17246,31.126562476158142,0.20260944763819377,10.213381764059356,0.1381032773929915,0.8228194333175417,0.7848373029272246,0.0,0.06154427249600947,0.026799112086523788,144.96049255349743,10.968725907290855,18.0,0,0,1,37,1371.1,28.6,23.2,0.6137426900584795,0.08312970971092216
TABPFNV2_GPU (default),11.400364582092442,0.8958970822786029,4.227300497658887,0.4575723059305235,0.6406097179261335,0.6862800129002168,0.3157894736842105,0.13032850434642715,0.08031345243486752,53.73712299744585,29.471362460513575,19.026315789473685,0.1886,7.994279013739691,0.2908047080039978,3.368600991426515,0.3152861168047789,0.7557735280974687,0.7136046626834447,0.0,0.07929605277196083,0.034405727147343024,41.34560285308973,17.75894630310718,17.0,4,1,4,29,1360.5,29.7,30.5,0.5994152046783626,0.21038522146207148
NN_TORCH (tuned + ensemble),24331.947126566527,16.31368946276213,3050.9102763481856,4.325404104362444,0.7594106495557492,0.7414539895842889,0.0,0.11637811814114107,0.06443136714131494,56038.718047349466,227.24142067178357,19.105263157894736,0.17071999999999998,9097.789536105262,5.056680162747702,2389.2199648500327,2.157502904371376,0.8577250848901621,0.7855656814478722,0.0,0.06782479524801643,0.039736864509205674,44480.90343297912,173.17543399472862,19.5,0,0,0,38,1359.1,30.7,25.6,0.5976608187134503,0.06728133787965217
EBM (tuned + ensemble),36729.440274559965,1.3371900389766136,6141.104384884199,0.5036823041953499,0.8041483690337505,0.7994399587377231,0.0,0.14906209202998957,0.07162048702974792,25271.27145534202,19.518642702178116,19.36842105263158,0.17122500000000002,2366.879786974854,0.4240463972091675,914.2329798556116,0.21634762578811195,0.8959558371740426,0.8319012062616311,0.0,0.0823298574360965,0.03754211168741026,15273.913117491418,11.254406005139426,19.0,0,0,1,37,1352.9,28.0,33.9,0.591812865497076,0.07345912374192816
REALMLP (tuned),88520.97457650169,2.4448067613512454,9129.519936677754,0.9659323148202512,0.8022194068000837,0.7438555487100591,0.0,0.11967014747000536,0.067805767458361,141789.21603781544,39.671058690327705,19.407894736842106,0.17194500000000001,30350.410282479395,0.9911958641476102,6519.687377737515,0.5341468926001405,0.8900598459312506,0.7849080766969769,0.0,0.076222073390521,0.036261831952858196,114204.24024987879,35.28258688191838,18.75,0,0,0,38,1354.3,29.5,30.2,0.5909356725146199,0.059638464242996805
FASTAI (tuned + ensemble),7309.51755415473,18.692008190266574,1376.1098802486467,8.473426897342513,0.7920437039473232,0.7869968890223731,0.0,0.14576367767903883,0.07637330144241317,18964.092381812123,455.5897128961598,21.11842105263158,0.178125,3087.37076303694,11.787012616793314,618.8953909329178,4.7655686359255345,0.9987249744320944,0.8513601622682089,0.0,0.08731644523471122,0.04769675893103309,15284.817189242676,443.8905026950689,22.75,0,1,0,37,1320.3,27.7,27.6,0.5529239766081872,0.08022178448491198
MNCA_GPU (default),304.22019695963775,10.484658345144394,17.60828380729721,1.3061868722894623,0.8540046074725524,0.8063738522446967,0.0,0.14684370064110588,0.0696065900545583,254.62493914649235,74.63976386946672,21.842105263157894,0.18519,31.500762327512106,0.5732622504234314,14.777266169000828,0.34634581634079226,1.0,0.8841007634102469,0.0,0.09230605494417649,0.04923042026786215,209.09978531409226,23.91921150117286,23.0,1,0,0,37,1305.9,31.2,26.3,0.5368421052631579,0.08088348502258735
EBM (tuned),36729.440274559965,0.18373215324000308,6141.104384884199,0.08212434505580339,0.8642510492002092,0.8383248228051443,0.0,0.15617886229027844,0.07855897408895007,25271.27145534202,2.5019643980329747,22.44736842105263,0.17203000000000002,2366.879786974854,0.0449512971772088,914.2329798556116,0.02528246646038782,1.0,0.8874585195101663,0.0,0.08835084225135731,0.04401886662631012,15273.913117491418,1.252412448907763,23.5,0,0,0,38,1296.1,27.9,28.6,0.5233918128654971,0.05552837214475196
EBM (default),119.68166406294058,0.19009479611937763,11.429209024387047,0.10185762188607354,0.8521577866418304,0.8496939876458984,0.0,0.16604276757956396,0.08443380317586864,109.6134925105603,3.4110931424674527,23.94736842105263,0.17447000000000001,9.92618230978648,0.06371633741590713,4.31382445805991,0.0475851422516083,1.0,0.9182268240099343,0.0,0.09554880640873625,0.03735917519187387,60.70178540099199,2.7834768100426253,24.0,1,0,1,36,1264.1,26.6,28.6,0.49005847953216375,0.08521894756170818
REALMLP (default),302.86446626785903,3.0018129969200893,26.277581916467653,2.856467141391241,0.9047265182202809,0.8513555652836443,0.0,0.14572889637854225,0.07876661921730284,472.82883048215837,115.85189921010567,24.57894736842105,0.17367,103.08245442973242,2.582352219687568,21.83278809042843,0.8957992336206358,1.0,0.8816713305573889,0.0,0.11770994919600164,0.044848393049036456,372.59928806765174,52.85349774079455,25.5,0,0,0,38,1251.0,30.2,28.4,0.4760233918128655,0.047381181782960774
XGB (default),13.116732126927516,0.5742131232518202,3.202512268619222,0.2981186068489448,0.874466152535315,0.8435252660819125,0.0,0.14066462097580287,0.09048418035885683,31.46292861336276,13.4756369742995,24.605263157894736,0.17317,5.653352538744608,0.30113152662913,1.771208861779989,0.11707781619763814,1.0,0.9277381451047934,0.0,0.09869044613168698,0.05270323480996493,28.26053761224749,9.127662964815336,24.0,0,0,0,38,1247.9,34.4,24.3,0.47543859649122805,0.051833815615755875
XT (tuned + ensemble),1317.4209560674533,2.9519454171085915,472.65138083581655,1.3657584923642982,0.8840546994033469,0.8510461005303817,0.0,0.1579271337121436,0.09320121945992142,4571.615940832833,75.60244638381447,24.789473684210527,0.17667500000000003,756.8230986197789,1.8136235740449693,189.76252609436062,0.7431041876698922,1.0,0.9331806007391648,0.0,0.09299076955407681,0.06584380591466965,2805.66154207989,66.39288968996527,28.0,0,0,1,37,1246.0,29.2,27.1,0.47134502923976607,0.05924808370360315
NN_TORCH (tuned),24331.947126566527,0.8716282456241854,3050.9102763481856,0.24396545036982029,0.8860168084971815,0.8345993980024481,0.0,0.14120278296028227,0.08347605455235468,56038.718047349466,12.138164397560297,25.25,0.17446499999999998,9097.789536105262,0.29952494303385413,2389.2199648500327,0.15177921475257505,1.0,0.8984777883083634,0.0,0.10379368194908523,0.05368328285707363,44480.90343297912,9.196372470124006,25.5,0,0,0,38,1235.6,30.1,24.1,0.46111111111111114,0.045887389359228405
TABDPT_GPU (default),171.71139350780967,66.09824930987163,27.724576482795502,22.626481529214185,0.8177317713801062,0.8063890532379178,0.0,0.1548400388440821,0.08802714803594046,481.10896045076544,1338.9171702872993,25.289473684210527,0.190385,97.80311637454562,28.07416233751509,22.609050986069803,8.552450841932743,1.0,0.9488825694234415,0.0,0.1092338236572048,0.04534492007639533,400.67828468381333,1123.8959746745188,30.5,2,0,3,33,1235.5,33.0,33.7,0.460233918128655,0.12314777803964816
FASTAI (tuned),7309.51755415473,1.0496307073977955,1376.1098802486467,0.623937267000547,0.9029754331682492,0.8559979865110569,0.0,0.16578900249006634,0.09223804707606086,18964.092381812123,32.00078545496663,26.30263157894737,0.18023499999999998,3087.37076303694,0.8054822285970051,618.8953909329178,0.2978802219128553,1.0,0.8990402955585416,0.0,0.09789639604232409,0.058573096763547536,15284.817189242676,26.536834640421894,26.0,0,0,0,38,1213.8,36.3,22.6,0.437719298245614,0.04802009229852553
RF (tuned + ensemble),2309.3465478268977,2.3587986137434753,541.3031953907538,1.2662572218585502,0.8931200386846528,0.8625823498028392,0.0,0.16581458532483917,0.10465934401787833,5371.163113535875,67.79453318661523,26.55263157894737,0.177925,871.1966819789675,1.9029027620951335,323.74369638605225,0.7428875097152683,1.0,0.9724467108699417,0.0,0.10359622790389916,0.07348881034574246,4278.677975908691,61.67862848692378,29.0,0,1,1,36,1210.4,25.1,26.0,0.43216374269005847,0.06895819972743905
GBM (default),7.9087888545460165,0.5753568479889317,2.951996847158713,0.17011749256975625,0.90457386796403,0.8781538730968432,0.0,0.15377361970482045,0.09359955190177657,31.85523047906877,10.81395814590769,27.157894736842106,0.172975,5.532836645179325,0.2585195038053725,1.7913477923414471,0.12049981156984965,1.0,0.9436752182752575,0.0,0.11068698010353623,0.06150636429702874,25.045220341015206,6.549914554959342,28.0,0,0,0,38,1196.4,27.5,27.4,0.41871345029239765,0.04230027818413819
XT (tuned),1317.4209560674533,0.3043035832762021,472.65138083581655,0.1722314796858175,0.9183451297016355,0.8839642996468039,0.0,0.1716073337665082,0.10108382485694561,4571.615940832833,8.349379059762423,27.407894736842106,0.17796,756.8230986197789,0.18769407272338867,189.76252609436062,0.07878183958882805,1.0,0.9608744878797215,0.0,0.10587649481110034,0.06787430737095648,2805.66154207989,8.013491457037578,30.5,0,1,0,37,1190.4,28.3,25.0,0.4131578947368421,0.05799400992891437
RF (tuned),2309.3465478268977,0.23789871352457861,541.3031953907538,0.15647241398955916,0.9194076837644993,0.8885351029463756,0.0,0.17793267481613345,0.11270911956762278,5371.163113535875,7.241889916329925,29.026315789473685,0.178915,871.1966819789675,0.17230602105458576,323.74369638605225,0.07643497412773152,1.0,0.9937142594287514,0.0,0.11788865245982966,0.0730160635989342,4278.677975908691,6.216263662191708,31.5,0,1,1,36,1156.1,28.7,35.1,0.37719298245614036,0.06043592848918936
NN_TORCH (default),48.66164081361559,0.6504810580733227,11.536659167937149,0.237589986723434,0.9756576872253887,0.9487943416771858,0.0,0.19345935921063773,0.1142020447354156,155.18773864824473,10.878820110101595,32.328947368421055,0.180355,26.916632894674937,0.2675716214709811,6.83469910157457,0.14703020953097523,1.0,0.9958604575473864,0.0,0.142508632176782,0.08254965187621874,137.05069981706868,8.619821917518276,33.0,0,0,0,38,1079.5,26.4,28.5,0.3038011695906433,0.03387950609878851
FASTAI (default),31.12571889106293,1.103970385504048,5.0919225272079345,0.5139922583887925,0.9666381094162454,0.938616457826646,0.0,0.2205946249171078,0.13782917464151753,74.47424680202033,27.97257070113272,33.44736842105263,0.19183499999999998,12.713354892200893,0.7982388072543674,2.9120182447539116,0.36810695156439827,1.0,1.0,0.0,0.16584224786083723,0.10101015982783891,60.1261932941261,24.301698162325565,36.5,0,0,0,38,1049.7,28.7,25.2,0.2789473684210526,0.033222104998880765
RF (default),3.928520040972191,0.1435067986187182,0.8021093004373405,0.07125612411976835,0.9848936390157254,0.9658488647846383,0.0,0.24146620137116087,0.18078523386541995,6.102631035159317,3.7416299644570024,35.25,0.21025,1.1966572999954224,0.08589340580834282,0.3813090053938437,0.03721195658349352,1.0,1.0,0.0,0.17586669886220047,0.1121938480169841,5.480722222590385,3.4579154094346745,37.0,0,0,0,38,1000.0,0.0,0.0,0.2388888888888889,0.030130929376235543
LR (tuned + ensemble),310.9206490230839,1.856038474618343,112.43523004573552,0.6244822981818133,0.9581697243607366,0.9509528476164266,0.0,0.2917619246531289,0.20783842153693577,1090.6543235756506,24.154836057241496,35.44736842105263,0.20569500000000002,172.99803659651013,0.33067578077316284,51.78500762114817,0.22385943240534312,1.0,1.0,0.0,0.23092004636837282,0.129932605346946,696.2608974821628,13.25530093456624,39.0,0,0,1,37,992.4,33.3,33.1,0.23450292397660819,0.03887833565488216
LR (tuned),310.9206490230839,0.5155129476597434,112.43523004573552,0.17331679222553462,0.9721474042486321,0.958590870444,0.0,0.30105734113761295,0.21630790999421837,1090.6543235756506,6.903044838007482,36.55263157894737,0.20751,172.99803659651013,0.12874411212073433,51.78500762114817,0.07805190196778514,1.0,1.0,0.0,0.23702586233900474,0.13996577713587863,696.2608974821628,4.422167155622043,39.5,0,0,0,38,960.0,34.5,29.7,0.20994152046783626,0.032398288172814246
LR (default),7.59960079534709,0.5285763780973111,2.68894957171299,0.18897855365485863,0.9806670457343031,0.9645748863537118,0.0,0.3109312407898901,0.24034001046121586,27.747857838408066,7.876763349331487,36.8421052631579,0.21230500000000002,5.359276652336121,0.13779839674631755,1.6116061178401027,0.09774404154712064,1.0,1.0,0.0,0.23702577885376547,0.14283438851684813,18.023625361427616,4.735441569338942,40.75,0,0,1,37,951.0,30.0,32.5,0.20350877192982456,0.03569100635736501
BOOSTEDDPDT (default),345.64987801759565,533.8809380267099,43.49129713627642,135.1656721577922,0.9706201446222716,0.9676423134914308,0.0,0.3594792450155854,0.34024045771085704,386.5043650278363,5301.7746891960005,37.026315789473685,0.228515,71.69894587993622,114.74682602617476,20.790165882010257,32.73773772315814,1.0,1.0,0.0,0.21281851381792039,0.1813052133660526,414.6958748581876,3884.5257926234735,39.5,0,0,1,37,942.1,38.9,26.6,0.19941520467836257,0.03595474842989151
XT (default),2.7359117016457675,0.18087529669031066,0.7550887156747417,0.07422230753863952,0.9892893285330476,0.9740044215299907,0.0,0.26658722117284267,0.20312299189057953,5.173741811459425,4.243951698357253,37.73684210526316,0.21292,1.01790091726515,0.09051434199015299,0.24605929188859293,0.04072451222400395,1.0,1.0,0.0,0.18424442460299273,0.11519464563812844,4.456264068369281,3.7424721126192138,39.5,0,0,0,38,917.5,33.0,26.9,0.18362573099415205,0.028861325321120828
KNN (tuned + ensemble),167.0455302492917,11.66904854404996,12.260943976874097,0.77695539036087,1.0,0.9961587190478705,0.13157894736842105,0.48462823969745605,0.5425852032822736,72.48328913141673,77.9475913060204,42.4078947368421,0.318405,9.821025305324131,0.2367298404375712,2.969713103492417,0.18997109296167614,1.0,1.0,0.0,0.42625485653324524,0.5620921989172869,56.9019152818522,12.487074071232104,44.0,0,0,0,38,697.3,33.4,38.5,0.07982456140350877,0.02377674045619972
KNN (tuned),167.0455302492917,1.8054817143936601,12.260943976874097,0.13106223823409818,1.0,0.9976848124577798,0.13157894736842105,0.5033620273790689,0.585400415354597,72.48328913141673,12.511864612142434,43.4078947368421,0.322975,9.821025305324131,0.0851174063152737,2.969713103492417,0.040417757021976156,1.0,1.0,0.0,0.4549174904387682,0.6016211835068992,56.9019152818522,2.342964973116946,44.0,0,0,0,38,624.7,41.6,43.6,0.05760233918128655,0.023127845677905285
KNN (default),1.7449495283483762,0.22627578220869365,0.489346568018936,0.038714559202156204,1.0,1.0,0.13157894736842105,0.5871637410556744,0.8803655811672976,1.0055419249185589,2.3649007838074403,44.76315789473684,0.382765,0.27595198154449463,0.036337282922532826,0.07126887487893994,0.021006283652748647,1.0,1.0,0.0,0.5463314318406584,1.0,1.0,1.2165713596834893,46.0,0,0,0,38,491.5,51.2,72.0,0.027485380116959064,0.022455899036001325

Running a smaller HPO experiment on TabArena-Lite next.

@KohlerHECTOR
Author

Hey @LennartPurucker! Thanks so much for the hard work :) I am happy it was not too difficult to use my code, and that DPDT is not that bad :p.

@LennartPurucker
Collaborator

Here are the results on TabArena-Lite with 200 HPO configs:

[Results figure omitted.]

With HPO, we get similar performance to RF and ExtraTrees.


Some things I noticed during HPO:

  • For some datasets, the inference time is really large (up to several hours). I am not sure why exactly, but it might be good to investigate this.
  • The memory estimate was not quite correct, so I adjusted it. I think it is still not perfect and would likely require early stopping within the boosting loop in case of a memory-out.

Feel free to run the TabArena benchmark yourself as well to test more and further improve the model.

@KohlerHECTOR
Author

Hey @LennartPurucker thanks for the results! Assuming that I am in the tabrepo/ folder, what command should I use to run the full benchmark, please?

Thank you so much for the results! They really help improve the model :). I will do further work on the inference time: one quick change is to reduce the number of boosted estimators from 1000 to e.g. 500, or simply expose it as a hyperparameter.

Memory consumption is indeed hard to estimate, sorry.

@LennartPurucker
Collaborator

Check out https://github.com/TabArena/tabarena_benchmarking_examples for examples and more info on how to run the benchmark!

I am not sure I would make the number of estimators a hyperparameter, but one could also stop early based on the estimated inference time, or prune the model after fitting. I am also not sure why it is so slow; it happens for datasets with a lot of categorical features, so I think it might be related to that.

No worries, memory estimation is always something that one has to figure out by trial and error :D
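
A minimal sketch of the "prune the model after fitting" idea mentioned above, assuming the fitted booster exposes sklearn-AdaBoost-style attributes (estimators_, estimator_weights_, estimator_errors_); the real AdaBoostDPDT attribute names may differ.

def prune_boosted_model(model, max_estimators=500):
    # Sketch: truncate a fitted AdaBoost-style ensemble to cap inference time.
    # Attribute names are assumptions borrowed from sklearn's AdaBoostClassifier.
    for attr in ("estimators_", "estimator_weights_", "estimator_errors_"):
        if hasattr(model, attr):
            setattr(model, attr, getattr(model, attr)[:max_estimators])
    return model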

@KohlerHECTOR
Author

@LennartPurucker I have had a look at the examples, but they are not very clear to me. Is there a "main loop" that iterates over the training datasets somewhere?

Btw, I have made the inference and training way (way, way) faster: it was cubic time and is now linear. So is there a command somewhere for me to run the benchmark?

Thank you again for all the help.

@LennartPurucker
Collaborator

Check out this example: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py

The run_experiments_new function does the main loop for you. In detail, here it loops over all tasks (task_ids) and selected repetitions (TabArena-Lite) as specified in the input arguments; see the doc string for more documentation. In our setup, we parallelize across tasks, repetitions, and model configurations to make it faster.

In your case, you only need to change the get_configs function to use your own configs function from the new generate.py for DPDTs: model_experiments = gen_boosteddpdt.generate_all_bag_experiments(num_random_configs=num_random_configs)

Let me know if this helps! If not, I can take a look later and send you an example script.
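
A minimal sketch of the get_configs change described above, assuming the new generate.py for BoostedDPDT exposes a gen_boosteddpdt object with generate_all_bag_experiments as quoted; the import path below is a guess, not confirmed by the PR.

# Hypothetical import path; adjust to wherever the new generate.py lives.
from tabrepo.models.boosteddpdt.generate import gen_boosteddpdt


def get_configs(num_random_configs: int = 200):
    # Default config + random-search configs for BoostedDPDT (sketch).
    return gen_boosteddpdt.generate_all_bag_experiments(
        num_random_configs=num_random_configs,
    )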

@KohlerHECTOR
Author

KohlerHECTOR commented Aug 6, 2025

Check out this example: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py

The run_experiments_new function does the main loop for you. In detail, here it loops over all tasks (task_ids) and selected repetitions (TabArena-Lite) as specified in the input arguments; see the doc string for more documentation. In our setup, we parallelize across tasks, repetitions, and model configurations to make it faster.

In your case, you only need to change the get_configs function to use your own configs function from the new generate.py for DPDTs: model_experiments = gen_boosteddpdt.generate_all_bag_experiments(num_random_configs=num_random_configs)

Let me know if this helps! If not, I can take a look later and send you an example script.

That is super helpful, thanks so much, I will have a look! But so I can generate 1 million random_configs? Also, if I want to run on TabArena (not Lite), should I change the arguments of repetitions_mode? How do I make the comparison fair with the other baselines? Is this the correct TabArena version: task_ids = openml.study.get_suite("tabarena-v0.1").tasks? Thanks again

@LennartPurucker
Collaborator

But so I can generate 1 million random_configs?

Generally, you can follow the settings used for the other methods as described in our paper: 200 configs, 32 GB RAM, 8 CPUs

But you are free to invest as much time or resources into the method as you like for your own studies. For the leaderboard, we would look at your result artifacts and only pick 200 random configurations to make it comparable.

If I want to run on TabArena (not Lite), should I change the arguments of repetitions_mode?

Jup, you need to set the parameter to run the correct number of folds and repeats for the dataset size. See this metadata file for per-dataset information: https://github.com/TabArena/tabarena_dataset_curation/blob/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv

Is this the correct TabArena version: task_ids = openml.study.get_suite("tabarena-v0.1").tasks?

Yes!

@KohlerHECTOR
Author

KohlerHECTOR commented Aug 11, 2025

But so I can generate 1 million random_configs?

Generally, you can follow the settings used for the other methods as described in our paper: 200 configs, 32 GB RAM, 8 CPUs

But you are free to invest as much time or resources into the method as you like for your own studies. For the leaderboard, we would look at your result artifacts and only pick 200 random configurations to make it comparable.

If I want to run on TabArena (not Lite), should I change the arguments of repetitions_mode?

Jup, you need to set the parameter to run the correct number of folds and repeats for the dataset size. See this metadata file for per-dataset information: https://github.com/TabArena/tabarena_dataset_curation/blob/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv

Is this the correct TabArena version: task_ids = openml.study.get_suite("tabarena-v0.1").tasks?

Yes!

Thank you so much!
Ok, so I think the best option for me would be to use this code: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/get_tabarena_data.py and change this line https://github.com/TabArena/tabarena_benchmarking_examples/blob/30d7f31219b5a6c26f3603a1592b7d7c87ae4cde/tabarena_minimal_example/get_tabarena_data.py#L61 to use my .fit().
Is there a specific way I need to save my results? :)

Thanks in advance.

@LennartPurucker
Collaborator

To be compatible with TabArena and use the model pipeline framework we designate, you will have to use our benchmarking interface via https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py

Our benchmarking interface runs the model in a well-designed pipeline that handles many problems you might encounter. Moreover, it saves all the results as needed. There is no straightforward or supported method for achieving the same without our benchmarking interface.


To test your method on your own, you can use the code above and run your method against it. However, this approach lacks sufficient support for proper cross-validation, HPO, and other features provided by the benchmarking interface. Thus, you would need to implement benchmarking code, which may encounter bugs and other issues that lead to unfair comparisons.


I strongly recommend using our benchmarking interface. Is there something specific that stops you from using the benchmarking interface?

@KohlerHECTOR
Author

To be compatible with TabArena and use the model pipeline framework we designate, you will have to use our benchmarking interface via https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py

Our benchmarking interface runs the model in a well-designed pipeline that handles many problems you might encounter. Moreover, it saves all the results as needed. There is no straightforward or supported method for achieving the same without our benchmarking interface.

To test your method on your own, you can use the code above and run your method against it. However, this approach lacks sufficient support for proper cross-validation, HPO, and other features provided by the benchmarking interface. Thus, you would need to implement benchmarking code, which may encounter bugs and other issues that lead to unfair comparisons.

I strongly recommend using our benchmarking interface. Is there something specific that stops you from using the benchmarking interface?

Well, I could naively run https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py. But that raises two problems:

  1. There is no repetitions_mode value that matches what I need for the run_experiments_new() method: there are only tabarena-lite, matrix, or single, none of which does the same as the TabArena repetitions from https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/get_tabarena_data.py, unless I am mistaken.
  2. I have a whole server of CPUs, and I don't know whether simply running https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_tabarena_lite.py would make the most of it without access to the hidden for loops that I could wrap with joblib.

@LennartPurucker
Collaborator

  1. You need to set repetitions_mode and repetitions_mode_args according to your needs. See the docstring for all options. To do so, you need to determine how many repeats and folds to run per task, by looking at the task metadata or by using OpenML.

An example code to run all experiments sequentially would be:

import pandas as pd
metadata_df = pd.read_csv("https://raw.githubusercontent.com/TabArena/tabarena_dataset_curation/refs/heads/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv")
task_ids = metadata_df["task_id"].tolist()
folds = metadata_df["num_folds"].tolist()
repeats = metadata_df["tabarena_num_repeats"].tolist()


run_experiments_new(
    output_dir=TABARENA_DIR,
    model_experiments=model_experiments,
    tasks=task_ids,
    repetitions_mode="matrix",
    repetitions_mode_args=[
          (n_fold, n_repeats) for n_fold, n_repeats in zip(folds, repeats)
    ]
)
  2. In general, I recommend starting several jobs, each running, for example, one config on one fold on one dataset via the benchmarking interface. Thus, you want to parallelize over the tasks, folds, and repeats in the code above by calling run_experiments_new multiple times in different jobs/processes. In other words, your for loop should be over the experiments, and not within the model.

See TabFlow for examples https://github.com/TabArena/tabarena_benchmarking_examples/tree/main/tabflow_slurm

Thus, you would want to split up the code above to look like this:

import pandas as pd
from itertools import product

metadata_df = pd.read_csv(
    "https://raw.githubusercontent.com/TabArena/tabarena_dataset_curation/refs/heads/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv"
)

for row in metadata_df.itertuples():
    repeats_folds = product(
        range(int(row.tabarena_num_repeats)), range(int(row.num_folds))
    )
    for repeat_i, fold_i in repeats_folds:
        for model_experiment in model_experiments:
            # You likely want to parallelize this call/part
            run_experiments_new(
                output_dir=TABARENA_DIR,
                model_experiments=[model_experiment],
                tasks=[row.task_id],
                repetitions_mode="individual",
                repetitions_mode_args=[(fold_i, repeat_i)],
            )

To provide a general sketch of how to set up benchmarking:

  1. You want to have logic that can schedule all the jobs you want to run. This scheduler should know which jobs to schedule (i.e., how many folds and repeats for which dataset; see the metadata file). -> https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabflow_slurm/run_setup_slurm_jobs.py
  2. Each job scheduled by the scheduler should, for example, run one config on one fold on one dataset. -> https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabflow_slurm/run_tabarena_experiment.py
  3. Once you run everything, you can compare the results to the LB with this: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_evaluate_model.py

Our benchmarking code handles running the job and parallelizing the model training process.
Moreover, it saves (and caches) the data.
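
A minimal sketch of how the per-(task, repeat, fold, config) loop above could be parallelized with joblib on a single multi-CPU server (the setup mentioned earlier); run_experiments_new, TABARENA_DIR, and model_experiments are assumed to be defined as in the snippets above.

from itertools import product

import pandas as pd
from joblib import Parallel, delayed

metadata_df = pd.read_csv(
    "https://raw.githubusercontent.com/TabArena/tabarena_dataset_curation/refs/heads/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv"
)

def run_one(task_id, fold_i, repeat_i, model_experiment):
    # One job = one config on one fold/repeat of one task, as recommended above.
    run_experiments_new(
        output_dir=TABARENA_DIR,
        model_experiments=[model_experiment],
        tasks=[task_id],
        repetitions_mode="individual",
        repetitions_mode_args=[(fold_i, repeat_i)],
    )

jobs = [
    (row.task_id, fold_i, repeat_i, model_experiment)
    for row in metadata_df.itertuples()
    for repeat_i, fold_i in product(
        range(int(row.tabarena_num_repeats)), range(int(row.num_folds))
    )
    for model_experiment in model_experiments
]

# Size n_jobs to the number of available CPUs divided by the CPUs per experiment.
Parallel(n_jobs=8)(delayed(run_one)(*job) for job in jobs)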

@KohlerHECTOR
Author

  1. You need to set repetitions_mode and repetitions_mode_args according to your needs. See the docstring for all options. To do so, you need to determine how many repeats and folds to run per task, by looking at the task metadata or by using OpenML.

An example code to run all experiments sequentially would be:

import pandas as pd
metadata_df = pd.read_csv("https://raw.githubusercontent.com/TabArena/tabarena_dataset_curation/refs/heads/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv")
task_ids = metadata_df["task_id"].tolist()
folds = metadata_df["num_folds"].tolist()
repeats = metadata_df["tabarena_num_repeats"].tolist()


run_experiments_new(
    output_dir=TABARENA_DIR,
    model_experiments=model_experiments,
    tasks=task_ids,
    repetitions_mode="matrix",
    repetitions_mode_args=[
          (n_fold, n_repeats) for n_fold, n_repeats in zip(folds, repeats)
    ]
)
  2. In general, I recommend starting several jobs, each running, for example, one config on one fold on one dataset via the benchmarking interface. Thus, you want to parallelize over the tasks, folds, and repeats in the code above by calling run_experiments_new multiple times in different jobs/processes. In other words, your for loop should be over the experiments, and not within the model.

See TabFlow for examples https://github.com/TabArena/tabarena_benchmarking_examples/tree/main/tabflow_slurm

Thus, you would want to split up the code above to look like this:

import pandas as pd
from itertools import product

metadata_df = pd.read_csv(
    "https://raw.githubusercontent.com/TabArena/tabarena_dataset_curation/refs/heads/main/dataset_creation_scripts/metadata/tabarena_dataset_metadata.csv"
)

for row in metadata_df.itertuples():
    repeats_folds = product(
        range(int(row.tabarena_num_repeats)), range(int(row.num_folds))
    )
    for repeat_i, fold_i in repeats_folds:
        for model_experiment in model_experiments:
            # You likely want to parallelize this call/part
            run_experiments_new(
                output_dir=TABARENA_DIR,
                model_experiments=[model_experiment],
                tasks=[row.task_id],
                repetitions_mode="individual",
                repetitions_mode_args=[(fold_i, repeat_i)],
            )

To provide a general sketch of how to set up benchmarking:

  1. You want to have logic that can schedule all the jobs you want to run. This scheduler should know which jobs to schedule (i.e., how many folds and repeats for which dataset; see the metadata file). -> https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabflow_slurm/run_setup_slurm_jobs.py
  2. Each job scheduled by the scheduler should, for example, run one config on one fold on one dataset. -> https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabflow_slurm/run_tabarena_experiment.py
  3. Once you run everything, you can compare the results to the LB with this: https://github.com/TabArena/tabarena_benchmarking_examples/blob/main/tabarena_minimal_example/run_evaluate_model.py

Our benchmarking code handles running the job and parallelizing the model training process. Moreover, it saves (and caches) the data.

That is actually so helpful!
So to summarize, I launch a set of run_experiments_new calls, one for each repeat/fold/task, and I can easily parallelize over that loop?

@LennartPurucker
Collaborator

Jup!
