Conversation

@j201
Contributor

@j201 j201 commented Sep 29, 2025

Hi, this is Alex from the TabDPT team. We've made a number of code improvements to TabDPT, so we wanted to share them, along with adding the ability to do hyperparameter search.

I've tested out generate_all_configs.py and run_tabarena_lite.py on my end using TabDPT, but I wasn't able to figure out how to run the full range of TabArena folds/repeats, so please let me know if you run into any issues (or if there's an easy way for me to try that out).

Additional changes:

  • Load TabDPT from PyPI instead of Github
  • Remove workaround for batch size 1 (shouldn't be needed anymore)
  • Use the new default download method (huggingface-hub) instead of custom caching code (see the sketch below)
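
For context, huggingface-hub bundles download and local caching into a single call. A minimal sketch of that mechanism; the repo_id and filename below are placeholders, not TabDPT's actual values (those live in the TabDPT package):

from huggingface_hub import hf_hub_download

# Downloads the file on first call and serves it from the local
# Hugging Face cache (~/.cache/huggingface/hub) on subsequent calls.
# Placeholders: substitute the real repo_id/filename used by TabDPT.
ckpt_path = hf_hub_download(repo_id="<org>/<tabdpt-weights>", filename="model.ckpt")
print(ckpt_path)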

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Alex Labach added 5 commits September 18, 2025 15:24
Additional changes:
- Load TabDPT from PyPI instead of Github
- Remove workaround for batch size 1
- Use new default download method (huggingface-hub) instead of custom
  caching code
@LennartPurucker
Collaborator

Heyho @j201, great to see this, and thank you very much for the integration!

I will take a closer look at the PR with feedback as soon as I can.

Regarding running TabArena-full on your end, see the following: #176 (comment)
I hope to be able to add easier support for this soon (I added it to the roadmap: #213).

If you have results for TabArena-Lite you want to share, I would be happy to see them!

@LennartPurucker
Collaborator

Also, FYI: a verified version of TabDPT was on our roadmap for the next leaderboard version (deadline ca. 1-2 months).
So I am happy to run TabDPT + HPO on TabArena-Full on our end as well :)

@j201
Contributor Author

j201 commented Oct 6, 2025

We're currently looking at adding some more hyperparameters, hoping to get those in this week.

> If you have results for TabArena-Lite you want to share, I would be happy to see them!

So I've generated results for fold 0, and we're getting a lower metric_error than the saved TabDPT results on 44/51 datasets. Here are the result files: results.tar.gz. However, when I run run_evaluate_model.py with them, it shows a worse ranking than before, which I'm a bit confused about; maybe it's just because of the missing folds?

@Innixma
Collaborator

Innixma commented Oct 6, 2025

> We're currently looking at adding some more hyperparameters, hoping to get those in this week.
>
>> If you have results for TabArena-Lite you want to share, I would be happy to see them!
>
> So I've generated results for fold 0, and we're getting a lower metric_error than the saved TabDPT results on 44/51 datasets. Here are the result files: results.tar.gz. However, when I run run_evaluate_model.py with them, it shows a worse ranking than before, which I'm a bit confused about; maybe it's just because of the missing folds?

@j201 For the evaluation, try only_valid_tasks=True or subset="lite" in the compare_on_tabarena call.

@Innixma
Collaborator

Innixma commented Oct 7, 2025

@j201 if you pull the latest mainline, you can run the following:

from __future__ import annotations

from pathlib import Path

from autogluon.common.loaders import load_pd
from tabrepo.nips2025_utils.end_to_end_single import EndToEndResultsSingle
from tabrepo.nips2025_utils.tabarena_context import TabArenaContext
from tabrepo.tabarena.website_format import format_leaderboard


if __name__ == "__main__":
    save_path = "output_leaderboard"  # folder to save all figures and tables

    path_to_my_results = "tabdpt_new/hpo_results.parquet"  # replace with your local path

    hpo_results_tabdpt_new = load_pd.load(path=path_to_my_results)

    # differentiate new results compared to old results
    hpo_results_tabdpt_new = EndToEndResultsSingle.add_prefix_to_results(results=hpo_results_tabdpt_new, prefix="NEW_")

    output_path_verified = Path(save_path) / "verified"
    tabarena_context = TabArenaContext(include_unverified=False)
    leaderboard_verified = tabarena_context.compare(
        output_dir=output_path_verified,
        new_results=hpo_results_tabdpt_new,
        only_valid_tasks=True,  # only evaluate tasks present in new_results
    )
    leaderboard_website_verified = format_leaderboard(df_leaderboard=leaderboard_verified)

    print(f"Verified Leaderboard:")
    print(leaderboard_website_verified.to_markdown(index=False))
    print("")
TabDPT_NEW

| # | Model | Elo [⬆️] | Elo 95% CI | Score [⬆️] | Rank [⬇️] | Harmonic Rank [⬇️] | Improvability (%) [⬇️] | Median Train Time (s/1K) [⬇️] | Median Predict Time (s/1K) [⬇️] | Imputed (%) [⬇️] | Imputed | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | NEW_TABDPT (default) | 1303 | +24/-24 | 0.322 | 21.2 | 5.2 | 12.483 | 59.79 | 59.2 | 0 | False | CPU |
| 28 | TabDPT (default) | 1217 | +25/-20 | 0.246 | 26.21 | 10.46 | 15.302 | 20.56 | 8.617 | 0 | False | GPU |

@j201
Contributor Author

j201 commented Oct 10, 2025

Okay, we've added a number of new hyperparameters and updated the search space:

search_space = {
    'temperature': Categorical(0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.25, 1.5),
    'context_size': Categorical(2048, 768, 256),
    'permute_classes': Categorical(True, False),
    'normalizer': Categorical("standard", None, "minmax", "robust", "power", "quantile-uniform", "quantile-normal", "log1p"),
    'missing_indicators': Categorical(False, True),
    'clip_sigma': Categorical(4, 2, 6, 8),
    'feature_reduction': Categorical("pca", "subsample"),
    'faiss_metric': Categorical("l2", "ip")
}
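
(The Categorical here is presumably AutoGluon's search-space primitive, where the first listed value serves as the default.) For illustration, a self-contained random-search sketch over this space; the plain-list mirror and sample_config helper below are hypothetical, not part of the actual TabArena HPO machinery:

import random

# Plain-Python mirror of the search space above (first entry = default).
search_space_options = {
    "temperature": [0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.25, 1.5],
    "context_size": [2048, 768, 256],
    "permute_classes": [True, False],
    "normalizer": ["standard", None, "minmax", "robust", "power",
                   "quantile-uniform", "quantile-normal", "log1p"],
    "missing_indicators": [False, True],
    "clip_sigma": [4, 2, 6, 8],
    "feature_reduction": ["pca", "subsample"],
    "faiss_metric": ["l2", "ip"],
}

def sample_config(rng: random.Random) -> dict:
    # Draw one hyperparameter configuration uniformly at random.
    return {name: rng.choice(options) for name, options in search_space_options.items()}

rng = random.Random(0)
configs = [sample_config(rng) for _ in range(200)]  # e.g., a 200-config random search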

@LennartPurucker
Collaborator

Looking good! Would you say this is now in a state for me to benchmark it, or are we waiting on something in the pipeline?

@j201
Contributor Author

j201 commented Oct 14, 2025

We don't have any more changes planned, so feel free to benchmark. I can help out if more changes are needed, though.

@LennartPurucker
Collaborator

@j201 Is sklearn>1.7 a necessary lower bound for TabDPT's requirements?

It currently conflicts with TabPFN (on pip; mainline is also on 1.8) and TabICL.

add: new random seed logic for TabDPT
@anthonycaterini

> @j201 Is sklearn>1.7 a necessary lower bound for TabDPT's requirements?
>
> It currently conflicts with TabPFN (on pip; mainline is also on 1.8) and TabICL.

Ah, we were already planning to make this change and relax the sklearn dependency. We'll probably need to release to PyPI again. I'll coordinate with Alex and let you know when this is updated. Thanks!

@j201
Contributor Author

j201 commented Oct 16, 2025

Okay, we've updated the PyPI package and this PR with a version that allows scikit-learn>=1.5.0.

@LennartPurucker
Collaborator

LennartPurucker commented Oct 20, 2025

@j201 @anthonycaterini
[screenshot: TabArena-Lite leaderboard with the new TabDPT + HPO results]

Results with HPO (random search over 200 configs) on TabArena-Lite. TabArena-Full is running now but will take a week or so.

Note: for the two largest datasets, I had to increase the time limit, as inference with TabDPT ensembling otherwise takes more than one hour. We can already see this in the Pareto front plots (not final in terms of style).

[screenshot: Pareto front plots]

@LennartPurucker
Collaborator

Result artifacts for all data on TabArena-Full can be found here for now: https://data.lennart-purucker.com/tabarena/leaderboard_submissions/

Results on TabArena-Full:
[screenshot: TabArena-Full results table]

@LennartPurucker LennartPurucker merged commit eb9a987 into autogluon:main Oct 28, 2025
1 check passed
@Innixma
Collaborator

Innixma commented Oct 28, 2025

Awesome stuff!!
