Update TabDPT code version, add search space, and improve hyperparameters #218
Additional changes:
- Load TabDPT from PyPI instead of GitHub
- Remove workaround for batch size 1
- Use new default download method (huggingface-hub) instead of custom caching code
|
Heyho @j201, great to see and thank you very much for the integration! I will take a closer look at the PR with feedback as soon as I can. Regarding running TabArena-full on your end, see the following: #176 (comment) If you have results for TabArena-Lite you want to share, I would be happy to see them! |
|
Also, fyi: a verified version of TabDPT was on our roadmap for the next leaderboard version (deadline ca. 1-2 months). |
|
We're currently looking at adding some more hyperparameters, hoping to get those in this week.
So I've generated results for fold 0, and we're getting a lower |
@j201 For the evaluation, try: |
|
@j201 if you pull the latest mainline, you can run the following:

    from __future__ import annotations

    from pathlib import Path

    from autogluon.common.loaders import load_pd
    from tabrepo.nips2025_utils.end_to_end_single import EndToEndResultsSingle
    from tabrepo.nips2025_utils.tabarena_context import TabArenaContext
    from tabrepo.tabarena.website_format import format_leaderboard

    if __name__ == '__main__':
        save_path = "output_leaderboard"  # folder to save all figures and tables
        path_to_my_results = "tabdpt_new/hpo_results.parquet"  # replace with your local path

        hpo_results_tabdpt_new = load_pd.load(path=path_to_my_results)

        # differentiate new results compared to old results
        hpo_results_tabdpt_new = EndToEndResultsSingle.add_prefix_to_results(
            results=hpo_results_tabdpt_new, prefix="NEW_"
        )

        output_path_verified = Path(save_path) / "verified"
        tabarena_context = TabArenaContext(include_unverified=False)
        leaderboard_verified = tabarena_context.compare(
            output_dir=output_path_verified,
            new_results=hpo_results_tabdpt_new,
            only_valid_tasks=True,  # only evaluate tasks present in new_results
        )

        leaderboard_website_verified = format_leaderboard(df_leaderboard=leaderboard_verified)
        print("Verified Leaderboard:")
        print(leaderboard_website_verified.to_markdown(index=False))
        print("")
|
|
Okay, we've added a number of new hyperparameters and updated the search space:

    search_space = {
        'temperature': Categorical(0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.25, 1.5),
        'context_size': Categorical(2048, 768, 256),
        'permute_classes': Categorical(True, False),
        'normalizer': Categorical("standard", None, "minmax", "robust", "power", "quantile-uniform", "quantile-normal", "log1p"),
        'missing_indicators': Categorical(False, True),
        'clip_sigma': Categorical(4, 2, 6, 8),
        'feature_reduction': Categorical("pca", "subsample"),
        'faiss_metric': Categorical("l2", "ip"),
    }
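For a rough sense of how large this search space is, here is a minimal standard-library sketch that mirrors the candidate values as plain lists. It is not the TabArena config generator. The helper names (`default_config`, `grid_size`, `sample_configs`) are hypothetical, and treating the first listed value as the default is an assumption made for illustration.

```python
import random

# Candidate values copied from the search space above, as plain lists.
# Assumption: the first entry of each list acts as the default value.
SEARCH_SPACE = {
    "temperature": [0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0, 1.25, 1.5],
    "context_size": [2048, 768, 256],
    "permute_classes": [True, False],
    "normalizer": ["standard", None, "minmax", "robust", "power",
                   "quantile-uniform", "quantile-normal", "log1p"],
    "missing_indicators": [False, True],
    "clip_sigma": [4, 2, 6, 8],
    "feature_reduction": ["pca", "subsample"],
    "faiss_metric": ["l2", "ip"],
}


def default_config(space):
    """Pick the first listed value of every hyperparameter (hypothetical helper)."""
    return {name: values[0] for name, values in space.items()}


def grid_size(space):
    """Number of distinct configurations in the full Cartesian product."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def sample_configs(space, n, seed=0):
    """Draw n configurations uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(grid_size(SEARCH_SPACE))       # 18432 combinations in total
    print(default_config(SEARCH_SPACE))
    for config in sample_configs(SEARCH_SPACE, n=3):
        print(config)
```

With 18432 possible combinations, an exhaustive grid is impractical for a benchmark run, so random sampling of a limited number of configurations is the more natural fit here.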
|
Looking good! Would you say this is now in a state for me to benchmark it, or are we waiting on something in the pipeline? |
|
We don't have any more changes planned, so feel free to benchmark. I can help out if more changes are needed, though. |
|
@j201 Is sklearn>1.7 a necessary lower bound in TabDPT's requirements? It currently conflicts with TabPFN (on pip, mainline is also on 1.8) and TabICL. |
add: new random seed logic for TabDPT
Ahh, we were planning to make this change and relax the sklearn dependency. We probably need to release again to PyPI. Will coordinate with Alex and let you know when this is updated. Thanks! |
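The kind of conflict discussed above can be sketched as a check for whether any candidate version satisfies every package's bounds at once. This is a simplified illustration that handles only plain numeric versions; real resolvers follow PEP 440 through the `packaging` library. The package names and bounds below are hypothetical placeholders, not the actual pins of TabDPT, TabPFN, or TabICL.

```python
def parse_version(text):
    """Turn '1.7' or '1.7.2' into a 3-tuple of ints; plain numeric versions only."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '1.7' compares equal to '1.7.0'
    return tuple(parts)


def satisfies(version, lower=None, upper=None):
    """True if lower <= version < upper; a missing bound is not enforced."""
    parsed = parse_version(version)
    if lower is not None and parsed < parse_version(lower):
        return False
    if upper is not None and parsed >= parse_version(upper):
        return False
    return True


def common_versions(candidates, constraints):
    """Return the candidate versions accepted by every package's bounds."""
    return [
        version
        for version in candidates
        if all(satisfies(version, **bounds) for bounds in constraints.values())
    ]


if __name__ == "__main__":
    # Hypothetical bounds for illustration only, not real package metadata.
    constraints = {
        "package_a": {"lower": "1.7"},  # wants a recent release
        "package_b": {"upper": "1.7"},  # pinned below that release
    }
    candidates = ["1.6.1", "1.7.0", "1.8.0"]
    print(common_versions(candidates, constraints))  # [] means no shared version

    # Lowering package_a's bound makes an overlap possible again.
    constraints["package_a"] = {"lower": "1.6"}
    print(common_versions(candidates, constraints))  # ['1.6.1']
```

An empty result means the packages cannot share one environment, which is why lowering the bound on one side resolves the conflict.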
|
Okay, we've updated the PyPI package and this PR with a version that allows |
|
Result artifacts for all data on TabArena-Full can be found here for now: https://data.lennart-purucker.com/tabarena/leaderboard_submissions/ |
|
Awesome stuff!! |




Hi, this is Alex from the TabDPT team. We've made a number of code improvements to TabDPT, so we wanted to share them and add the ability to do hyperparameter search.
I've tested out generate_all_configs.py and run_tabarena_lite.py on my end using TabDPT, but I wasn't able to figure out how to run the full range of TabArena folds/repeats, so please let me know if you run into any issues (or if there's an easy way for me to try that out).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.