
[TabArena 2025 Roadmap] NeurIPS 2025 Camera Ready and Future Work Items #213

@Innixma

Planned items for camera-ready deadline of NeurIPS 2025 (Oct 23rd) and future plans for 2025.

Remark

Because many models have been updated since the paper submission and rebuttal, there are three distinct states of results: (1) the state at submission time; (2) the state at arXiv upload, which includes the reruns we promised in the rebuttal (this is the current state of the live LB); and (3) the current state of all models we have run since then, which we will publish soon. The camera-ready version will be based on state (2), and the next version of the live leaderboard will be based on state (3).

P0 (Need to have) [Oct 23rd for Paper, 1st of December for Ecosystem]

Paper

  • Finalize and add new figures (@Innixma)
    • Pareto front of Improvability and inference time
    • Performance over (tuning) time in terms of Improvability
    • Validation Overfitting plot to replace the ensemble-weight plot (or similar) + adjusting writing (@Innixma)
  • Finalize and add improvements to writing and the appendix based on feedback and our promises in the rebuttal (@LennartPurucker)
    • Ensure writing around the integration of new figures aligns with the rebuttal promises (@LennartPurucker)
  • Finalize and add improvements to writing and the appendix based on community feedback (@LennartPurucker)
  • Ensure we mention and quantify likely dataset contamination for TabDPT
  • Remove the imputed KNN results from the per-dataset performance tables (@atschalz), and state more clearly somewhere that KNN has imputed results for datasets without numerical features (currently we only state in C1 that we drop all categorical features).
    • PR Merged to remove imputed KNN from tables (@atschalz)
    • Re-run with corrected code on the paper results (don't use new RealMLP/EBM results) (@Innixma)
    • New version of KNN preprocessing; rerun KNN; replot everything
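
For the Pareto front figure listed above, a minimal sketch of how the non-dominated models could be selected. It assumes that both Improvability and inference time are lower-is-better; the `ModelResult` type, the `pareto_front` helper, and any values passed to it are hypothetical illustrations, not part of the TabArena codebase.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    """Hypothetical container for one model's aggregated results."""
    name: str
    improvability: float   # assumed lower is better
    inference_time: float  # assumed lower is better


def pareto_front(results: List[ModelResult]) -> List[ModelResult]:
    """Return the models that no other model beats on both axes.

    Sort by inference time (ties broken by Improvability), then keep a
    model only if it improves on the best Improvability seen so far.
    """
    front: List[ModelResult] = []
    best_improvability = float("inf")
    ordered = sorted(results, key=lambda r: (r.inference_time, r.improvability))
    for result in ordered:
        if result.improvability < best_improvability:
            front.append(result)
            best_improvability = result.improvability
    return front
```

The returned list is ordered from fastest to slowest model, so it can be drawn directly as a step line on top of a scatter plot of all models.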

Ecosystem

  • Finalize integration of results for new models that we have run since submission
    • RealMLP_GPU [Verified]
    • EBM [Verified]
    • Mitra [Verified]
    • xRFM [Verified]
    • LimiX [Not Verified]
    • TabFlex [Not Verified]
    • Beta-TabPFN [Not Verified]
  • Update leaderboard
    • Ensure we are using the newest data from model runs that we have at the time of the update
    • Integrate the Pareto front and tuning-over-time plots into each leaderboard, like the main figure
    • Add support for results of unverified models
    • Add test data leakage column (boolean or %)
    • Add additional subsets: binary, multiclass, "not small data"; consider removing/renaming TabICL-data and TabPFN-data.
    • Consider removing some models from the plots in the LB (KNN, worse than RF, ...) but keep in the LB tables.
    • TabArena Rank Metric? := (rank (based on Elo) + rank based on Improvability + harmonic rank)/3
  • Add verified / unverified to existing models:
    • Verified: RealMLP, TabM, ModernNCA, xRFM, Mitra, TabPFNv2, TorchMLP, FastaiMLP
    • Unverified (technically could be verified by authors/maintainers): LightGBM, CatBoost, XGBoost, TabDPT
    • NA (IMO no need for verification as these are established baselines): KNN, Linear model, ExtraTrees, RandomForest
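
The proposed TabArena Rank Metric above, (rank based on Elo + rank based on Improvability + harmonic rank) / 3, could be computed roughly as follows. This is a sketch under assumptions: Elo is treated as higher-is-better and Improvability as lower-is-better, the harmonic rank is taken as a precomputed per-model value because the issue does not define it, and ties are broken by input order. The function names are hypothetical.

```python
from typing import Dict


def ordinal_ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each model to its 1-based rank; ties keep input order (stable sort)."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {model: position for position, model in enumerate(ordered, start=1)}


def tabarena_rank_metric(
    elo: Dict[str, float],
    improvability: Dict[str, float],
    harmonic_rank: Dict[str, float],
) -> Dict[str, float]:
    """Average of the Elo rank, the Improvability rank, and the harmonic rank.

    Assumption: `harmonic_rank` is already computed elsewhere and is used as-is.
    """
    elo_rank = ordinal_ranks(elo, higher_is_better=True)
    improvability_rank = ordinal_ranks(improvability, higher_is_better=False)
    return {
        model: (elo_rank[model] + improvability_rank[model] + harmonic_rank[model]) / 3
        for model in elo
    }
```

A lower combined value indicates a better model. Averaging ranks rather than raw scores keeps the three components on a comparable scale, at the cost of discarding the size of the gaps between models.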

P1 (Nice to have)

  • More Models
    • TabICL with HPO [We have results but have not added them yet; needs verification/confirmation from @dholzmueller]
    • RealTabPFN
    • PerpetualBoosting [We have some results, but they are not verified]
    • TabM version based on pip package and with varying inner seeds.
    • LimiX with HPO
    • New run of TabDPT with verification from authors [context size, ensemble usage, ...]
    • Better KNN baseline pipeline (improve preprocessing and search space), or consider removing it.
    • Verify / improve linear model and its HPO if possible
  • Improve User Experience
    • Polished end-to-end example of locally fitting a model -> evaluating on TabArena
    • Polished installation instructions & FAQ (e.g., TabDPT install error)
    • Create a technical API for the relevant TabArena/TabRepo functions
    • Create an onboarding page with different use cases and better step-by-step documentation (upgrade from https://github.com/TabArena/tabarena_benchmarking_examples)
  • Technical Debt

P2 (Stretch Goal)

  • Think about how to communicate the difference between HPO, finetuning, and ICL performance
  • Improve tree-based models further
  • Rerun all methods with varying inner seeds
    • CPU models: KNN, Linear, RF, ExtraTrees, FastaiMLP, TorchMLP, CatBoost, LightGBM, XGBoost
    • GPU models: Beta-TabPFN, ModernNCA (HPO)
    • GPU models with HPO and refitting (so minor impact): TabPFNv2 (HPO), TabICL (HPO)
    • GPU models without HPO and refitting (so likely no impact): TabFlex, TabDPT
    • Done models: RealMLP_GPU, xRFM, Mitra, EBM, LimiX, TabM (assuming we run the pip-version as stated above)
  • Portfolio building logic that created AutoGluon 1.4 extreme preset portfolio? (@Innixma)
  • AutoGluon high, HQIL, good, medium quality runs (@Innixma)
  • AutoGluon w/ smaller time limit (5 min, 10 min, 30 min) (@Innixma)

Below the line (Not scheduled so far)

  • Integration with AMLB for other AutoML system results and support for systems/agents
