[Feature Request]: Compact set of most diverse distributions to try on a limited budget #81

fingoldo · 2023-03-28T17:16:36Z

Thanks for this cool lib! I'm currently facing the problem of finding the best compact set of distributions to try on data of unknown nature, given a limited time/CPU budget. It appears that many of the distributions are special cases of each other and behave almost identically. When the compute budget is limited, it probably makes no sense to check distributions that can easily produce similar shapes; it would be more reasonable to try the most diverse ones first (on average). Then, of two distributions with similar average diversity, it's better to start with the one that has the lower average runtime. get_common_distributions() does not seem to account for diversity or average runtime. Are you interested in research or a PR resulting in a new function like get_efficient_distributions(n:int=3) that returns the n most diverse and fast-to-compute distributions, on average?
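To make the selection rule above concrete, here is a minimal sketch of one way it could work: a greedy pick that maximizes diversity and breaks near-ties by runtime. This is not fitter's API and not necessarily the algorithm behind the research mentioned in this thread. The signature, the `similarity` and `runtime` inputs, the `tol` parameter, and the placeholder names "a" to "d" with their toy numbers are all assumptions made for illustration.

```python
def get_efficient_distributions(similarity, runtime, n=3, tol=0.05):
    """Greedily pick up to n names that are mutually diverse and cheap to fit.

    similarity: dict mapping frozenset({x, y}) -> similarity in [0, 1] (hypothetical input)
    runtime:    dict mapping name -> average fit time (hypothetical input)
    tol:        diversity gap below which two candidates count as tied
    """
    names = sorted(runtime)

    def sim(x, y):
        return similarity[frozenset((x, y))]

    # Seed with the fastest distribution (one possible choice of starting point).
    chosen = [min(names, key=lambda d: (runtime[d], d))]

    while len(chosen) < min(n, len(names)):
        remaining = [d for d in names if d not in chosen]

        # Diversity of a candidate: distance to its closest already-chosen name.
        def diversity(d):
            return 1.0 - max(sim(d, c) for c in chosen)

        best = max(diversity(d) for d in remaining)
        # Among candidates with similar diversity, prefer the lowest runtime.
        near_best = [d for d in remaining if best - diversity(d) <= tol]
        chosen.append(min(near_best, key=lambda d: (runtime[d], d)))

    return chosen


# Toy inputs, invented for illustration only (not measurements).
runtime = {"a": 2.0, "b": 1.2, "c": 5.0, "d": 0.8}
similarity = {
    frozenset(("a", "b")): 0.95,  # "a" and "b" are near-twins
    frozenset(("a", "c")): 0.20,
    frozenset(("a", "d")): 0.30,
    frozenset(("b", "c")): 0.25,
    frozenset(("b", "d")): 0.32,
    frozenset(("c", "d")): 0.22,
}

selected = get_efficient_distributions(similarity, runtime, n=3)
print(selected)  # ['d', 'c', 'b']
```

In the toy run, "c" is picked despite being slow because it is clearly the most different from "d", while "b" beats its near-twin "a" only because the two are tied on diversity and "b" is faster.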

cokelaer · 2023-08-08T20:57:58Z

@fingoldo Yes! I'm interested. You are completely right. The set of distributions is redundant, and a representative subset would be very useful. If, in addition, it is backed by a good algorithm, it would be a very good addition. If you are still interested, please try to put up a PR; we'll review it and integrate it into fitter.

fingoldo · 2023-08-10T17:07:24Z

Thanks. My research is done but not published yet. It took quite a lot of computation: ~24 hrs on 16 cores.
I'll try to prepare a publication in the coming month, but for now, the quick result is that the 3 most universal distributions, which taken together can approximate well the highest number of other distributions and are reasonably fast to compute, are stats.levy_l, stats.logistic, and stats.pareto.

You may extract more info from details.
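As a rough illustration of how the three distributions named above could be tried on a sample, the sketch below fits each one with scipy.stats and ranks them by the sum of squared errors between the fitted PDF and a histogram of the data. The helper name `fit_candidates`, the bin count, and the synthetic sample are assumptions made for this example; it does not reproduce the computations described in this comment.

```python
import warnings

import numpy as np
from scipy import stats


def fit_candidates(data, names=("levy_l", "logistic", "pareto"), bins=50):
    """Fit each named scipy.stats distribution; return {name: SSE}, best first."""
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2

    results = {}
    for name in names:
        dist = getattr(stats, name)
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # fits on ill-suited data can be noisy
                params = dist.fit(data)
                pdf = dist.pdf(centers, *params)
        except Exception:
            continue  # skip a distribution whose fit fails on this sample
        sse = float(np.sum((pdf - density) ** 2))
        if np.isfinite(sse):
            results[name] = sse

    # Lower sum of squared errors means the fitted PDF tracks the histogram better.
    return dict(sorted(results.items(), key=lambda kv: kv[1]))


# Synthetic sample, used only to exercise the helper.
rng = np.random.default_rng(0)
sample = rng.logistic(loc=2.0, scale=1.5, size=2000)

ranking = fit_candidates(sample)
for name, sse in ranking.items():
    print(f"{name}: {sse:.5f}")
```

With fitter itself, the same shortlist can be passed through the `distributions` argument of `Fitter`, which limits the fit to those names.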
