
[Feature Request]: Compact set of most diverse distributions to try on a limited budget #81

Open
fingoldo opened this issue Mar 28, 2023 · 2 comments

Comments

@fingoldo

Thanks for this cool lib! I'm currently trying to find the best compact set of distributions to try on data of unknown nature, given a limited time/CPU budget. Many of the distributions appear to be special cases of one another and behave almost identically. When the compute budget is limited, it probably makes little sense to test distributions that can produce very similar shapes; it would be more reasonable to try the most diverse ones (on average) first. Then, between two distributions with similar average diversity, it's better to start with the one that has the lower average runtime. get_common_distributions() does not seem to account for either diversity or average runtime. Would you be interested in research or a PR resulting in a new function like get_efficient_distributions(n:int=3) that returns the n most diverse and fast-to-compute distributions, on average?
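The selection rule proposed above (most diverse first, then cheapest among equally diverse) can be sketched as a greedy set-cover heuristic. This is a minimal sketch, not part of fitter's API: `get_efficient_distributions` is the hypothetical name from the request, and it assumes two precomputed inputs, a `coverage` map (which distributions each candidate approximates well) and a `runtime` map of average fit times. How those would be measured is left open, and the `dist_*` names and numbers below are toy values, not results.

```python
from typing import Dict, List, Set


def get_efficient_distributions(
    coverage: Dict[str, Set[str]],
    runtime: Dict[str, float],
    n: int = 3,
) -> List[str]:
    """Greedily pick up to n distributions to try first (hypothetical helper).

    coverage[d]: names of the distributions that candidate d approximates
    well, including d itself (assumed to be precomputed offline).
    runtime[d]: average fitting time of candidate d, in seconds (assumed).
    """
    # Every distribution that at least one candidate can approximate.
    uncovered: Set[str] = set().union(*coverage.values())
    remaining = sorted(coverage)  # sorted so full ties resolve deterministically
    chosen: List[str] = []
    while remaining and len(chosen) < n:
        # Most newly covered distributions first; among equals, the faster one.
        best = max(
            remaining,
            key=lambda d: (len(coverage[d] & uncovered), -runtime[d]),
        )
        chosen.append(best)
        uncovered -= coverage[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy inputs for illustration only; not measured data.
    coverage = {
        "dist_a": {"dist_a", "dist_b", "dist_c"},
        "dist_b": {"dist_b", "dist_c"},
        "dist_d": {"dist_d", "dist_e"},
        "dist_e": {"dist_e", "dist_d"},
    }
    runtime = {"dist_a": 0.5, "dist_b": 0.1, "dist_d": 0.3, "dist_e": 0.2}
    print(get_efficient_distributions(coverage, runtime, n=3))
    # ['dist_a', 'dist_e', 'dist_b']
```

Greedy set cover is not guaranteed to find the optimal subset, but it is a standard, cheap approximation. Once everything is covered, the remaining picks simply fall back to the fastest candidates.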

@cokelaer (Owner) commented Aug 8, 2023

@fingoldo YES! I'm interested. You are completely right. The set of distributions is redundant, and having a representative subset would be very useful. If, in addition, it is backed by a good algorithm, it would be a very good addition. If you are still interested, please open a PR; we'll review it and integrate it into fitter.

@fingoldo (Author)


Thanks. My research is done but not published yet. It took quite a lot of computation: ~24 hours on 16 cores :)
I'll try to prepare a publication in the coming month. For now, the quick result is that the 3 most universal distributions, i.e. those that, taken together, approximate the largest number of other distributions well while being reasonably fast to compute, are stats.levy_l, stats.logistic and stats.pareto.

You can find more information in the details.
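As a quick way to try the reported shortlist on your own data, one option is to fit each of the three distributions directly with SciPy (maximum likelihood via each distribution's `fit` method) and record the log-likelihood and fit time. This is an illustrative sketch of how the result might be used, not the author's evaluation procedure; `fit_shortlist` is a hypothetical helper name, and the logistic sample in the usage lines is synthetic.

```python
import time
from typing import Dict, Tuple

import numpy as np
from scipy import stats

# Shortlist reported in this thread (scipy.stats names).
SHORTLIST = ("levy_l", "logistic", "pareto")


def fit_shortlist(data: np.ndarray) -> Dict[str, Tuple[float, float]]:
    """Fit each shortlisted distribution by maximum likelihood.

    Returns {name: (log_likelihood, fit_seconds)}; a higher log-likelihood
    means a better fit on this sample. Parameter counts differ, so an
    information criterion (e.g. AIC) would be a fairer comparison.
    """
    results: Dict[str, Tuple[float, float]] = {}
    for name in SHORTLIST:
        dist = getattr(stats, name)
        start = time.perf_counter()
        try:
            params = dist.fit(data)  # (shape params..., loc, scale)
        except Exception:  # some fits may fail to converge on unsuitable data
            results[name] = (float("nan"), time.perf_counter() - start)
            continue
        elapsed = time.perf_counter() - start
        loglik = float(np.sum(dist.logpdf(data, *params)))
        results[name] = (loglik, elapsed)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = stats.logistic.rvs(size=500, random_state=rng)  # synthetic data
    ranked = sorted(fit_shortlist(sample).items(), key=lambda kv: -kv[1][0])
    for name, (ll, sec) in ranked:
        print(f"{name:10s} loglik={ll:10.2f} fit_time={sec:.3f}s")
```

Within fitter itself, the same shortlist should be passable through the `distributions` argument, e.g. `Fitter(data, distributions=["levy_l", "logistic", "pareto"])`. Note that fitter ranks fits by its own criteria (by default the sum of squared errors against a histogram) rather than by raw log-likelihood.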
