Thanks for this cool lib! I'm currently trying to find the best compact set of distributions to try on data of unknown nature, given a limited time/CPU budget. It appears that many of the distributions are special cases of one another and behave almost identically. When the compute budget is limited, it probably makes no sense to check distributions that can easily produce similar shapes; it would be more reasonable to try the most diverse ones first (on average). Then, of two distributions with similar average diversity, it's better to start with the one that has the lower average runtime. get_common_distributions() does not seem to account for diversity or average runtime. Would you be interested in research or a PR resulting in a new function like get_efficient_distributions(n:int=3) that returns the n most diverse and fast-to-compute distributions, on average?
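To make the selection rule above concrete, here is a minimal sketch of one way it could work. It is only an illustration under stated assumptions, not part of fitter. The function name pick_diverse_fast, the tol parameter, the dissimilarity scores, and the runtimes are all hypothetical, and the numbers are invented toy values rather than measurements. The sketch assumes a pairwise dissimilarity score between distributions is already available, picks the candidate with the highest average dissimilarity first, then greedily adds the candidate farthest from everything already chosen, breaking near-ties by lower runtime.

```python
from typing import Callable, Dict, FrozenSet, List


def pick_diverse_fast(
    names: List[str],
    dissimilarity: Dict[FrozenSet[str], float],
    runtime: Dict[str, float],
    n: int = 3,
    tol: float = 0.05,
) -> List[str]:
    """Greedy sketch: most diverse first, faster one wins among near-ties."""

    def d(a: str, b: str) -> float:
        # Dissimilarity is symmetric, so pairs are keyed by a frozenset.
        return dissimilarity[frozenset((a, b))]

    def best(candidates: List[str], score: Callable[[str], float]) -> str:
        scores = {c: score(c) for c in candidates}
        top = max(scores.values())
        # Candidates within `tol` of the top score count as "similar diversity".
        near_top = [c for c in candidates if top - scores[c] <= tol]
        return min(near_top, key=lambda c: runtime[c])

    remaining = list(names)
    others = max(len(names) - 1, 1)
    # First pick: highest average dissimilarity to all other distributions.
    first = best(remaining, lambda c: sum(d(c, o) for o in names if o != c) / others)
    chosen = [first]
    remaining.remove(first)

    # Later picks: farthest from the closest already-chosen distribution.
    while remaining and len(chosen) < n:
        nxt = best(remaining, lambda c: min(d(c, s) for s in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


# Invented toy inputs: dist_a and dist_b are near-twins, dist_a is slow.
names = ["dist_a", "dist_b", "dist_c", "dist_d"]
dissimilarity = {
    frozenset(("dist_a", "dist_b")): 0.1,
    frozenset(("dist_a", "dist_c")): 0.9,
    frozenset(("dist_a", "dist_d")): 0.8,
    frozenset(("dist_b", "dist_c")): 0.9,
    frozenset(("dist_b", "dist_d")): 0.8,
    frozenset(("dist_c", "dist_d")): 0.5,
}
runtime = {"dist_a": 2.0, "dist_b": 0.5, "dist_c": 1.0, "dist_d": 0.3}

print(pick_diverse_fast(names, dissimilarity, runtime, n=3))
```

With these toy inputs, only one of the two near-twins is selected, and the faster one wins the tie. How to measure dissimilarity between distributions in practice is left open here; that is the hard part and the sketch does not address it.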
@fingoldo Yes, I'm interested! You are completely right. The set of distributions is redundant, and having a representative subset would be very useful. If, in addition, it is backed by a good algorithm, it would be a very good addition. If you are still interested, please open a PR; we'll review it and integrate it into fitter.
Thanks. My research is done but not published yet. It took quite a lot of computation: about 24 hours on 16 cores :)
I'll try to prepare a publication in the coming month, but for now, the quick result is that the 3 most universal distributions, which taken together approximate the largest number of other distributions well and are reasonably fast to compute, are stats.levy_l, stats.logistic, and stats.pareto.