question: Understanding the relationship between get_split_value_histogram and trees_to_dataframe #11220
Hi, thanks for the great library!
I am not sure if this is a bug or more of a misunderstanding on my part, but I am struggling to resolve some differences between the output of `get_split_value_histogram` and `trees_to_dataframe`. To my understanding, it should be possible to get the splits XGBoost uses for each feature from either method. However, I am getting drastically different results. As an example, here is a model with a float feature and some boolean features.
Note that passing in a boolean column works, but breaks both `get_split_value_histogram` and `trees_to_dataframe` (already reported in #10437). Looking at "int_bool", I get outputs of
and
respectively. According to `get_split_value_histogram`, there is a trivial split on 1.5, whereas `trees_to_dataframe` seems to report a more accurate split on 1.0. Where does the 1.5 come from?
Looking at the float feature, I get
and
respectively. According to `get_split_value_histogram`, there are 178 unique splits, and according to `trees_to_dataframe`, there are 220 unique splits. In addition, the actual split values are fairly different between the two.
I would expect the two functions to return the same results.