The current data build rules take a fixed number of samples from each training data file, capped by the max number of samples and the max data size.
Imagine you have categorical responses where some classes are heavily over-represented (well, this is what I actually have). Simply weighting the responses according to their proportion, as currently implemented, works relatively well. However, the weights do not seem to be sufficient when the imbalance is extreme enough.
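To make the weighting concrete, here is a minimal sketch of inverse-frequency weights. I am assuming this is roughly what "weighting by proportion" means; the function name `inverse_frequency_weights` and the exact formula are my own illustration, not the actual implementation.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Hypothetical helper: uses n_samples / (n_classes * class_count), a common
    heuristic, assumed here for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy example with a 98:2 imbalance.
labels = ["common"] * 98 + ["rare"] * 2
weights = inverse_frequency_weights(labels)
print(weights)
```

With a 98:2 split the rare class ends up weighted about 49 times higher than the common one, so each of the two rare examples carries a very large share of the loss. That may be one reason why weights alone stop helping at extreme imbalance.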
One solution could be to sample from the training dataset per response class, rather than just per file, so that the response distribution in the built data is evened out.
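A rough sketch of what I mean by sampling per class, assuming the rows fit in memory and each row exposes its response label. The names `balanced_sample`, `label_of`, and `per_class` are hypothetical and not part of the existing build rules.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_of, per_class, seed=0):
    """Draw the same number of rows for every response class.

    Hypothetical helper: classes with at least `per_class` rows are sampled
    without replacement; smaller classes are sampled with replacement.
    """
    rng = random.Random(seed)

    # Group rows by their response label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for label, group in by_label.items():
        if len(group) >= per_class:
            sample.extend(rng.sample(group, per_class))
        else:
            sample.extend(rng.choices(group, k=per_class))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sample)
    return sample


# Toy data: 95 "common" rows and 5 "rare" rows.
rows = [("x%d" % i, "common") for i in range(95)]
rows += [("y%d" % i, "rare") for i in range(5)]

sample = balanced_sample(rows, label_of=lambda r: r[1], per_class=10)
counts = Counter(label for _, label in sample)
print(counts)
```

The trade-off is that rare classes get duplicated rows (oversampling) while common classes lose data (undersampling), so `per_class` would probably need to be configurable alongside the existing max samples and max data size limits.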