Draw representative sample distribution from built data #100

The current data build rules take a number of samples from each training data file, determined by the maximum number of samples and the maximum data size.

Suppose you have categorical responses with disproportionate representation (this is in fact what I have). Weighting the responses according to their proportion, as currently implemented, works relatively well. However, the weighting seems to be insufficient when the imbalance is extreme enough.
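
To make the limitation concrete, here is a minimal sketch of proportion-based weighting. It assumes the weights are inverse class frequencies; the function name `inverse_frequency_weights` and the example labels are hypothetical and are not taken from the project's code.

```python
from collections import Counter


def inverse_frequency_weights(responses):
    """Return one weight per class, inversely proportional to its frequency.

    Hypothetical helper: assumes "weighting by proportion" means
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(responses)
    n_samples = len(responses)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Illustrative, made-up data: 98 samples of class "a", 2 of class "b".
responses = ["a"] * 98 + ["b"] * 2
weights = inverse_frequency_weights(responses)
print(weights)
```

With this scheme each class contributes the same total weight, but the rare class still consists of only two distinct examples, each carrying a very large weight. That may be one reason weighting alone stops helping under extreme imbalance.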

One solution could be to draw representative samples from the training dataset to even out the response distribution.
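
A rough sketch of what such a draw could look like, assuming the built data can be iterated as rows and that each row's response can be read with a caller-supplied function. The names `draw_balanced_sample`, `get_response`, and `per_class` are hypothetical, and capping every class at the same count is only one possible way to even out the distribution.

```python
import random
from collections import Counter, defaultdict


def draw_balanced_sample(rows, get_response, per_class, seed=0):
    """Draw at most `per_class` rows for each response category.

    Hypothetical helper: groups rows by response, samples each group
    without replacement, then shuffles the combined result.
    """
    rng = random.Random(seed)

    # Group the rows by their response category.
    by_class = defaultdict(list)
    for row in rows:
        by_class[get_response(row)].append(row)

    # Take the same number of rows from every category, or all of
    # them when a category has fewer than `per_class` rows.
    sample = []
    for group in by_class.values():
        k = min(per_class, len(group))
        sample.extend(rng.sample(group, k))

    rng.shuffle(sample)
    return sample


# Illustrative, made-up data: (response, row_id) pairs.
rows = [("a", i) for i in range(98)] + [("b", i) for i in range(2)]
sample = draw_balanced_sample(rows, lambda row: row[0], per_class=2)
print(Counter(row[0] for row in sample))
```

Capping at the size of the rarest class discards most of the common class, so in practice the cap would probably need to be tunable, possibly alongside the existing maximum sample and data size limits.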
