|
| 1 | +# Class Overlap |
| 2 | + |
| 3 | +Class overlap assesses the semantic overlap between pairs of classes. In some cases, high |
| 4 | +overlap may be associated with poor class definitions, mislabelling, and/or model confusion. |
| 5 | + |
| 6 | +Class overlap is determined with a dataset alone, based on the locations of utterances in |
| 7 | +embedding space, as described in |
| 8 | +[:material-link: Similarity Analysis](../key-concepts/similarity.md). |
| 9 | + |
| 10 | +## Class Overlap Plot |
| 11 | + |
| 12 | +The Class Overlap plot shows the extent to which source classes semantically overlap target |
| 13 | +classes, all in the training data. The source class is the class label, and the target class is |
| 14 | +the class that the source class may look like, based on its nearest neighbors. As such, flows |
| 15 | +between class nodes indicate whether samples in a source class are in neighborhoods typified |
| 16 | +by other classes (class overlap) or by their own class (self-overlap). For each source class, class |
| 17 | +overlap and self-overlap values sum to 1, unless values are scaled by class size. |
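To make the normalization concrete, here is a minimal sketch of how per-class overlap shares could be derived from nearest-neighbor labels. It is a simplification under stated assumptions, not the tool's actual computation: the function name `class_overlap`, the toy labels, and the precomputed `neighbors` lists are all hypothetical, and each neighbor is counted equally.

```python
from collections import Counter, defaultdict


def class_overlap(labels, neighbors):
    """Return {source: {target: share}}; each source row sums to 1.

    labels: class label of each sample.
    neighbors: for each sample, the indices of its nearest neighbors
        (assumed to be precomputed in embedding space).
    """
    counts = defaultdict(Counter)
    for label, nbrs in zip(labels, neighbors):
        for j in nbrs:
            # Tally the label of every neighbor under the sample's own label.
            counts[label][labels[j]] += 1

    overlap = {}
    for source, row in counts.items():
        total = sum(row.values())
        # Normalize by source class so class overlap + self-overlap = 1.
        overlap[source] = {target: n / total for target, n in row.items()}
    return overlap


# Hypothetical toy data: three "billing" samples, two "refund" samples.
labels = ["billing", "billing", "billing", "refund", "refund"]
neighbors = [[1, 2], [0, 2], [1, 3], [4, 2], [3, 0]]
overlap = class_overlap(labels, neighbors)
```

In this toy example, `overlap["refund"]["billing"]` is the class overlap of `refund` with `billing`, and `overlap["refund"]["refund"]` is its self-overlap.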
| 18 | + |
| 19 | +Overlap is displayed as flows from source classes (nodes on the left) to target classes (right). |
| 20 | +Nodes are ordered so that flows with the greatest overlap values appear towards the top, |
| 21 | +highlighting those class pairs. Wider flows indicate greater overlap values. Colors group flows |
| 22 | +from the same source class. The plot is interactive: nodes can be moved and reordered by dragging. |
| 23 | + |
| 24 | +### Plot options |
| 25 | + |
| 26 | +* **Minimum displayed overlap value**: This value determines which overlap flows will be displayed |
| 27 | + on the plot. Vary this value to focus on class pairs with greatest overlap, or to see all |
| 28 | + overlap to better understand the complexity of the dataset. The default value is set to the |
| 29 | + tenth-highest class overlap value purely for ease of visualization, and will differ across |
| 30 | + datasets. |
| 31 | +* **Self-overlap**: This toggle determines whether to show flows for overlap of a class with |
| 32 | + itself, to get a sense of the relative magnitude (and possibly importance) of class overlap. |
| 33 | +* **Scale by class size**: Overlap values are normalized by source class, such that the sum of |
| 34 | + all class overlap and self-overlap values for a source class is 1. This toggle multiplies overlap |
| 35 | + values by class sample sizes, changing node size and flow width accordingly. |
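The interaction between these options can be sketched as follows. This is an illustrative sketch only: the helper names (`scale_by_class_size`, `kth_highest_class_overlap`, `displayed_flows`) and the toy values are hypothetical, and it assumes the default threshold ignores self-overlap, which the text above does not state explicitly.

```python
def scale_by_class_size(overlap, class_sizes):
    """Multiply each normalized overlap value by its source class size."""
    return {
        source: {target: value * class_sizes[source] for target, value in row.items()}
        for source, row in overlap.items()
    }


def kth_highest_class_overlap(overlap, k=10):
    """k-th highest class overlap value, ignoring self-overlap (assumption)."""
    values = sorted(
        (v for s, row in overlap.items() for t, v in row.items() if s != t),
        reverse=True,
    )
    if not values:
        return 0.0
    # Fall back to the smallest value when fewer than k class pairs exist.
    return values[min(k, len(values)) - 1]


def displayed_flows(overlap, threshold, show_self_overlap=False):
    """Flows at or above the threshold, optionally including self-overlap."""
    return [
        (s, t, v)
        for s, row in overlap.items()
        for t, v in row.items()
        if v >= threshold and (show_self_overlap or s != t)
    ]


# Hypothetical normalized values: each source row sums to 1.
overlap = {
    "billing": {"billing": 0.75, "refund": 0.25},
    "refund": {"refund": 0.5, "billing": 0.5},
}
class_sizes = {"billing": 100, "refund": 10}

scaled = scale_by_class_size(overlap, class_sizes)
top_unscaled = kth_highest_class_overlap(overlap, k=1)
top_scaled = kth_highest_class_overlap(scaled, k=1)
flows = displayed_flows(scaled, threshold=top_scaled)
```

In this toy data, the top unscaled pair is `refund` to `billing`, while scaling shifts the emphasis to `billing` to `refund` because `billing` has ten times as many samples.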
| 36 | + |
| 37 | +### Suggested workflow |
| 38 | + |
| 39 | +The plot options described above allow for exploration of different aspects of class overlap. To |
| 40 | +navigate them, we suggest the following workflow: |
| 41 | + |
| 42 | +#### 1. Default view: `Self-overlap` off, `Scale by class size` on |
| 43 | + |
| 44 | +- Start here. This view shows you the class pairs with the greatest (scaled) semantic overlap |
| 45 | + scores in the dataset. Vary the `Minimum displayed overlap value` to see all dataset overlap or |
| 46 | + to focus on the class pairs with the greatest overlap scores. |
| 47 | +- Because `Scale by class size` is on, this view will emphasize overlapping classes with greater |
| 48 | + sample counts. This is useful if you are less concerned about class overlap from |
| 49 | + source classes with few samples in the training data. However, if you want to further investigate |
| 50 | + classes with high overlap values but fewer samples, either to better understand your dataset |
| 51 | + or because some classes might have high business value, you can turn `Scale by class size` |
| 52 | + off, as explained in step 2. |
| 53 | + |
| 54 | +#### 2. Toggle `Scale by class size` off |
| 55 | + |
| 56 | +- When `Scale by class size` is turned off, the flows from each source class (class overlap and |
| 57 | + self-overlap) sum to 1. This view emphasizes class pairs with the greatest class overlap scores, |
| 58 | + regardless of how many samples the source class contains. |
| 59 | +- This is useful to further understand class overlap for classes with relatively few samples, |
| 60 | + which might not have been as visible during the analysis at step 1. |
| 61 | + |
| 62 | +#### 3. Toggle `Self-overlap` on |
| 63 | + |
| 64 | +- For any given class, turning on `Self-overlap` lets you compare the extent to which its samples |
| 65 | + semantically overlap other classes (class overlap) vs. samples of its own class (self-overlap). |
| 66 | + For example, if self-overlap is much higher than class overlap, class overlap may be less |
| 67 | + problematic for this class, and vice versa. |
| 68 | + |
| 69 | +!!! tip |
| 70 | + |
| 71 | + :material-restart: Click the reset button next to the overlap threshold value to reset to |
| 72 | + the default threshold. |
| 73 | + |
| 74 | +<figure markdown> |
| 75 | + |
| 76 | +<figcaption> |
| 77 | +Class Overlap plot on the Class Overlap page, accessed via the Dashboard. |
| 78 | +</figcaption> |
| 79 | +</figure> |
| 80 | + |