Commit bde41d6
authored
Correctly drop tokens in SwitchTransformer (#37123)
Previously, the identity function was used for dropped tokens
with a weight from the expert that was not applied to the hidden states.
This was misleading, because dropping means, the expert weight is zero.
Instead of trying to fix the weight, we take an easier approach by initializing with zeros.
Fixes issue #370171 parent 7ecc5b8 commit bde41d6
File tree
2 files changed
+16
-4
lines changed- src/transformers/models/switch_transformers
- tests/models/switch_transformers
2 files changed
+16
-4
lines changedLines changed: 2 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
301 | 301 | | |
302 | 302 | | |
303 | 303 | | |
304 | | - | |
305 | | - | |
306 | | - | |
307 | | - | |
| 304 | + | |
| 305 | + | |
308 | 306 | | |
309 | 307 | | |
310 | 308 | | |
| |||
Lines changed: 14 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
| 45 | + | |
45 | 46 | | |
46 | 47 | | |
47 | 48 | | |
| |||
1133 | 1134 | | |
1134 | 1135 | | |
1135 | 1136 | | |
| 1137 | + | |
| 1138 | + | |
| 1139 | + | |
| 1140 | + | |
| 1141 | + | |
| 1142 | + | |
| 1143 | + | |
| 1144 | + | |
| 1145 | + | |
| 1146 | + | |
| 1147 | + | |
| 1148 | + | |
| 1149 | + | |
0 commit comments