@@ -220,18 +220,22 @@ def allowed_token_ids(self, state: FSMState) -> List[int]:
  """Generate a list of allowed tokens for the next step.

  Upon initialization, the CFG incremental parser is used to determine the
- first regex.
-
- This regex is used for proposals until either:
-
- - The regex is exhausted, and its only remaining option is the EOS
- token, in which case we always transition to the next regex
- - The regex can be exhausted, but the EOS token is not the only
- remaining option, in which case we transition to the next regex with
- probability P (TODO) or remove the possibility of generating the EOS
- token and continue with the current regex
-
- The CFG incremental parser is allowed to propose the EOS token from any final state,
+ first regex and construct the first FSM to generate the first terminal.
+
+ This FSM is used for proposals until either:
+
+ - The FSM is exhausted, and its only remaining option is the EOS
+ token, in which case we feed the generated terminal to the
+ CFG incremental parser and allow it to propose the next regex
+ corresponding to the next set of valid terminals.
+ - The current FSM can be exhausted, but the EOS token is not the only
+ remaining option. In this case we allow proposal of current terminal extensions,
+ store the current FSM and its state, then also use the CFG parser
+ to propose a new regex corresponding to terminating the current terminal
+ and starting the next one. The model can then sample from either of these sets
+ to determine whether to extend the current terminal or terminate it and start the next one.
+
+ The CFG incremental parser is allowed to propose the EOS token from any accepting state,
  and once it is generated, the FSM will continue to always generate the EOS token.

  Parameters
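The two cases in the new docstring can be illustrated with a toy, character-level sketch. Everything below is an assumption made for illustration and is not the code in this commit: `TerminalFSM`, `NEXT_TERMINALS`, `CAN_END_AFTER`, and the `NUMBER ("+" NUMBER)*` grammar are hypothetical stand-ins for the regex FSMs and the CFG incremental parser, and tokens are single characters rather than real vocabulary ids.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

EOS = "<eos>"
DIGITS = "0123456789"


@dataclass
class TerminalFSM:
    """Hypothetical character-level DFA that recognizes one terminal."""

    name: str
    transitions: Dict[Tuple[int, str], int]  # (state, char) -> next state
    accepting: FrozenSet[int]

    def allowed(self, state: int) -> Set[str]:
        # Characters that extend the terminal from `state`.
        return {ch for (s, ch) in self.transitions if s == state}


# NUMBER = [0-9]+ : state 0 is the start, state 1 is accepting and loops on digits.
NUMBER = TerminalFSM(
    "NUMBER",
    {**{(0, d): 1 for d in DIGITS}, **{(1, d): 1 for d in DIGITS}},
    frozenset({1}),
)
# PLUS = \+ : a single character, so the accepting state has no extensions.
PLUS = TerminalFSM("PLUS", {(0, "+"): 1}, frozenset({1}))

# Stand-in for the CFG incremental parser (assumption): which terminals may
# follow a completed terminal, and after which terminals the input may end.
NEXT_TERMINALS: Dict[str, List[TerminalFSM]] = {"NUMBER": [PLUS], "PLUS": [NUMBER]}
CAN_END_AFTER = {"NUMBER"}


def allowed_tokens(fsm: TerminalFSM, state: int) -> Set[str]:
    """Return the tokens permitted at the next step."""
    extensions = fsm.allowed(state)
    if state not in fsm.accepting:
        # The terminal is incomplete: only extensions are valid.
        return extensions

    # The terminal may end here: ask the "parser" what can start next.
    next_starts: Set[str] = set()
    for nxt in NEXT_TERMINALS[fsm.name]:
        next_starts |= nxt.allowed(0)
    if fsm.name in CAN_END_AFTER:
        next_starts.add(EOS)

    # Case 1 (exhausted FSM): `extensions` is empty, so only `next_starts` remain.
    # Case 2: both sets are offered and the model picks between them.
    return extensions | next_starts
```

In this sketch, `allowed_tokens(PLUS, 1)` corresponds to the first bullet (the FSM is exhausted, so only the start of the next terminal is offered), while `allowed_tokens(NUMBER, 1)` corresponds to the second (more digits, `+`, or EOS are all offered together).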