From the computational formula of MLGRU, it looks like parallelism across tokens is lost during the prefill phase, whereas Transformer++ can still process the prompt tokens in parallel. I have two questions:
- Does "latency" in Figure 4(d) mean first-token latency?
- In Figure 4(d), does Transformer++ make use of token parallelism?
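
To make the concern concrete, here is a minimal scalar sketch of the kind of dependency I mean. It assumes a gated recurrence of the form h_t = f_t * h_{t-1} + (1 - f_t) * c_t, which is my reading of the relevant part of the MLGRU formula. The function name, the scalar weights `w_f` and `w_c`, and the use of `tanh` for the candidate are placeholders of my own, not the repository's implementation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_recurrence_prefill(x, w_f, w_c, h0=0.0):
    """Toy scalar prefill for h_t = f_t * h_{t-1} + (1 - f_t) * c_t.

    Hypothetical helper for illustration only; w_f and w_c are scalar
    stand-ins for the real weight matrices.
    """
    # The gate and candidate depend only on x_t, so this part could be
    # computed for every prompt token at once.
    f = [sigmoid(w_f * x_t) for x_t in x]
    c = [math.tanh(w_c * x_t) for x_t in x]  # tanh is a placeholder activation

    # The hidden state depends on the previous hidden state, so a direct
    # implementation has to walk the prompt one token at a time.
    h = h0
    states = []
    for f_t, c_t in zip(f, c):
        h = f_t * h + (1.0 - f_t) * c_t
        states.append(h)
    return states


if __name__ == "__main__":
    print(gated_recurrence_prefill([1.0, 2.0, 3.0], w_f=0.5, w_c=0.5))
```

In this sketch only the final loop is sequential. Whether that loop dominates the prefill time is what I am trying to understand from Figure 4(d).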