MLGRU 

From the computational formula of MLGRU, it appears that parallelism across tokens is broken during the prefill phase, because each hidden state depends on the previous one, whereas Transformer++ can still process all prompt tokens in parallel. I have two questions:
1. Does "latency" in Figure 4(d) mean first-token latency?
2. In Figure 4(d), does Transformer++ make use of token parallelism?
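
To make the concern about prefill concrete, here is a minimal sketch of how I read the recurrence, assuming it has the form h_t = f_t * h_{t-1} + (1 - f_t) * c_t. The function name `gated_recurrence_prefill`, the weight names `w_f` and `w_c`, and the SiLU candidate activation are my own assumptions for illustration and are not taken from the paper's code. The gate and candidate projections can be computed for all prompt tokens at once, but the loop over `t` needs the previous hidden state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recurrence_prefill(x, w_f, w_c):
    """Run an assumed MLGRU-style recurrence over a prompt.

    x:   (T, d) prompt token embeddings
    w_f: (d, d) forget-gate weights (hypothetical name)
    w_c: (d, d) candidate weights (hypothetical name)
    Returns the hidden states, shape (T, d).
    """
    # These projections have no dependency between tokens,
    # so they can be computed for the whole prompt at once.
    f = sigmoid(x @ w_f)
    pre = x @ w_c
    c = pre * sigmoid(pre)  # SiLU, assumed candidate activation

    # The element-wise update depends on h from the previous step,
    # so this naive loop visits tokens one after another.
    h = np.zeros(x.shape[1])
    out = np.empty_like(f)
    for t in range(x.shape[0]):
        h = f[t] * h + (1.0 - f[t]) * c[t]
        out[t] = h
    return out
```

If this reading is right, only the element-wise loop is sequential in the prompt length, which is why I am asking what exactly Figure 4(d) measures.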