3rd paragraph of TTFT section: "A 1K token prompt on a 32B model takes about 3 seconds. A 64K token prompt on a 405B model takes over an hour. Same model, same hardware, completely different experience." I don't understand "same model", since the two examples use different models (32B and 405B).
(Context: THANK YOU for making this product of your knowledge and explanatory skills available to me!)