How To Scale Your Model #2
13 comments · 8 replies
-
There's a comment box at the bottom of each section you can use to ask questions. We can't reply to every comment, but we'll try to address as many as we can. We'd like to keep improving this book, so please ask questions or point out any mistakes we've made. Enjoy!
-
Just wanted to thank you and the team for writing this! I'm trying to self-study this material and also write a blog on ML acceleration from first principles, and this has been extremely helpful to go through. It's very rare to find resources like this (especially on JAX/TPUs)! 😃
-
In the Matrix multiplication paragraph of the Rooflines section, it is not explicitly mentioned why the matmul takes 2BDF flops. It's because of multiply-accumulate, right?
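Yes, that's the reason: a [B, D] activation times a [D, F] weight produces B·F outputs, and each output is a length-D dot product costing D multiplies plus D adds (one multiply-accumulate per element pair), for 2BDF flops in total. Here's a minimal sketch of the counting; the shape values are made-up illustrative numbers, and the cost_analysis() cross-check assumes a recent JAX version (its exact return type varies across releases).

```python
import jax
import jax.numpy as jnp

def matmul_flops(B, D, F):
    # A [B, D] @ [D, F] matmul produces B*F outputs; each is a
    # length-D dot product costing D multiplies + D adds, i.e. one
    # multiply-accumulate (2 flops) per element pair -> 2*B*D*F total.
    return 2 * B * D * F

B, D, F = 1024, 4096, 16384  # illustrative shapes, not from the book
print(matmul_flops(B, D, F))  # 137438953472 ≈ 1.37e11 flops

# Optional cross-check against XLA's own cost model; the return type of
# cost_analysis() varies across JAX versions, so just inspect the output.
x = jnp.zeros((B, D), jnp.bfloat16)
w = jnp.zeros((D, F), jnp.bfloat16)
cost = jax.jit(jnp.matmul).lower(x, w).compile().cost_analysis()
print(cost)  # should report a flops count close to 2*B*D*F
```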
-
Thank you to the authors for this wonderful book! This material is so critical for modern-day ML but not gathered up into one place... almost like you AllGathered the sharded knowledge across people and places into this one lovely book: thank you! I look forward to diving in!
-
Nice narration, Gana, on how to scale your model.
-
This is great material; I enjoyed it a lot! Thanks for sharing.
-
Then, do ML compilers still play an important role in optimizing performance? It seems that you have studied model training and inference at a very detailed level; I'm wondering if an ML compiler, as a general tool, still applies. Please correct any wrong assumptions of mine. Thanks!
-
How long, on average, will it take to go over the material for someone with a basic understanding of ML?
-
Long. Very long.
-
Can't thank you enough for this informative paper.
-
Amazing write-up, looking forward to the book.
-
This is an incredibly well-written and useful book and learning resource; thank you, Google and all the authors, for publishing it. One small suggestion: the book, its chapters, and sometimes a given "page" use different TPU versions (v4/v5e/v5p/v6e, etc.) as examples. It would make reading simpler if you consistently used one, since that lets the reader build some memory for those numbers from paragraph to paragraph. Having a table that shows various TPUs and GPUs is still handy, but for examples that illustrate a point, maybe pick one? Similarly, in the inference section, you mix Llama 70B and 13B in examples; it may be equally effective to just use one. And maybe add a table of hyperparameters for common/popular models? Regardless, this is a very good book. Thank you!
-
The section on rooflines starts with v6e for bandwidth, then v5e for flops.
The section on TPUs starts with v5e for some examples and explains networking differences across different TPUs (which makes sense and is useful), but then uses v5p for the ICI/HBM bandwidth ratio example.
"How to parallelize a transformer for training" uses v5p for the number of TPUs in the Adam data-parallelism example.
Etc.
…On Thursday, February 20, 2025, Jacob Austin wrote:
Hi! I definitely get this. Is there a section that's particularly bad re: different TPU versions? My goal was mainly to use 4p and 5e (for training and inference, respectively), but there must be some places where I failed at that.
Re: the different Llamas, I'll look at this. Standardizing on 70B probably isn't a bad idea.
-
Discussion for the introduction of "How To Scale Your Model"