How To Scale Your Model #2
13 comments · 8 replies
-
There's a comment box at the bottom of each section you can use to ask questions. We can't reply to every comment, but we'll try to address as many as we can. We'd like to keep improving this book, so please ask questions or point out any mistakes we've made. Enjoy!
-
Just wanted to thank you and the team for writing this! I'm trying to self-study this material and also write a blog on ML acceleration from first principles, and this has been extremely helpful to go through. It's very rare to find resources like this (especially on JAX/TPUs)! 😃
-
In the Matrix multiplication paragraph of the Rooflines section, it is not explicitly mentioned why the matmul takes 2BDF flops. It's because of multiply-accumulate, right?
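Yes, that's the reason: a [B, D] activation times a [D, F] weight produces B·F outputs, and each output is a length-D dot product costing D multiplies plus D adds (one multiply-accumulate per element pair), for 2BDF flops in total. Here's a minimal sketch of the counting; the shape values are made-up illustrative numbers, and the cost_analysis() cross-check assumes a recent JAX version (its exact return type varies across releases).

```python
import jax
import jax.numpy as jnp

def matmul_flops(B, D, F):
    # A [B, D] @ [D, F] matmul produces B*F outputs; each is a
    # length-D dot product costing D multiplies + D adds, i.e. one
    # multiply-accumulate (2 flops) per element pair -> 2*B*D*F total.
    return 2 * B * D * F

B, D, F = 1024, 4096, 16384  # illustrative shapes, not from the book
print(matmul_flops(B, D, F))  # 137438953472 ≈ 1.37e11 flops

# Optional cross-check against XLA's own cost model; the return type of
# cost_analysis() varies across JAX versions, so just inspect the output.
x = jnp.zeros((B, D), jnp.bfloat16)
w = jnp.zeros((D, F), jnp.bfloat16)
cost = jax.jit(jnp.matmul).lower(x, w).compile().cost_analysis()
print(cost)  # should report a flops count close to 2*B*D*F
```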
-
Thank you to the authors for this wonderful book! This material is so critical for modern-day ML but not gathered up into one place... almost like you AllGathered the sharded knowledge across people and places into this one lovely book: thank you! I look forward to diving in!
-
Nice narration, Gana, on how to scale your model.
-
This is great material; I enjoyed it a lot! Thanks for sharing.
-
Then, do ML compilers still play an important role in optimizing performance? It seems that you have studied model training and inference at a very detailed level; I'm wondering if an ML compiler, as a general tool, still applies. Please correct any wrong assumptions of mine. Thanks!
-
How long, on average, will it take to go over the material for someone with a basic understanding of ML?
-
Long. Very long.
-
Can't thank you enough for this informative paper.
-
Amazing write-up, looking forward to the book.
-
This is an incredibly well-written and useful book and learning resource; thank you, Google and all the authors, for publishing it. One small suggestion: the book, its chapters, and sometimes a given "page" use different TPU versions (v4/v5e/v5p/v6e, etc.) as examples. It would make reading simpler if you consistently used one, since that lets the reader build some memory for those numbers from paragraph to paragraph. Having a table that shows various TPUs and GPUs is still handy, but for examples that illustrate a point, maybe pick one? Similarly, in the inference section, you mix Llama 70B and 13B in examples; it may be equally effective to just use one. And maybe add a table of hyperparameters for common/popular models? Regardless, this is a very good book. Thank you!
-
The section on rooflines starts with v6e for bandwidth, then v5e for flops.
The section on TPUs starts with v5e for some examples and explains networking differences across different TPUs (which makes sense and is useful), but then uses v5p for the ICI/HBM bandwidth ratio example.
"How to parallelize a transformer for training" uses v5p for the number of TPUs in the Adam data-parallelism example.
Etc.
…On Thursday, February 20, 2025, Jacob Austin wrote:
Hi! I definitely get this. Is there a section that's particularly bad re: different TPU versions? My goal was mainly to use 4p and 5e (for training and inference, respectively), but there must be some places where I failed at that.
Re: the different Llamas, I'll look at this. Standardizing on 70B probably isn't a bad idea.
-
Discussion for the introduction of "How To Scale Your Model"