First pass at LLaDA-8b implementation #19
Conversation
Thanks Marlin, let's wait for the …
Thanks for the contribution! Added some comments.
return logits, state

out_tokens, state_final = modeling.generate(
Why are there 3 different modeling.generate runs with different gen/block lengths?
Also, is this a warmup run? If so, one step would suffice.
Having step in an explicit for loop like Qwen3 would make this much simpler.
RE: Also, is this a warmup run? If so, one step would suffice.
To be more specific, a jit-compiled function in which steps is not a static argnum should suffice with a warmup run of steps=1, as long as the function is not recompiled afterwards. This is another reason it would be better to move steps outside of this generate function into an explicit for loop.
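For illustration, a rough sketch of that structure (a toy nnx module stands in for the actual LLaDA model, and the per-step masking/sampling logic is omitted): jit only the per-step function, warm it up with a single call, and drive steps from plain Python.

import jax
import jax.numpy as jnp
from flax import nnx

class ToyModel(nnx.Module):
    def __init__(self, vocab: int, dim: int, rngs: nnx.Rngs):
        self.embed = nnx.Embed(vocab, dim, rngs=rngs)
        self.out = nnx.Linear(dim, vocab, rngs=rngs)

    def __call__(self, tokens):
        return self.out(self.embed(tokens))

model = ToyModel(vocab=32, dim=8, rngs=nnx.Rngs(0))
graphdef, state = nnx.split(model)

@jax.jit
def model_step(tokens, state):
    mdl = nnx.merge(graphdef, state)
    logits = mdl(tokens)
    return logits, nnx.state(mdl)

tokens = jnp.zeros((1, 16), dtype=jnp.int32)
_ = model_step(tokens, state)  # warmup: compiles model_step exactly once

steps = 4  # plain Python int; it never becomes a jit argument, so changing it never recompiles
for _ in range(steps):
    logits, state = model_step(tokens, state)  # reuses the compiled executable every iteration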
class BlockType(str, Enum):
    """
Please remove unnecessary whitespace. Also let's stick with either enum.auto() or str to keep consistency throughout the entire file.
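For reference, the two styles side by side (the member names and values here are illustrative; the ask is to pick one style and use it for every enum in the file):

import enum
from enum import Enum

# Style A: string-valued members
class BlockType(str, Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"

# Style B: enum.auto() members
class LayerNormType(Enum):
    DEFAULT = enum.auto()
    GEMMA_RMS = enum.auto()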
bonsai/models/llada_8b/modeling.py (outdated)
if ln_type == LayerNormType.GEMMA_RMS:
    return GemmaRMSNorm(
        dim,
This can fit on one line within the 120-character limit; please make full use of the 120-character limit. Ditto everywhere else applicable, including before the ValueError:
return GemmaRMSNorm(dim, epsilon=eps, use_bias=use_bias, rngs=rngs)
Please also remove the following unnecessary whitespace wherever applicable.
# Tokenization helper
def tokenize(tokenizer, inputs: list[str]):
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
how about pad_id = pad_id or tokenizer.eos_token_id # fall back to eos
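A minimal sketch of how that fallback could sit inside the helper (the padding call and return value are assumptions; only the eos fallback line comes from this suggestion):

def tokenize(tokenizer, inputs: list[str]):
    pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id  # fall back to eos
    batch = tokenizer(inputs, padding=True, return_tensors="np")
    return batch["input_ids"], pad_id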
# Model + params
cfg = modeling.ModelConfig.llada_8b_instruct()

# Load weights into a constructed model
Remove self-explanatory comments; ditto everywhere else applicable.
graphdef, state = nnx.split(model)


def model_step(tokens, state):
Can we just have a single forward which is jitted, and keep all the key logic in generate? Similar to what we had in Qwen3.
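Roughly the shape being suggested (the names and details here are assumptions, not the PR's code):

import jax
from flax import nnx

# nnx.jit wraps a single forward pass over the module; generate() stays plain Python around it.
@nnx.jit
def forward(model: nnx.Module, tokens: jax.Array) -> jax.Array:
    return model(tokens).logits  # the model returns an LLaDAOutput per the diff

def generate(model: nnx.Module, tokens: jax.Array, steps: int) -> jax.Array:
    for _ in range(steps):
        logits = forward(model, tokens)
        # ... unmasking / sampling to update `tokens` would go here ...
    return tokens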
bonsai/models/llada_8b/modeling.py (outdated)
    ],
)
def generate(
    model_step,
Please add type hints everywhere applicable. However, we want to factor the step out here, as noted in the other comments.
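A hypothetical illustration of the kind of type hints being asked for (the parameter list is a placeholder; as noted above, model_step should eventually be factored out rather than passed in):

from typing import Callable
import jax
from flax import nnx

StepFn = Callable[[jax.Array, nnx.State], tuple[jax.Array, nnx.State]]

def generate(model_step: StepFn, tokens: jax.Array, steps: int, gen_length: int, block_length: int) -> tuple[jax.Array, nnx.State]:
    """Signature-only sketch; body omitted."""
    raise NotImplementedError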
bonsai/models/llada_8b/modeling.py (outdated)
        return (x, state, rng), None

    (x, final_state, _), _ = lax.scan(do_block, (x, init_state, rng), xs=jnp.arange(num_blocks, dtype=jnp.int32))
    return x, final_state
Can we refactor the generate steps out into an explicit for loop? Something like what we have in Qwen3.
The current implementation makes it difficult to benchmark performance per step and to separate static argnums from jax.
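As a sketch of that refactor (block_step is hypothetical here: a jitted function applying one block's worth of diffusion steps), a plain Python loop makes per-block timing trivial and keeps num_blocks out of the traced computation entirely:

import time

def generate_blocks(block_step, x, state, num_blocks: int):
    for block_idx in range(num_blocks):
        t0 = time.perf_counter()
        x, state = block_step(x, state)
        x.block_until_ready()  # wait for the device so the timing is meaningful
        print(f"block {block_idx}: {time.perf_counter() - t0:.3f}s")
    return x, state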
jax.profiler.stop_trace()

# Profile a single run, print FLOPs, memory
jax.profiler.profile(
Is this a valid method? I don't think jax.profiler.profile exists here:
https://github.com/jax-ml/jax/blob/52383198d74ffdc3b5c3bef86cfd64f02ab9dbb5/jax/profiler.py#L17
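For reference, the entry points that do exist are start_trace/stop_trace and the trace() context manager; a minimal sketch with an illustrative log directory and a stand-in computation in place of the generate call:

import jax
import jax.numpy as jnp

with jax.profiler.trace("/tmp/llada-profile"):
    y = jnp.dot(jnp.ones((512, 512)), jnp.ones((512, 512)))  # stand-in for the profiled run
    y.block_until_ready()  # make sure the device work finishes inside the trace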
# Demo queries
queries = [
Could we test with multiple queries of varying sizes to ensure batching is correct?
For example:
queries = ["Why is the sky blue instead of any other color like purple?", "Who am I?"]
    num_traced_runs=1,
    num_profiled_runs=1,
)
print("Generated token IDs:", out_tokens)
Could we decode the output, since this is an example model run, to test whether the output makes sense?
Example:
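(The original example snippet isn't preserved in this thread; a minimal sketch using the HF tokenizer API, with tokenizer, queries and out_tokens as used in run_model.py, might look like this.)

def print_decoded(tokenizer, queries, out_tokens) -> None:
    # Decode generated ids back to text so the example run can be sanity-checked by eye.
    texts = tokenizer.batch_decode(out_tokens, skip_special_tokens=True)
    for query, text in zip(queries, texts):
        print(f"Q: {query}\nA: {text}\n")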
bonsai/models/llada_8b/modeling.py (outdated)
class LLaDAOutput(NamedTuple):
    logits: jax.Array
    """
Can simplify comments in this class.
Renaming
Can you also add a quality logits checker (e.g. test_outputs.py) to verify that the resulting logits are correct?
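A hedged sketch of what such a check could look like (the reference source, e.g. the HF PyTorch LLaDA implementation run on the same prompt, and the tolerances are assumptions, not something this PR specifies):

import numpy as np

def check_logits_match_reference(bonsai_logits: np.ndarray, reference_logits: np.ndarray) -> None:
    # Compare the JAX port's logits against a trusted reference on the same inputs.
    assert bonsai_logits.shape == reference_logits.shape
    np.testing.assert_allclose(bonsai_logits, reference_logits, rtol=1e-3, atol=1e-2)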
Hello, please make sure to have a Ref:
d6abf89 to 175959a (Compare)
HF_REPO = "GSAI-ML/LLaDA-8B-Instruct"


def _tokenize_chat_batch(tokenizer, prompts):
Please keep consistency with tokenize in run_model.py, as discussed offline.
Otherwise this PR looks ready.
Thanks for this implementation!
Looks mostly good; minor stylistic changes can be addressed in follow-up CLs.
Currently working on the run_model.py