@mydatascience mydatascience commented Oct 14, 2025

Description

Add tests for e2e grpo & sft

Tests

Added a separate TPU test for grpo

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@mydatascience mydatascience changed the title GRPO E2E test GRPO & SFT E2E tests Oct 21, 2025
@mydatascience mydatascience force-pushed the grpo_sft_tests branch 2 times, most recently from 8a115f5 to c9f10d2 Compare October 24, 2025 15:25
Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Need scan layers

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix

Signed-off-by: Vladimir Suvorov <[email protected]>

Add sft test as well

Signed-off-by: Vladimir Suvorov <[email protected]>

Add sft test as well

Signed-off-by: Vladimir Suvorov <[email protected]>

Add sft test as well

Signed-off-by: Vladimir Suvorov <[email protected]>

Fix vllm

Signed-off-by: Vladimir Suvorov <[email protected]>

fix

Signed-off-by: Vladimir Suvorov <[email protected]>
Collaborator

Why do you need a separate file for E2E tests?
You can also run the SFT trainer directly:

python3 -m MaxText.sft.sft_trainer MaxText/configs/sft.yml \
  run_name=$RUN_NAME base_output_directory=$OUTPUT_PATH \
  model_name=$MODEL_NAME load_parameters_path=$MODEL_CHECKPOINT_PATH \
  hf_access_token=$HF_ACCESS_TOKEN tokenizer_path=$TOKENIZER_PATH
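
The trainer command in the comment above depends on six shell variables, and an unset one would only surface once the trainer starts. A minimal sketch of a guard that could sit in front of it is shown here. The function names `require_vars` and `run_sft` are hypothetical and not part of MaxText; the command itself is copied from the comment unchanged.

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
  for _name in "$@"; do
    # Look up the variable whose name is stored in $_name (POSIX-compatible).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing required variable: $_name" >&2
      return 1
    fi
  done
}

# Hypothetical wrapper around the command quoted in the review comment.
run_sft() {
  require_vars RUN_NAME OUTPUT_PATH MODEL_NAME MODEL_CHECKPOINT_PATH \
    HF_ACCESS_TOKEN TOKENIZER_PATH || return 1
  python3 -m MaxText.sft.sft_trainer MaxText/configs/sft.yml \
    "run_name=$RUN_NAME" "base_output_directory=$OUTPUT_PATH" \
    "model_name=$MODEL_NAME" "load_parameters_path=$MODEL_CHECKPOINT_PATH" \
    "hf_access_token=$HF_ACCESS_TOKEN" "tokenizer_path=$TOKENIZER_PATH"
}
```

A workflow step could then call `run_sft` after exporting the variables. Quoting each `key=value` argument keeps paths with spaces intact.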

XLA_PYTHON_CLIENT_MEM_FRACTION: 0.75
TF_FORCE_GPU_ALLOW_GROWTH: false
HF_TOKEN: ${{ secrets.HF_TOKEN }}
MAXTEXT_CHECKPOINT_PATH: gs://maxtext-model-checkpoints/llama3.1-8b/2025-01-23-19-04/scanned/0/items
Collaborator

Use the instruct checkpoint: gs://maxtext-model-checkpoints/llama3.1_8b_instruct/2025-10-16/scanned/0/items

XLA_PYTHON_CLIENT_MEM_FRACTION: 0.75
TF_FORCE_GPU_ALLOW_GROWTH: false
HF_TOKEN: ${{ secrets.HF_TOKEN }}
MODEL_CHECKPOINT_PATH: gs://maxtext-model-checkpoints/llama3.1-8b/2025-01-23-19-04/scanned/0/items
Collaborator

Instruct checkpoint: gs://maxtext-model-checkpoints/llama3.1_8b_instruct/2025-10-16/scanned/0/items

python3 -m pip install -e . --no-dependencies
- name: Install Tunix vLLM Requirements
  run: |
    bash src/MaxText/examples/install_tunix_vllm_requirement.sh
Collaborator

We don't need vllm or tpu-commons to run SFT.
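
One way to act on this comment is to gate the install step on the kind of test being run. The sketch below assumes that only the GRPO job needs the vLLM requirements, which the comment implies but does not state outright. The function names and the `grpo`/`sft` labels are hypothetical; the script path is the one from the workflow step under review.

```shell
# Hypothetical predicate: succeeds only for test types assumed to need vLLM.
needs_vllm() {
  case "$1" in
    grpo) return 0 ;;   # assumption: GRPO uses vLLM
    *)    return 1 ;;   # SFT and anything else skips the install
  esac
}

# Hypothetical install step: run the vLLM requirements script only when needed.
install_test_requirements() {
  if needs_vllm "$1"; then
    bash src/MaxText/examples/install_tunix_vllm_requirement.sh
  else
    echo "skipping vLLM requirements for test type: $1"
  fi
}
```

Calling `install_test_requirements sft` would then skip the script, while `install_test_requirements grpo` would run it.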

Labels

draft Draft PR