Commit d03c9cb (parent: 210e9d7)

    [*] pre-commit run --all-files

File tree: 6 files changed, +267 additions, −138 deletions

.pre-commit-config.yaml (1 addition, 1 deletion)

```diff
@@ -9,7 +9,7 @@ repos:
       args:
         - '-w'
         - '--skip="*.txt,pylintrc,.*,src/MaxText/assets/*"'
-        - '-L ND,nd,sems,TE,ROUGE,rouge,astroid'
+        - '-L ND,nd,sems,TE,ROUGE,rouge,astroid,ags,dout'
         - '.'
       additional_dependencies:
         - tomli
```

benchmarks/api_server/server_models.py (1 addition, 1 deletion)

```diff
@@ -69,7 +69,7 @@ class CompletionRequest(SamplingParams):
   logprobs: Optional[int] = None

   @field_validator("logprobs")
-  def validate_logprobs(cls, v):
+  def validate_logprobs(self, v):
     if v is not None and v < 0:
       raise ValueError("logprobs must be a non-negative integer if provided.")
     return v
```
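The validation rule the commit touches is independent of the pydantic plumbing around it. As a minimal standalone sketch (the free function here is hypothetical, not part of the repository):

```python
from typing import Optional


def validate_logprobs(v: Optional[int]) -> Optional[int]:
  """Reject negative values; mirrors the check in server_models.py."""
  if v is not None and v < 0:
    raise ValueError("logprobs must be a non-negative integer if provided.")
  return v
```

Note that pydantic's `@field_validator` normally expects a classmethod-style first argument, so renaming `cls` to `self` here is presumably a linter-driven change; the value `v` is what the check actually operates on either way.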

docs/tutorials/grpo_with_pathways.md (1 addition, 1 deletion)

```diff
@@ -66,5 +66,5 @@ The overview of the demo script ~/maxtext/src/MaxText/examples/grpo_llama3_1_70b

 1. We load a policy model and a reference model. Both are copies of `Llama3.1-70b-Instruct`.
 2. Evaluate the policy model's performance on GSM8K math reasoning benchmark.
-3. Train the policy model using GRPO with potentially different meshes for trainer and rollout dependending on the parameters `TRAINER_DEVICES_FRACTION` and `SAMPLER_DEVICES_FRACTION`. If we set both of these to `1.0`, the entire (same) mesh will be used for both trainer and rollout. If we set say `TRAINER_DEVICES_FRACTION=0.5` and `SAMPLER_DEVICES_FRACTION=0.5`, the first half of the devices will be used for trainer and the second half will be used for rollout
+3. Train the policy model using GRPO with potentially different meshes for trainer and rollout depending on the parameters `TRAINER_DEVICES_FRACTION` and `SAMPLER_DEVICES_FRACTION`. If we set both of these to `1.0`, the entire (same) mesh will be used for both trainer and rollout. If we set say `TRAINER_DEVICES_FRACTION=0.5` and `SAMPLER_DEVICES_FRACTION=0.5`, the first half of the devices will be used for trainer and the second half will be used for rollout
 4. Evaluate the policy model's performance on GSM8K math reasoning benchmark after the post-training with GRPO.
```
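The device-fraction scheme described in the corrected doc line can be sketched as simple slicing arithmetic. This is an illustrative helper only (the function name and slicing logic are assumptions, not MaxText's actual mesh-assignment code):

```python
def split_devices(devices, trainer_fraction, sampler_fraction):
  """Illustrative split: trainer takes the leading fraction of devices,
  the rollout sampler takes the trailing fraction.

  With both fractions at 1.0 the two slices are the same full mesh;
  with 0.5/0.5 they are disjoint halves, as the tutorial describes.
  """
  n = len(devices)
  trainer = devices[: int(n * trainer_fraction)]
  sampler = devices[n - int(n * sampler_fraction):]
  return trainer, sampler
```

For example, with 8 devices and `TRAINER_DEVICES_FRACTION=0.5`, `SAMPLER_DEVICES_FRACTION=0.5`, devices 0-3 go to the trainer and 4-7 to rollout.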

0 commit comments