pyconfig → pydantic #1836
Conversation
LGTM pending line lengths, which make it hard to read.
What do the file name letters stand for? types_j, types_g, etc.
@bvandermoon Oh, ignore that. Each is a different attempt (in lexicographical order). All will be removed in favour of a singular types.py.
lgtm
Generally LGTM; please also add the additional benchmark runner tests you mentioned offline in the PR description.
LGTM outside of this comment: https://github.com/AI-Hypercomputer/maxtext/pull/1836/files#r2470963833
…configuration files in MaxText
- [src/MaxText/pyconfig.py] New temporary wrapper to not break existing API (see the sketch below)
- [src/MaxText/pyconfig_og.py] Move original version here
- [src/MaxText/configs/__init__.py] Make this a module
- [tests/pyconfig_test.py] Import from og pyconfig
- [*requirements*.txt] Add pydantic requirement
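For context, a minimal sketch of what such a temporary wrapper could look like; this is illustrative only, not the actual src/MaxText/pyconfig.py shim from this PR:

```python
# Hypothetical sketch of a temporary compatibility shim; the real
# src/MaxText/pyconfig.py in this PR may differ.
import warnings

warnings.warn(
    "MaxText.pyconfig is a temporary wrapper around pyconfig_og and will "
    "be replaced by pydantic-based configs.",
    DeprecationWarning,
    stacklevel=2,
)

# Re-export the legacy API so existing `from MaxText import pyconfig`
# callers keep working during the migration.
from MaxText.pyconfig_og import *  # noqa: E402,F401,F403
```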
Thanks @SamuelMarks for the offline testing. As discussed offline, it looks like the base.yml configs are being picked up, but the model-specific configs are not. Temporarily removing the Pull Ready tag so we can sort this out.
Description
Background
MaxText is currently engineered around pyconfig (https://pypi.org/project/pyconfig/), which was last updated in 2017 and has about 20k monthly downloads.
Pydantic (https://pypi.org/project/pydantic/) is actively maintained and has hundreds of millions of monthly downloads (see https://docs.pydantic.dev).
Summary
The TL;DR version is:
- generate --help output and GUIs, and generate SQL models (SQLAlchemy) as desired (see the sketch below)
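As a rough, hypothetical illustration of what pydantic buys here (assuming pydantic v2; the field names are invented for the example and are not MaxText's real schema):

```python
# Hypothetical example of pydantic-style config validation; field names
# are invented for illustration, not MaxText's actual schema.
from pydantic import BaseModel, Field, ValidationError


class RunConfig(BaseModel):
    model_name: str = "default"
    per_device_batch_size: int = Field(default=1, ge=1)
    max_prefill_predict_length: int = Field(default=128, ge=1)
    max_target_length: int = Field(default=256, ge=1)


# Bad values fail loudly at load time instead of deep inside a training run.
try:
    RunConfig(per_device_batch_size=0)
except ValidationError as err:
    print(err)

# A --help-style listing can be derived from the schema itself.
for name, field in RunConfig.model_fields.items():
    print(f"--{name} (default: {field.default})")
```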
Migration
Proposed changes to the MaxText codebase:
- from_pyconfig_to_pydantic
- types.py, or a types.py per config occurrence (e.g., one per module if each module has a different config); a loader sketch follows below
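To make the proposal concrete, here is a hypothetical sketch of loading a base.yml-style file into such a types.py model with MaxText-style key=value overrides; the loader name and fields are invented, not this PR's actual API:

```python
# Hypothetical loader sketch: YAML file -> pydantic model, with
# MaxText-style key=value overrides. Names are illustrative only.
import yaml  # PyYAML
from pydantic import BaseModel


class BaseConfig(BaseModel):
    model_name: str = "default"
    scan_layers: bool = True
    per_device_batch_size: float = 1.0


def load_config(path: str, *overrides: str) -> BaseConfig:
    with open(path) as fh:
        raw = yaml.safe_load(fh) or {}
    for item in overrides:  # e.g. "model_name=llama2-7b"
        key, _, value = item.partition("=")
        raw[key] = value  # pydantic coerces "1" -> 1.0, "false" -> False, etc.
    return BaseConfig(**raw)  # validation and coercion happen here


# Example usage:
# cfg = load_config("MaxText/configs/base.yml", "model_name=llama2-7b")
```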
Tests
CI and manual:
TL;DR version: these worked:
default, llama2-7b, mistral-7b, deepseek3-tiny, gemma-2b, gemma2-2b, qwen3-0.6b, qwen3-4b, qwen3-4b-thinking-2507, gpt3-6b, gpt3-52k, default_basic_1, default_32, default_64, default_128, default_256, default_512, gpt_3_175b, gpt_3_175b_bf16, llama2_7b_4096, llama2_70b_4096, llama2_70b_4096_synthetic, llama2_70b_4096_sc, llama2_70b_4096_sc_real_data_tfds, llama2_70b_4096_sc_real_data_grain, llama2_70b_4096_sc_real_data_grain_checkpoint, llama2_70b_4096_rd_lr, llama3_8b_8192, llama3_70b_8192, llama3_1_405b_8192_fsdp_dcn, llama3_1_405b_8192_pure_fsdp_ici, llama3_1_8b_8192, llama3_1_8b_8192_bs5, llama3_1_8b_8192_no_collective_matmul, llama3_1_70b_8192, llama3_1_70b_8192_bs2, llama3_1_70b_8192_bs2_bfloat16_no_collective_matmul, llama3_1_70b_8192_bs4, llama3_1_70b_8192_synthetic, llama3_1_70b_8192_rd_grain, llama3_1_70b_8192_synthetic_ckpt, llama3_1_70b_8192_rd_ckpt_grain, llama3_1_70b_8192_pw_lr_rd, llama3_1_70b_8192_iter_real_data_and_checkpointing_tfds, llama3_1_70b_8192_synth, llama3_1_70b_129024, mistral_7b, mixtral_8x7b_dropless, mixtral_8x7b_dropped, mixtral_8x7b_dropped_int8, mixtral_8x22b_dropped, deepseek_v3_ep16, gemma2_9b_8192, gemma2_27b_8192, gemma3_12b_32768_v6e256, gemma3_12b_32768_2x_v6e256, gemma3_12b_32768_4x_v6e256, llama3_1_70b_131072, custom_moe_700b, llama3_1_405b_8192_v5p_1024, deepseek_v3_ep_256_v5p_512, llama4_scout_dropless_v5p_256, llama4_maverick_dropless_v5p_256, llama2_70b_v5p_128, llama2_7b_v5p_128, gpt_3_175b_v5p_128, gpt_3_175b_v5p_128_sc, deepseek3_671b_v5p_1024, default_16b_v5e_256, default_32b_v5e_256, default_64b_v5e_256, default_128b_v5e_256, gpt_3_175b_v5e_256, llama2_7b_v5e_256, llama2_13b_v5e_256, llama2_70b_v5e_256, llama3_1_8b_8192_v5e_256, deepseek_v3_ep_256_v5p_512_c4mlperf

Manually ran this to test also:
```sh
python3 -m MaxText.decode MaxText/configs/base.yml \
  model_name=llama2-7b \
  tokenizer_path=assets/tokenizer_llama3.tiktoken \
  tokenizer_type=tiktoken \
  scan_layers=false \
  per_device_batch_size=1 \
  ici_fsdp_parallelism=1 \
  ici_autoregressive_parallelism=-1 \
  max_prefill_predict_length=128 \
  max_target_length=256 \
  prompt="I love to" \
  attention=dot_product
```

Checklist
Before submitting this PR, please make sure (put X in square brackets):