Issues: pjlab-sys4nlp/llama-moe
Can this model continue pre-training on a single A100? (#73, by He-JYang, closed Jan 25, 2025)
Can this be used as a means to speed up LLM inference? (#70, by bulaikexiansheng, closed Oct 24, 2024)
per_device_train_batch_size=1, but almost all of my GPU memory is still being used up? (#67, by rzr002, closed Mar 11, 2024)
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? (#64, by rzr002, closed Mar 11, 2024)
./scripts/expert_construction/split/run_split_random.sh: line 18: srun: command not found (#54, by 18600709862, closed Mar 11, 2024)
Performance comparison between LLaMA-MoE and the original dense model (#49, by DoubleVII, closed Dec 28, 2023)
Why write a new llama_lr_scheduling_trainer instead of using the original Trainer, and what is its purpose? (#47, by linyubupa, closed Jan 3, 2024)