Issues: pjlab-sys4nlp/llama-moe
Can this model continue pre-training on a single A100? (#73, by He-JYang, closed Jan 25, 2025)
Can this be used as a means to speed up LLM inference? (#70, by bulaikexiansheng, closed Oct 24, 2024)
per_device_train_batch_size=1, but almost all of my GPU memory is still being used up? (#67, by rzr002, closed Mar 11, 2024)
If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? (#64, by rzr002, closed Mar 11, 2024)
./scripts/expert_construction/split/run_split_random.sh: line 18: srun: command not found (#54, by 18600709862, closed Mar 11, 2024)
Performance comparison between LLaMA-MoE and the original dense model (#49, by DoubleVII, closed Dec 28, 2023)
Why write a new llama_lr_scheduling_trainer instead of using the original Trainer, and what is its purpose? (#47, by linyubupa, closed Jan 3, 2024)