
Congratulating DeepSeek-R1 and Inviting Review of Our Team’s Early Research last year on Similar Ideas #37

Open
czheng17 opened this issue Jan 21, 2025 · 2 comments

@czheng17

Greetings! We would like to extend our sincere gratitude for your enduring contributions to the open-sourcing of LLMs. Your dedication has allowed everyone to enjoy the benefits that LLMs bring in terms of personal improvement and efficiency. We were delighted to learn about your latest achievement, DeepSeek-R1-Zero: a model trained via reinforcement learning (RL) without the preliminary step of supervised fine-tuning (SFT) that has shown remarkable performance on reasoning tasks.

Interestingly, as early as March 2024, our team at ByteDance Seed noticed a similar phenomenon during our early research into RLHF with open-source models. Building on the open-source Mistral model, we developed Mistral-Plus, which verified our approach of applying RL directly to the base model while bypassing SFT entirely. This method not only preserves the base model's general capabilities but also significantly enhances its conversational abilities. Our paper, "Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF" (https://arxiv.org/abs/2403.02513), was published in March 2024, and the model was open-sourced on Hugging Face as Mistral-Plus-7B (https://huggingface.co/zhengchenphd/Mistral-Plus-7B), where it garnered attention upon release.

During our research last year, we also discovered further algorithmic optimizations and solutions for applying RL directly to the base model without relying on SFT. One such innovation is dynamically and adaptively extending the output length limit during the RL phase, enabling the generation of more detailed and analytical content. However, this introduced the issue of generating excessive redundant information, a challenge that aligns with your findings in the later stages of the DeepSeek-R1 project. To address it, we published another paper in June last year, "Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs" (https://arxiv.org/abs/2406.08657).
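
As a rough illustration (not the exact algorithm from our paper), a length-extension schedule of this kind could look like the following sketch; the `LengthSchedule` class, its thresholds, and the simulated statistics are purely illustrative assumptions:

```python
# Hypothetical sketch of an adaptive output-length cap for the RL phase.
# All names and thresholds here are illustrative, not the actual
# Mistral-C2F "Continuous Maximization" implementation.

from dataclasses import dataclass


@dataclass
class LengthSchedule:
    """Raises the generation cap when recent rollouts keep hitting it."""
    max_new_tokens: int = 512     # current cap on generated tokens
    ceiling: int = 4096           # hard upper bound on the cap
    growth_factor: float = 1.25   # multiplicative extension step
    trigger_ratio: float = 0.5    # extend if >50% of rollouts were truncated

    def update(self, truncated_fraction: float) -> int:
        # If many sampled responses are cut off at the cap, the policy likely
        # needs more room to reason, so loosen the limit for the next batch.
        if truncated_fraction > self.trigger_ratio:
            self.max_new_tokens = min(
                int(self.max_new_tokens * self.growth_factor), self.ceiling
            )
        return self.max_new_tokens


schedule = LengthSchedule()
# Simulated truncation statistics from successive RL batches.
for truncated_fraction in (0.7, 0.6, 0.3, 0.8):
    cap = schedule.update(truncated_fraction)
    print(f"truncated={truncated_fraction:.0%} -> max_new_tokens={cap}")
```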

Mistral-C2F takes a coarse-to-fine approach. It first introduces the "Coarse Actor," an analytical and reasoning LLM trained with a "Continuous Maximization" strategy that dynamically extends output length limits. Because the Coarse Actor can generate excessive redundant information without terminating properly, we add the "Fine Actor," a knowledge-refining LLM, as a second step: the Coarse Actor is merged with the existing instruction model through a strategy we call the "Knowledge Residue Merger," which integrates its detailed analytical reasoning into the existing SFT model. The findings were published and the model was open-sourced on Hugging Face in June 2024 (https://huggingface.co/zhengchenphd/Mistral-C2F-7B); the work received notable attention and was even featured as a Hugging Face daily paper by AK (akhaliq).
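
To give a concrete (if simplified) sense of what merging in parameter space looks like, here is a minimal sketch using plain weighted interpolation; the actual Knowledge Residue Merger formulation differs and is detailed in the paper, and the `alpha` value below is an illustrative assumption:

```python
# Hypothetical sketch of merging a Coarse Actor into an existing instruction
# model in parameter space. This plain weighted interpolation is only an
# illustration; the actual "Knowledge Residue Merger" is defined in the
# Mistral-C2F paper (https://arxiv.org/abs/2406.08657).

from typing import Dict

import torch


def merge_state_dicts(
    instruct_sd: Dict[str, torch.Tensor],
    coarse_actor_sd: Dict[str, torch.Tensor],
    alpha: float = 0.3,  # assumed weight on the Coarse Actor's "residue"
) -> Dict[str, torch.Tensor]:
    """Blend the Coarse Actor's weights into the instruction model's weights."""
    merged = {}
    for name, w_instruct in instruct_sd.items():
        w_coarse = coarse_actor_sd[name]
        # Add a scaled residue of the Coarse Actor relative to the instruct
        # model, keeping SFT behaviour while absorbing analytical depth.
        merged[name] = w_instruct + alpha * (w_coarse - w_instruct)
    return merged


# Toy usage with random tensors standing in for real model weights.
instruct = {"layer.weight": torch.randn(4, 4)}
coarse = {"layer.weight": torch.randn(4, 4)}
merged = merge_state_dicts(instruct, coarse, alpha=0.3)
print(merged["layer.weight"].shape)
```

In practice one would load the two models' state dicts and sweep `alpha` to trade off instruction-following behaviour against analytical depth.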

We are passionate about seeing similar innovations implemented across a variety of LLMs and application scenarios, and about contributing them to the open-source community. We earnestly hope for more exchanges between our teams to further advance the development of LLMs. Let's embrace the AI revolution together!

@rrha
rrha commented Jan 22, 2025

tryna get the credit or what?

@Tangylin

And then what? Are you trying to show that part of the contribution is yours?
