Commit 6d09a0f

Author: swyx
Committed: Feb 20, 2025
vault backup: 2025-02-20 - 1 files
Affected files: Monthly Notes/Feb 2025 notes.md
1 parent 2b973e4 commit 6d09a0f

1 file changed: +2 -2 lines changed

Monthly Notes/Feb 2025 notes.md (+2 -2)
@@ -21,8 +21,6 @@
 - Agentica: [Replicating Deepseek-R1 for $4500: RL Boosts 1.5B Model Beyond o1-preview](https://github.com/agentica-project/deepscaler)
 
 
-
-
 ## Demos
 
 - [gemini youtube agent ](https://x.com/DynamicWebPaige/status/1887897486770974770)
@@ -41,6 +39,8 @@
 - [vibe coding](https://news.ycombinator.com/item?id=42913909)
 - [codenames benchmark](https://x.com/IlyaAbyzov/status/1885784027275424227)
 - [roadmap prompt](https://x.com/kregenrek/status/1885979673059876883)
+- [hf ultrascale playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
+- [how to scale your model by deepmind](https://buttondown.com/ainews/archive/ainews-how-to-scale-your-model-by-deepmind/)
 - [RLHF book](https://news.ycombinator.com/item?id=42902936)
 - [PPO and GRPO](https://yugeten.github.io/posts/2025/01/ppogrpo/)
 - [GRPO](https://x.com/nrehiew_/status/1885079616248832090): "GRPO gets rid of the Value Model and NOT the Reward Model. This is the main insight since you save memory. The main change between PPO and GRPO is the way the advantage is calculated. PPO uses the Value Model to compute the advantage while GRPO computes the advantage by normalizing against the rollouts in each group."
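
The GRPO note quoted in the last context line says the advantage comes from normalizing each rollout's reward against the other rollouts in its group, with no value model involved. A minimal sketch of that idea follows. The function name, the example rewards, the `eps` guard, and the use of the population standard deviation are all assumptions made for illustration; real implementations may differ, for example by using the sample standard deviation.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Return one advantage per rollout: (reward - group mean) / (group std + eps).

    `rewards` holds the scalar rewards of all rollouts sampled for one prompt.
    `eps` is an assumed guard against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_normalized_advantages(rewards)
```

Rollouts that score above the group mean get a positive advantage and those below get a negative one. When every rollout in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.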
