Add REINFORCE policy gradient example for CartPole using Flax NNX #5036

sanepunk · 2025-10-19T08:34:21Z

Description

This PR adds a complete implementation of the REINFORCE policy gradient algorithm using Flax NNX for the CartPole-v1 environment.

Motivation

Demonstrates how to implement reinforcement learning algorithms with Flax NNX
Provides a canonical example of policy gradient methods
Serves as an entry point for users interested in RL with Flax
Complements existing supervised learning examples in the repository

Implementation Details

Algorithm: REINFORCE (Williams, 1992) - Monte Carlo policy gradient method

Architecture:

3-layer MLP policy network (4→128→128→2)
Leaky ReLU activations
Xavier normal weight initialization
Categorical action distribution

Training:

Adam optimizer with exponential learning rate decay
Gradient clipping (global norm = 1.0)
Return normalization for stable training
Discount factor γ = 0.99

Environment: CartPole-v1 via Gymnax

What's Included

examples/reinforce/simple_reinforce.ipynb - Complete Jupyter notebook with:
- Environment setup
- Policy network definition using NNX modules
- Training loop with progress tracking
- Visualization of training curves and agent behavior
examples/reinforce/README.md - Comprehensive documentation
examples/reinforce/requirements.txt - All dependencies
examples/reinforce/training_rewards.png - Training curve visualization
examples/reinforce/anim.gif - Trained agent animation

Performance

Training time: ~12 minutes on CPU
Episodes to solve: ~300-400 episodes
Final average reward: 490+ (solving threshold: 475)
Throughput: 1.38 episodes/second

…with complete training pipeline and visualization

review-notebook-app · 2025-10-19T08:34:27Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

vfdev-5 · 2025-10-20T14:35:08Z

@sanepunk thanks for the PR. I'll ask the team whether we would like to have a reinforcement learning example in flax repository (the maintenance of this code etc) and let you know here about the decision.

sanepunk · 2025-10-20T15:19:55Z

Thanks, @vfdev-5! I noticed there wasn’t an NNX-based RL example in the repo, so I thought it might be helpful to contribute one. Looking forward to hearing the team’s decision.

vfdev-5 · 2025-10-20T17:16:17Z

@sanepunk unfortunately, for now it was decided not to add new RL example to flax examples. This is mainly due to multiple reasons like:

we should refresh and update existing examples such they use the latest Flax API (e.g. >=0.12.0)
your code maintenance costs in future: keep versions up to date and fixing all BC-breaking changes. For example, previously, the change from gym to gymnasium and BC-breaking changes were a bit painful to dig into.
discussable dependency: gymnax which has its last commit 5 months ago and whether we can rely on it.

Again thanks a lot for this RL example with Flax! I understand that this decision can be disappointing as you already done a decent amount of work, but please consider a good practice is to open first an issue in the repository about what people would like to add as contribution and get a feedback from maintainers.

sanepunk · 2025-10-20T18:48:24Z

@vfdev-5 Thank you so much for taking the time to review and explain this! I really appreciate the feedback and will definitely open an issue first next time.

Would it be okay if I open an issue to discuss adding NNX implementations to the existing nn module examples, since it could help people understand and learn the NNX module better?

vfdev-5 · 2025-10-20T20:07:58Z

Would it be okay if I open an issue to discuss adding NNX implementations to the existing nn module examples, since it could help people understand and learn the NNX module better?

Please open first a new issue to discuss first what would you like to do. I would say the priority is to make refresh the documentation, guides etc.

sanepunk added 2 commits October 19, 2025 08:22

Add REINFORCE policy gradient example for CartPole-v1 using Flax NNX …

5d5e57f

…with complete training pipeline and visualization

Implement code changes to enhance functionality and improve performance

aa03a77

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add REINFORCE policy gradient example for CartPole using Flax NNX #5036

Add REINFORCE policy gradient example for CartPole using Flax NNX #5036

sanepunk commented Oct 19, 2025 •

edited

Loading

Uh oh!

review-notebook-app bot commented Oct 19, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

sanepunk commented Oct 20, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

sanepunk commented Oct 20, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add REINFORCE policy gradient example for CartPole using Flax NNX #5036

Are you sure you want to change the base?

Add REINFORCE policy gradient example for CartPole using Flax NNX #5036

Conversation

sanepunk commented Oct 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation

Implementation Details

What's Included

Performance

Uh oh!

review-notebook-app bot commented Oct 19, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

sanepunk commented Oct 20, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

sanepunk commented Oct 20, 2025

Uh oh!

vfdev-5 commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sanepunk commented Oct 19, 2025 •

edited

Loading