Describe the Issue
This is a comprehensive Meta-Issue tracking a global hardening audit of the Atropos RL framework. The audit focused on the Single Copy (Shared Memory) Mode and Teacher Distillation pipelines, which were found to have critical architectural and numerical gaps.
A total of 9 critical findings were addressed across 8 targeted fixes, ensuring the framework is stable for production-scale training on modern transformer architectures (Llama 3, Qwen, etc.).
Key Areas Addressed:
- Numerical Integrity: Fixed silent bit-corruption in shared memory and advantage normalization explosions.
- Model Compatibility: Resolved RoPE theta desync and meta-tensor initialization crashes in `ModuleList`s.
- Feature Completeness: Restored the teacher distillation feedback loop (previously a no-op).
- Operational Safety: Implemented backpressure to prevent OOMs and hardened process termination logic.
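As an illustration of the "advantage normalization explosions" item, here is a minimal sketch of the failure mode and one common guard. It is not the code from the tracked fix: the function name `normalize_advantages` and the `eps` threshold are assumptions made for this example. When every reward in a group is identical, the standard deviation is zero and naive `(r - mean) / std` scaling divides by (nearly) zero.

```python
from statistics import fmean, pstdev


def normalize_advantages(rewards, eps=1e-6):
    """Center and scale a group of rewards, guarding against ~zero variance.

    Hypothetical helper for illustration; `eps` is an assumed threshold.
    """
    mean = fmean(rewards)
    std = pstdev(rewards, mu=mean)
    if std < eps:
        # All rewards are (nearly) identical, so the group carries no
        # learning signal. Return zeros instead of dividing by ~0.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


mixed = normalize_advantages([0.0, 1.0])
degenerate = normalize_advantages([1.0, 1.0, 1.0])
```

Returning zeros for a degenerate group is one design choice; clamping the denominator to `eps` is another, but it can still amplify floating-point noise into large advantages.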
Environment/API Details
- Environment Class/Name: Core Infrastructure (`example_trainer`, `atroposlib.api.server`)
- Environment Configuration: All environments using `TeacherDistillationEnv` or `--openai.server_type vllm`.
- API Endpoint/Method Involved: `example_trainer/model.py`, `example_trainer/training.py`, `atroposlib/api/server.py`.
Steps to Reproduce
These issues manifest during high-throughput RL training, specifically when using:
- vLLM shared memory attachment (`Single Copy Mode`).
- Teacher-guided distillation on reasoning tasks (GSM8K).
- High-context models requiring specific RoPE theta configurations.
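To make the shared-memory case concrete, the sketch below shows how a dtype mismatch between the process that writes a buffer and the process that attaches to it can corrupt values without raising any error, and how an explicit check turns that into a loud failure. It uses only the standard library and is an assumption about the general mechanism, not a reproduction of the framework's attachment code; `validate_shared_buffer` and `ITEMSIZE` are hypothetical names.

```python
import struct

# Bytes per element for a few dtypes (illustrative subset).
ITEMSIZE = {"float16": 2, "bfloat16": 2, "float32": 4}

# A hypothetical producer writes two float32 values into a shared buffer.
payload = struct.pack("<2f", 1.0, -2.5)

# A consumer that wrongly assumes float16 reads four values from the same
# 8 bytes. No exception is raised; the numbers are simply wrong.
wrong = struct.unpack("<4e", payload)
right = struct.unpack("<2f", payload)


def validate_shared_buffer(expected_dtype, producer_dtype, numel, nbytes):
    """Fail loudly if the consumer's view does not match the producer's layout."""
    if expected_dtype != producer_dtype:
        raise TypeError(
            f"dtype mismatch: expected {expected_dtype}, producer wrote {producer_dtype}"
        )
    if numel * ITEMSIZE[expected_dtype] != nbytes:
        raise ValueError(
            f"size mismatch: {numel} x {expected_dtype} does not fill {nbytes} bytes"
        )


# Passes: two float32 elements occupy exactly 8 bytes.
validate_shared_buffer("float32", "float32", numel=2, nbytes=len(payload))
```

Checking both the dtype name and the byte count matters because two dtypes of the same width (for example `float16` and `bfloat16`) would pass a size-only check.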
Interaction Details (Individual Issue Tracking)
The audit results are documented across the following specific Issue/PR pairs:
| Area | Tracking Issue | Implementation PR |
| --- | --- | --- |
| Dtype Validation | #454 | #462 |
| RoPE Theta & Meta Traversal | #455 | #463 |
| Teacher Distillation Pipeline | #456 | #464 |
| Advantage Normalization | #457 | #465 |
| CUDA IPC Handle Cleanup | #458 | #466 |
| Rollout Queue Backpressure | #459 | #467 |
| Process Termination Safety | #460 | #468 |
| Tokenizer Config Portability | #461 | #469 |
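For readers unfamiliar with the Rollout Queue Backpressure item (#459 / #467), the general idea can be sketched with a bounded queue: once the queue is full, producers are told to wait or retry instead of letting memory grow without limit. This is a generic standard-library illustration, not the server's actual mechanism; `try_submit` and the `maxsize=2` cap are assumptions for the example.

```python
import queue

# Hypothetical cap on in-flight rollouts; a real value depends on memory budget.
rollouts = queue.Queue(maxsize=2)


def try_submit(q, item, timeout=0.01):
    """Return True if the item was queued, False if the queue stayed full.

    A False result is the backpressure signal: the caller should retry
    later or slow down rather than buffer more data.
    """
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


# The third submission is rejected because the queue holds at most two items.
accepted = [try_submit(rollouts, i) for i in range(3)]

# Once the consumer drains one item, capacity is available again.
first = rollouts.get()
accepted_after_drain = try_submit(rollouts, 99)
```

In an HTTP-facing server the same signal is often surfaced to clients as a "retry later" response; whether the referenced PR does so is not stated here.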
Setup Details
- OS: Linux
- Python Version: 3.10+
- Atropos Version: Latest / Audit Commit `c20c852`
- Relevant Libraries/Versions: `torch>=2.1.0`, `vllm>=0.3.0`, `transformers>=4.38.0`
Additional Context & Logs
The full audit report and verification walkthrough can be found in the attached PRs. Each PR contains isolated unit tests demonstrating that its fix is correct.
cc @dmahan93