improve docs

Quentin-Anthony · Quentin-Anthony · commit f794edfff7dd · 2024-09-29T01:02:48.000-07:00
diff --git a/post-training/online_training.md b/post-training/online_training.md
@@ -1,12 +1,12 @@
 # Online Training
 
 ## Prerequisites
-Want to use REINFORCE to train your model? First you'll need to build a custom vllm package.
+Want to use [REINFORCE](https://arxiv.org/abs/2402.14740) to train your model? First you'll need to build a custom vllm package.
 
-[synth-vllm](https://github.com/SynthLabsAI/synth-vllm) is a fork of [vllm](https://github.com/vllm-project/vllm)
-that has been modified to support using the weights in NeoX by sharing the GPU memory location of the model weights.
+[synth-vllm](https://github.com/SynthLabsAI/synth-vllm) is a fork of [vllm](https://github.com/vllm-project/vllm) maintained by [SynthLabs](https://www.synthlabs.ai/)
+that has been modified to support using the weights in GPT-NeoX by sharing the GPU memory location of the model weights.
 
-It currently supports llama models and pythia models.
+It currently supports Llama and Pythia models.
 
 ### Building the package
 
@@ -34,15 +34,15 @@ pip install -e .
 
 ## Training
 
-If you haven't already, run this command to generate the weights:
+If you haven't already, run this command to generate a copy of the Llama-3 weights in GPT-NeoX format:
 ```bash
 python tools/ckpts/convert_hf_llama_to_neox.py --tp 4 --model meta-llama/Meta-Llama-3-8B-Instruct --model_path checkpoints/neox_converted/llama3-8b-instruct
 ```
 
 [online_example.sh](online_example.sh) and [online_data_example_llama3.py](online_data_example_llama3.py) together form an example of
 how to train a model using the synth-vllm package on a single node.
 
-This assumes you are using a conda environment with NeoX installed under the name `neox`.
+This assumes you are using a conda environment with GPT-NeoX installed under the name `neox`.
 
 To run the example, execute the following commands: