From 3b0de85fa88a1782d53ca27e471d47fc68b2d6a8 Mon Sep 17 00:00:00 2001
From: Andie Jones <andrew.joe.jones@gmail.com>
Date: Tue, 2 Sep 2025 17:06:47 -0700
Subject: [PATCH 1/2] Update training details

---
 docs/tutorials/open-deep-research.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index 94acfde1..cdfd53be 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -83,7 +83,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Download the model checkpoint.**
   - Usually takes a few minutes depending on the model size.
 - **Train the model for a specified number of steps.**
-  - Each RL step involves running the research agent on benchmark questions, evaluating the results, and updating the model based on the rewards. Training time depends on the number of steps and the complexity of each research task.
+  - Each RL step involves running the research agent on a subset of benchmark questions, and updating the model based on the rewards. We hold out another subset of test set questions to evalutate model progress every 10 steps that we do not train on. Training time depends on the number of steps and the complexity of each research task.
 - **Upload the final model checkpoint.**
   - This usually takes a few minutes.
 

From c11955ce793d8ff8b79e9922858f4c7f988d6172 Mon Sep 17 00:00:00 2001
From: Andie Jones <andrew.joe.jones@gmail.com>
Date: Tue, 2 Sep 2025 17:16:49 -0700
Subject: [PATCH 2/2] Further clarification

---
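
Reviewer note: the hold-out scheme this patch documents can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual training code. The names `split_benchmark`, `run_training`, `train_step`, `evaluate`, and `batch_size` are hypothetical, and evaluating after steps 10, 20, 30, ... (rather than also at step 0) is an assumption.

```python
import random


def split_benchmark(questions, holdout_fraction=0.10, seed=0):
    """Randomly split questions into (train, holdout) with no overlap."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def run_training(train_set, holdout_set, num_steps, train_step, evaluate,
                 batch_size=8, eval_every=10, seed=0):
    """Run num_steps RL steps, evaluating on the held-out set periodically.

    train_step(batch) and evaluate(questions) are caller-supplied
    (hypothetical) callables; batch_size is an assumed parameter.
    """
    rng = random.Random(seed)
    history = []
    for step in range(1, num_steps + 1):
        # Each step trains on a subset of the training questions only.
        batch = rng.sample(train_set, k=min(batch_size, len(train_set)))
        train_step(batch)
        # Assumption: evaluate after every eval_every-th step.
        if step % eval_every == 0:
            history.append((step, evaluate(holdout_set)))
    return history


# A 100-question benchmark yields 90 training and 10 held-out questions.
questions = [f"q{i}" for i in range(100)]
train_set, holdout_set = split_benchmark(questions)
```

Because the split is made once before training and batches are sampled only from `train_set`, the held-out questions never reach `train_step`, so their scores track progress on questions the model has not been trained on.
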
 docs/tutorials/open-deep-research.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/open-deep-research.mdx b/docs/tutorials/open-deep-research.mdx
index cdfd53be..39e221d0 100644
--- a/docs/tutorials/open-deep-research.mdx
+++ b/docs/tutorials/open-deep-research.mdx
@@ -83,7 +83,7 @@ The following steps execute when a training run on a new cluster begins:
 - **Download the model checkpoint.**
   - Usually takes a few minutes depending on the model size.
 - **Train the model for a specified number of steps.**
-  - Each RL step involves running the research agent on a subset of benchmark questions, and updating the model based on the rewards. We hold out another subset of test set questions to evalutate model progress every 10 steps that we do not train on. Training time depends on the number of steps and the complexity of each research task.
+  - Each RL step involves running the research agent on a subset of benchmark questions and updating the model based on the rewards. A separate, randomly selected subset of 10 questions (10% of the total benchmark) is held out and never used in training; we evaluate the model on it every 10 steps to confirm that it is still making progress. Training time depends on the number of steps and the complexity of each research task.
 - **Upload the final model checkpoint.**
   - This usually takes a few minutes.