[doc] Mention HPO frameworks in tuning doc. #11253

Open. Wants to merge 4 commits into base: master.
3 changes: 2 additions & 1 deletion doc/parameter.rst
@@ -555,7 +555,8 @@ These are parameters specific to learning to rank task. See :doc:`Learning to Ra
***********************
Command Line Parameters
***********************
The following parameters are only used in the console version of XGBoost
The following parameters are only used in the console version of XGBoost. The CLI has been
deprecated and will be removed in future releases.

* ``num_round``

4 changes: 4 additions & 0 deletions doc/tutorials/categorical.rst
@@ -7,6 +7,10 @@ Categorical Data
As of XGBoost 1.6, the feature is experimental and has limited features. Only the
Python package is fully supported.

.. versionadded:: 3.0

Support for categorical data in the R package using ``factor``.

Starting from version 1.5, the XGBoost Python package has experimental support for
categorical data available for public testing. For numerical data, the split condition is
defined as :math:`value < threshold`, while for categorical data the split is defined
4 changes: 2 additions & 2 deletions doc/tutorials/multioutput.rst
@@ -13,8 +13,8 @@ terminologies related to different multi-output models please refer to the

.. note::

As of XGBoost 2.0, the feature is experimental and has limited features. Only the
Python package is tested.
As of XGBoost 3.0, the feature is experimental and has limited features. Only the
Python package is tested. In addition, the ``gblinear`` booster is not supported.
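
A minimal sketch (not part of this PR) of the scikit-learn interface with a 2-D target; the data is synthetic and purely for illustration, and the default one-output-per-tree strategy is assumed.

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   rng = np.random.default_rng(0)
   X = rng.normal(size=(128, 4))
   y = np.stack([X[:, 0], -X[:, 1]], axis=1)  # two regression targets

   # A 2-D ``y`` enables the experimental multi-target support; by default one
   # tree is built per target per boosting round.
   reg = xgb.XGBRegressor(tree_method="hist", n_estimators=20).fit(X, y)
   print(reg.predict(X).shape)  # (128, 2)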

**********************************
Training with One-Model-Per-Target
31 changes: 28 additions & 3 deletions doc/tutorials/param_tuning.rst
@@ -31,12 +31,18 @@ There are in general two ways that you can control overfitting in XGBoost:

* The first way is to directly control model complexity.

- This includes ``max_depth``, ``min_child_weight`` and ``gamma``.
- This includes ``max_depth``, ``min_child_weight``, ``gamma``, ``max_cat_threshold``
and other similar regularization parameters. See :doc:`/parameter` for a comprehensive
set of parameters.
- Set a constant ``base_score`` based on your own criteria. See
:doc:`/tutorials/intercept` for more info.

* The second way is to add randomness to make training robust to noise.

- This includes ``subsample`` and ``colsample_bytree``.
- You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.
- This includes ``subsample`` and ``colsample_bytree``, which may also be combined with
boosted random forests via ``num_parallel_tree``.
- You can also reduce the stepsize ``eta``, possibly with the help of a training
callback. Remember to increase ``num_round`` when you do so, as shown in the sketch
after this list.
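
Below is a minimal sketch (not part of this PR) showing where these knobs live in the
scikit-learn interface. The values are illustrative rather than recommendations;
``learning_rate`` and ``n_estimators`` are the scikit-learn aliases for ``eta`` and
``num_round``.

.. code-block:: python

   import xgboost as xgb

   model = xgb.XGBClassifier(
       # Direct complexity control.
       max_depth=4,
       min_child_weight=5,
       gamma=1.0,
       # Randomness to make training robust to noise.
       subsample=0.8,
       colsample_bytree=0.8,
       # Smaller step size, compensated by more boosting rounds.
       learning_rate=0.05,  # eta
       n_estimators=500,    # num_round
   )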


*************************
@@ -56,6 +62,25 @@ This can affect the training of XGBoost model, and there are two ways to improve it
- Set parameter ``max_delta_step`` to a finite number (say 1) to help convergence
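
A small sketch of that setting, assuming the scikit-learn interface; the
``scale_pos_weight`` value below is a placeholder for the actual negative-to-positive
ratio of your data, not a recommendation.

.. code-block:: python

   import xgboost as xgb

   clf = xgb.XGBClassifier(
       objective="binary:logistic",
       max_delta_step=1,       # a finite cap on the leaf update helps convergence
       scale_pos_weight=10.0,  # placeholder: ratio of negative to positive samples
   )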


************************************************
Use Hyperparameter Optimization (HPO) Frameworks
************************************************
Tuning models is a sophisticated task and there are advanced frameworks to help you. For
example, some meta-estimators in scikit-learn like
:py:class:`sklearn.model_selection.HalvingGridSearchCV` can help guide the search
process. Optuna is another great option, and there are many more frameworks built on
different branches of statistics.
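
For instance, a minimal sketch of wiring XGBoost into such a framework; the parameter
grid is illustrative only, and ``HalvingGridSearchCV`` still requires the experimental
import in scikit-learn.

.. code-block:: python

   from sklearn.experimental import enable_halving_search_cv  # noqa: F401
   from sklearn.model_selection import HalvingGridSearchCV

   import xgboost as xgb

   search = HalvingGridSearchCV(
       estimator=xgb.XGBRegressor(),
       param_grid={
           "max_depth": [3, 5, 7],
           "subsample": [0.6, 0.8, 1.0],
           "learning_rate": [0.05, 0.1],
       },
       factor=3,  # each round keeps roughly 1/factor of the candidates
   )
   # search.fit(X, y) runs successive halving and exposes search.best_params_.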

**************
Know Your Data
**************
The importance of understanding your data cannot be stressed enough; sometimes that is
all it takes to get a good model. Many solutions use a simple XGBoost tree model without
much tuning and emphasize the data pre-processing step instead. XGBoost can help with
feature selection by providing both a global feature importance score and per-sample
feature importance via SHAP values. Also, there are parameters specifically targeting
categorical features, as well as tasks like survival analysis and learning to rank. Feel
free to explore them.
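
A rough sketch of both importance views, assuming the Python package; the synthetic data
is for illustration only.

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   rng = np.random.default_rng(0)
   X = rng.normal(size=(256, 8))
   y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=256)

   model = xgb.XGBRegressor(n_estimators=50).fit(X, y)

   # Global importance, aggregated over all trees.
   print(model.get_booster().get_score(importance_type="gain"))

   # Per-sample attributions (SHAP values); the last column is the bias term.
   contribs = model.get_booster().predict(xgb.DMatrix(X), pred_contribs=True)
   print(contribs.shape)  # (256, 9): one column per feature plus the bias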

*********************
Reducing Memory Usage
*********************