Test fails with Dask 2024.11.0+ #10994
Comments
Looking at the error, I can't be sure how to create an example for dask to debug it.
Were you able to reproduce the failure in the XGBoost pytest?
Yes, on my local machine with the latest dask/distributed, running only the classification tests.
Dask is getting flakier with the new dask-expr and the new shuffle engine; it might take some time to debug these.
This patch lets us progress a bit further in the tests by avoiding the issue in dask/distributed#8998 (we don't want to disable optimization, so this is just for debugging).
I believe that finishes training, and errors during
The scheduler logs reveal a couple of errors
The log message
Is probably relevant. Seems like Dask lost track of some data unexpectedly. After that, we have some issue on this The final
might be irrelevant. That could just be an issue with Dask attempting to recreate the error. I'm not sure yet.
@TomAugspurger Are you working with Dask 2024.11.2 or 2024.12.1? I get different error messages depending on the Dask version I use. Dask 2024.11.2:
I've been testing multiple versions of dask, and my error messages match yours. Right now, my suspicion is that there are multiple issues to work through
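Since the error messages differ between Dask releases, it may help to attach the exact package versions to each report. A minimal sketch using only the standard library; the helper name `installed_versions` and the default package list are illustrative assumptions, not part of XGBoost or Dask:

```python
from importlib import metadata


def installed_versions(packages=("dask", "distributed", "dask-expr", "xgboost")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to a traceback should make it easier to tell which of the suspected issues a given failure belongs to.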
diff --git a/python-package/xgboost/dask/__init__.py b/python-package/xgboost/dask/__init__.py
index 635bedc7d..20da9fcc4 100644
--- a/python-package/xgboost/dask/__init__.py
+++ b/python-package/xgboost/dask/__init__.py
@@ -397,7 +397,7 @@ class DaskDMatrix:
"""
d = client.persist(d)
- delayed_obj = d.to_delayed()
+ delayed_obj = d.to_delayed(optimize_graph=False)
if isinstance(delayed_obj, numpy.ndarray):
# da.Array returns an array to delayed objects
check_columns(delayed_obj)
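Because the patch is only meant for debugging, one option is to gate it behind a switch rather than hard-coding it. A rough sketch, assuming a hypothetical environment variable `XGBOOST_DASK_SKIP_OPTIMIZE` and a hypothetical helper `to_delayed_for_debugging`; neither exists in XGBoost or Dask, and only the `optimize_graph` keyword comes from the patch above:

```python
import os


def to_delayed_for_debugging(collection):
    """Call ``collection.to_delayed``, skipping graph optimization only on request."""
    # XGBOOST_DASK_SKIP_OPTIMIZE is a hypothetical variable invented for this
    # sketch; neither XGBoost nor Dask reads it.
    skip_optimize = os.environ.get("XGBOOST_DASK_SKIP_OPTIMIZE", "0") == "1"
    # optimize_graph is the same keyword the patch above passes to to_delayed.
    return collection.to_delayed(optimize_graph=not skip_optimize)
```

With the variable unset, the call behaves like the unpatched code path, so optimization stays on by default.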
One other relevant point: this only fails when passing in a Dask DataFrame into
Things also work fine with a simpler
# The key components
import dask.distributed
import xgboost.dask
import dask.array as da
import xgboost
import numpy as np
X = da.random.random(size=(100, 2))
y = da.random.choice(range(2), size=(100,))
X_d = X.to_dask_dataframe()
cluster = dask.distributed.LocalCluster()
client = dask.distributed.Client(cluster)
classifier = xgboost.dask.DaskXGBClassifier(eval_metric="merror")
classifier.fit(X, y)
prediction = classifier.predict_proba(X).compute() # OK!
print("array OK!")
def mapped_predict(partition, *, booster: xgboost.Booster) -> np.ndarray:
    m = xgboost.DMatrix(partition)
    predt = booster.predict(m)
    return predt
# works fine
booster = classifier.get_booster()
X_d.map_partitions(mapped_predict, booster=booster).compute()[:1]
booster_f = client.scatter(booster, hash=False, broadcast=True)
meta = np.empty((0, classifier.n_classes_))
X_d.map_partitions(mapped_predict, booster=booster_f, meta=meta).compute()[:1]
It's only when using
https://github.com/dmlc/xgboost/actions/runs/11753771153/job/32747003155